Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Companion data for the creation of a banksia plot:

Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.

Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the difference in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) data sets with widely varying characteristics, while the second example assesses data extraction accuracy comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs from the accompanying manuscripts.

Results: In the banksia plot of statistical method comparison, it was clear that there was no difference, on average, in point estimates and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

This collection of files allows the user to create the images used in the companion paper and amend this code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
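To make the centring and scaling concrete, here is a minimal Python sketch of the transformation described in the Methods (the companion files themselves provide the Stata 17 and R 4.3.1 code); the function name and example numbers are illustrative only.

```python
# A sketch of banksia-style centring and scaling: the reference point estimate is
# centred to zero, its CI scaled to span one, and the matching comparator result
# is shifted and scaled by the same amounts.
def centre_and_scale(ref_est, ref_lo, ref_hi, comp_est, comp_lo, comp_hi):
    shift = ref_est
    scale = ref_hi - ref_lo  # the reference CI width becomes 1 after scaling
    transform = lambda x: (x - shift) / scale
    return tuple(transform(x) for x in (ref_est, ref_lo, ref_hi, comp_est, comp_lo, comp_hi))

# Example: reference estimate 2.0 (CI 1.0 to 3.0), comparator 2.5 (CI 1.0 to 4.0).
print(centre_and_scale(2.0, 1.0, 3.0, 2.5, 1.0, 4.0))
# -> (0.0, -0.5, 0.5, 0.25, -0.5, 1.0): the comparator CI is 1.5 times as wide as the reference CI.
```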
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The Interpolated Strontium Values dataset Ver. 3.1 presents interpolated strontium isotope data for the southern Trans-Urals, based on data gathered in 2020-2022. The current dataset consists of five sets of files for five different interpolations: based on grass, mollusk, soil, and water samples, as well as the average of three of them (excluding the mollusk dataset). Each of the five sets consists of a CSV file and a KML file in which the interpolated values are presented for use with GIS software (ordinary kriging, 5000 m x 5000 m grid). In addition, two GeoTIFF files are provided for each set as a visual reference.
Average 5000 m interpolated points.kml / csv: these files contain averaged values of all three sample types.
Grass 5000 m interpolated points.kml / csv: these files contain data interpolated from the grass sample dataset.
Mollusks 5000 m interpolated points.kml / csv: these files contain data interpolated from the mollusk sample dataset.
Soil 5000 m interpolated points.kml / csv: these files contain data interpolated from the soil sample dataset.
Water 5000 m interpolated points.kml / csv: these files contain data interpolated from the water sample dataset.
The current version is also supplemented with GeoTiff raster files where the same interpolated values are color-coded. These files can be added to Google Earth or any GIS software together with KML files for better interpretation and comparison.
Averaged 5000 m interpolation raster.tif: this file contains a raster representing the averaged values of all three sample types.
Grass 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the grass sample dataset.
Mollusks 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the mollusk sample dataset.
Soil 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the soil sample dataset.
Water 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the water sample dataset
In addition, the cross-validation rasters created during the interpolation process are also provided. They can be used as a visual reference for the reliability of the interpolation. The grey areas on the raster represent areas where expected and interpolated values differ by no more than 0.001. The red areas represent areas where the error exceeds 0.001 and, thus, the interpolation is not reliable.
How to use it?
The data provided can be used to access interpolated background values of bioavailable strontium in the area of interest. Note that a single value is not a good enough predictor and should never be used as a proxy on its own. Always calculate the mean of 4-6 (or more) nearby values to achieve the best estimate possible. Never calculate averages from a single dataset alone; always cross-validate by comparing data from all five datasets. Check the cross-validation rasters to make sure that the interpolation is reliable for the area of interest.
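As a rough illustration of that advice, the Python sketch below averages the nearest interpolated values around a location of interest; the CSV column names, coordinates, and the exact file name are assumptions to be checked against the actual files.

```python
# A minimal sketch, assuming planar-enough coordinates over the study area and
# hypothetical column names (x, y, predicted) in the interpolated-points CSV.
import pandas as pd

df = pd.read_csv("Grass 5000 m interpolated points.csv")  # one of the five datasets

def mean_of_nearest(points, x, y, k=6, x_col="x", y_col="y", value_col="predicted"):
    # Squared distance is sufficient for ranking nearby grid points.
    d2 = (points[x_col] - x) ** 2 + (points[y_col] - y) ** 2
    return points.loc[d2.nsmallest(k).index, value_col].mean()

print(mean_of_nearest(df, x=59.5, y=53.0))  # example coordinates only
```

Repeating the same calculation across the grass, soil, water, and averaged datasets and comparing the results follows the cross-validation advice above.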
References
The interpolated datasets are based upon the actual measured values published as follows:
Epimakhov, Andrey; Kisileva, Daria; Chechushkov, Igor; Ankushev, Maksim; Ankusheva, Polina (2022): Strontium isotope ratios (87Sr/86Sr) analysis from various sources the southern Trans-Urals. PANGAEA, https://doi.pangaea.de/10.1594/PANGAEA.950380
Description of the original dataset of measured strontium isotopic values
The present dataset contains measurements of bioavailable strontium isotopes (87Sr/86Sr) gathered in the southern Trans-Urals. There are four sample types: wormwood (n = 103), leached soil (n = 103), water (n = 101), and freshwater mollusks (n = 80), collected to measure bioavailable strontium isotopes. The analysis of Sr isotopic composition was carried out in the cleanrooms (ISO classes 6 and 7) of the Geoanalitik shared research facilities of the Institute of Geology and Geochemistry, Ural Branch of the Russian Academy of Sciences (Ekaterinburg). Mollusk shell samples, preliminarily cleaned with acetic acid, as well as vegetation samples, rinsed with deionized water and ashed, were dissolved by open digestion in concentrated HNO3 with the addition of H2O2 on a hotplate at 150°C. Water samples were acidified with concentrated nitric acid and filtered. To obtain aqueous leachates, pre-ground soil samples weighing 1 g were placed into polypropylene containers, 10 ml of ultrapure water was added, and the containers were shaken for 1 hour, after which the leachates were filtered through membrane cellulose acetate filters with a pore diameter of 0.2 μm. In all samples, the strontium content was determined by ICP-MS (NexION 300S). Then the sample volume corresponding to a Sr content of 600 ng was evaporated on a hotplate at 120°C, and the precipitate was dissolved in 7M HNO3. Sample solutions were centrifuged at 6000 rpm, and strontium was chromatographically isolated using SR resin (Triskem). The strontium isotopic composition was measured on a Neptune Plus multicollector inductively coupled plasma mass spectrometer (MC-ICP-MS). To correct for mass bias, a combination of bracketing and internal normalization according to the exponential law (88Sr/86Sr = 8.375209) was used. The results were additionally corrected by bracketing every two samples between measurements of the NIST SRM 987 strontium carbonate reference material, using the average deviation from the reference value of 0.710245. The long-term reproducibility of the strontium isotopic analysis was evaluated using repeated measurements of NIST SRM 987 during 2020-2022 and yielded 87Sr/86Sr = 0.71025, 2SD = 0.00012 (104 measurements in two replicates). The within-laboratory standard uncertainty (2σ) obtained for SRM 987 was ± 0.003%.
https://brightdata.com/license
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

Dataset Features
- Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
- Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
- Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and job market dynamics.
Customizable Subsets for Specific Needs

Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

Popular Use Cases
- Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
- Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
- Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
- Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
- AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
EUCA dataset description

Associated Paper: EUCA: the End-User-Centered Explainable AI Framework
Authors: Weina Jin, Jianyu Fan, Diane Gromala, Philippe Pasquier, Ghassan Hamarneh
Introduction: The EUCA dataset is for modelling personalized or interactive explainable AI. It contains 309 data points of 32 end-users' preferences on 12 forms of explanation (including feature-, example-, and rule-based explanations). The data were collected in a user study with 32 layperson participants in the Greater Vancouver area in 2019-2020. In the user study, the participants (P01-P32) were presented with AI-assisted critical tasks on house price prediction, health status prediction, purchasing a self-driving car, and studying for a biological exam [1]. Within each task and for its given explanation goal [2], the participants selected and ranked the explanatory forms [3] that they considered most suitable.
1 EUCA_EndUserXAI_ExplanatoryFormRanking.csv
Column description:
Index - Participant number
Case - task-explanation goal combination
accept to use AI? trust it? - Participant's response to whether they would use the AI, given the task and explanation goal
require explanation? - Participant's response to whether they requested an explanation from the AI
1st, 2nd, 3rd, ... - Explanatory form card selection and ranking
cards fulfill requirement? - After the card selection, participants were asked whether the selected card combination fulfilled their explainability requirement.
2 EUCA_EndUserXAI_demography.csv
It contains the participants' demographics, including their age, gender, educational background, and their knowledge of and attitudes toward AI.
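A minimal Python sketch for loading the two CSV files is shown below; the file names come from the description above, while the exact column headers (e.g. '1st') may differ slightly in the released files.

```python
import pandas as pd

ranking = pd.read_csv("EUCA_EndUserXAI_ExplanatoryFormRanking.csv")
demography = pd.read_csv("EUCA_EndUserXAI_demography.csv")

# Inspect the columns, then count how often each explanatory form card was ranked first.
print(ranking.columns.tolist())
print(ranking["1st"].value_counts())  # column name as described above; verify in the file
```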
EUCA dataset zip file for download
More Context for EUCA Dataset

[1] Critical tasks

There are four tasks. The task labels and their corresponding task titles are:
house - Selling your house
car - Buying an autonomous driving vehicle
health - Personal health decision
bird - Learning bird species
Please refer to the EUCA quantitative data analysis report for the storyboards of the tasks and explanation goals presented in the user study.
[2] Explanation goal

End-users may have different goals/purposes for checking an explanation from AI. The EUCA dataset includes the following 11 explanation goals, each listed with its [label] in the dataset, full name, and description.
[trust] Calibrate trust: trust is key to establishing a human-AI decision-making partnership. Since users can easily distrust or overtrust AI, it is important to calibrate trust to reflect the capabilities of the AI system.
[safe] Ensure safety: users need to ensure safety of the decision consequences.
[bias] Detect bias: users need to ensure the decision is impartial and unbiased.
[unexpect] Resolve disagreement with AI: the AI prediction is unexpected and there are disagreements between users and AI.
[expected] Expected: the AI's prediction is expected and aligns with users' expectations.
[differentiate] Differentiate similar instances: due to the consequences of wrong decisions, users sometimes need to discern similar instances or outcomes. For example, a doctor differentiates whether the diagnosis is a benign or malignant tumor.
[learning] Learn: users need to gain knowledge, improve their problem-solving skills, and discover new knowledge
[control] Improve: users seek causal factors to control and improve the predicted outcome.
[communicate] Communicate with stakeholders: many critical decision-making processes involve multiple stakeholders, and users need to discuss the decision with them.
[report] Generate reports: users need to utilize the explanations to perform particular tasks such as report production. For example, a radiologist generates a medical report on a patient's X-ray image.
[multi] Trade-off multiple objectives: AI may be optimized on an incomplete objective while the users seek to fulfill multiple objectives in real-world applications. For example, a doctor needs to ensure a treatment plan is effective as well as has acceptable patient adherence. Ethical and legal requirements may also be included as objectives.
[3] Explanatory form

The following 12 explanatory forms are end-user-friendly, i.e., no technical knowledge is required for the end-user to interpret the explanation.
Feature-Based Explanation
Feature Attribution - fa
Note: for tasks that have images as input data, feature attribution is denoted by the following two cards:
ir: important regions (a.k.a. heat map or saliency map)
irc: important regions with their feature contribution percentage
Feature Shape - fs
Feature Interaction - fi
Example-Based Explanation
Similar Example - se
Typical Example - te
Counterfactual Example - ce
Note: for the counterfactual example, there were two visual variations used in the user study:
cet: counterfactual example with a transition from the original example to the counterfactual one
ceh: counterfactual example with the contrastive feature highlighted
Rule-Based Explanation
Rule - rt
Decision Tree - dt
Decision Flow - df
Supplementary Information
Input Output Performance Dataset - prior (output prediction with prior distribution of each class in the training set)
Note: occasionally there is a wild card, which means the participant drew the card themselves. It is indicated as 'wc'.
For visual examples of each explanatory form card, please refer to the Explanatory_form_labels.pdf document.
Link to the details on users' requirements on different explanatory forms
Code and report for EUCA data quantitative analysis
EUCA data analysis code
EUCA quantitative data analysis report
EUCA data citation:

@article{jin2021euca,
  title={EUCA: the End-User-Centered Explainable AI Framework},
  author={Weina Jin and Jianyu Fan and Diane Gromala and Philippe Pasquier and Ghassan Hamarneh},
  year={2021},
  eprint={2102.02437},
  archivePrefix={arXiv},
  primaryClass={cs.HC}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cosmos-Transfer1-7B-Sample-AV-Data-Example
Cosmos | Code | Paper | Paper Website
Dataset Description:
This dataset contains 10 sample data points intended to help users better utilize our Cosmos-Transfer1-7B-Sample-AV model. It includes HD Map annotations and LiDAR data, with no personally identifiable information such as faces or license plates. This dataset is intended for research and development only.
Dataset Owner(s):
NVIDIA
Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Cosmos-Transfer1-7B-Sample-AV-Data-Example.
https://brightdata.com/license
Use our Zalando DE & UK products dataset to get a complete snapshot of new products, categories, pricing, and consumer reviews. Depending on your needs, you may purchase the entire dataset or a customized subset. Popular use cases: identify product inventory gaps and increased demand for certain products, analyze consumer sentiment, and define a pricing strategy by locating similar products and categories among your competitors. Beat your eCommerce competitors using a Zalando.de & Zalando.co.uk products dataset to get a complete overview of product pricing, product strategies, and customer reviews. The dataset includes all major data points: product SKU, currency, timestamp, price, similar products, bought-together products, top reviews, rating, and more.
https://www.archivemarketresearch.com/privacy-policy
The AI Training Dataset Market size was valued at USD 2,124.0 million in 2023 and is projected to reach USD 8,593.38 million by 2032, exhibiting a CAGR of 22.1% during the forecast period. An AI training dataset is a collection of data used to train machine learning models. It typically includes labeled examples, where each data point has an associated output label or target value. The quality and quantity of this data are crucial for the model's performance. A well-curated dataset ensures the model learns relevant features and patterns, enabling it to generalize effectively to new, unseen data. Training datasets can encompass various data types, including text, images, audio, and structured data. The driving forces behind this growth include:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Point Arena population distribution across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Point Arena. The dataset can be utilized to understand the population distribution of Point Arena by age. For example, using this dataset, we can identify the largest age group in Point Arena.
Key observations
The largest age group in Point Arena, CA was the group aged 35 to 39 years, with a population of 110 (13.24%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Point Arena, CA was 25 to 29 years, with a population of 3 (0.36%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Point Arena Population by Age. You can refer to it here.
Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use and with high-density coverage for all countries of the world.
This is our process flow:
Our machine learning systems continuously crawl for new POI data
Our geoparsing and geocoding calculates their geo locations
Our categorization systems cleanup and standardize the datasets
Our data pipeline API publishes the datasets on our data store
A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.
POI data is in constant flux. Every minute worldwide, over 200 businesses move, over 600 new businesses open their doors, and over 400 businesses cease to exist. Over 94% of all businesses have a public online presence of some kind, which allows such changes to be tracked: when a business changes, its website and social media presence change too. We then extract and merge the new information, creating the most accurate and up-to-date business information dataset across the globe.
We offer our customers perpetual data licenses for any dataset representing this ever-changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service (DaaS) industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one-time snapshot or via our data update pipeline.
Customers requiring regularly updated datasets may subscribe to our Annual subscription plans. Our data is continuously being refreshed, therefore subscription plans are recommended for those who need the most up to date data. The main differentiators between us vs the competition are our flexible licensing terms and our data freshness.
Data samples may be downloaded at https://store.poidata.xyz/us
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Commercial Point population distribution across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Commercial Point. The dataset can be utilized to understand the population distribution of Commercial Point by age. For example, using this dataset, we can identify the largest age group in Commercial Point.
Key observations
The largest age group in Commercial Point, OH was the group aged 5 to 9 years, with a population of 324 (10.68%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Commercial Point, OH was 85 years and over, with a population of 21 (0.69%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Commercial Point Population by Age. You can refer to it here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"
DOI: https://doi.org/10.11583/DTU.c.6287841
Using deep learning for the detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour.
The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.
We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others: colliding vessels, vessels engaged in Search-and-Rescue activities, law enforcement, and commercial maritime traffic forced to deviate from its normal course.
These datasets consist of labelled trajectories for the purpose of evaluating unsupervised models for the detection of abnormal maritime behaviour. For unlabelled datasets for training, please refer to the collection (link in Related publications).
The dataset is an example of a SAR event and cannot be considered representative of the large population of all SAR events.
The dataset consists of a total of 521 trajectories, of which 25 are labelled as abnormal. The data were captured on a single day in a specific region. The remaining normal traffic is representative of the traffic during the winter season. The normal traffic in the ROI has a fairly high seasonality related to fishing and leisure sailing traffic.
The data is saved using the pickle format for Python. Each dataset is split into 2 files with naming convention:
datasetInfo_XXX
data_XXX
Files named "data_XXX" contains the extracted trajectories serialized sequentially one at a time and must be read as such. Please refer to provided utility functions for examples. Files named "datasetInfo" contains Metadata related to the dataset and indecies at which trajectories begin in "data_XXX" files.
The data are sequences of maritime trajectories defined by their; timestamp, latitude/longitude position, speed, course, and unique ship identifer MMSI. In addition, the dataset contains metadata related to creation parameters. The dataset has been limited to a specific time period, ship types, moving AIS navigational statuses, and filtered within an region of interest (ROI). Trajectories were split if exceeding an upper limit and short trajectories were discarded. All values are given as metadata in the dataset and used in the naming syntax.
Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl
See the datasheet for more detailed information; we also refer to the provided utility functions for examples of how to read and plot the data.
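For readers not using the provided utilities, the following Python sketch illustrates the sequential-pickle reading described above; the file name is a placeholder for the data_XXX naming convention, and the provided utility functions remain the reference implementation.

```python
# A minimal sketch: trajectories are pickled one after another in a data_XXX file,
# so we call pickle.load repeatedly until the end of the file is reached.
import pickle

trajectories = []
with open("data_AIS_Custom_example.pkl", "rb") as f:  # placeholder file name
    while True:
        try:
            trajectories.append(pickle.load(f))
        except EOFError:
            break

print(f"Read {len(trajectories)} trajectories")
```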
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Zero-crossing point detection is necessary to establish consistent performance in various power system applications. Machine learning models can be used to detect zero-crossing points, and a dataset is required to train and test such models. These datasets can be helpful to researchers who are working on the zero-crossing point detection problem using machine learning models. All datasets were created from MATLAB simulations. In total, 28 datasets were developed based on various window sizes (5, 10, 15, 20) and noise levels (10%, 20%, 30%, 40%, 50% and 60%). Similarly, 28 datasets were developed based on the same window sizes and THD levels (10%, 20%, 30%, 40%, 50% and 60%). In addition, 36 datasets were prepared based on window sizes (5, 10, 15, 20) and combinations of noise (10%, 30%, 60%) and THD (20%, 40%, 60%). Each dataset consists of 4 input features, called slope, intercept, correlation and RMSE, and one output label with the value 0 or 1: 0 represents the non-zero-crossing point class, whereas 1 represents the zero-crossing point class. Dataset information, such as the number of samples and the combinations (window size, noise and THD), is available in the Data Details Excel sheet. These datasets will be useful for faculty, students and researchers who are working on the ZCP problem.
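As an illustration of how these datasets might be used, the sketch below trains a simple classifier in Python; the CSV file name and the label column name are assumptions, while the four feature names follow the description above.

```python
# A minimal sketch: classify zero-crossing vs non-zero-crossing points from the
# four features (slope, intercept, correlation, RMSE) described above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("zcp_window5_noise10.csv")  # hypothetical file name
X = df[["slope", "intercept", "correlation", "RMSE"]]
y = df["label"]  # assumed label column: 1 = zero-crossing point, 0 = non-zero-crossing point

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```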
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).
The variables contained therein are defined as follows:
case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).
patid: a unique patient identifier.
time_period: A count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.
ncons: number of consultations per month.
period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.
burden: binary variable denoting membership of one of two multimorbidity burden groups.
We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the Maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).
Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
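For readers working outside Stata, the sketch below shows a rough Python analogue of a negative binomial model of monthly consultation counts. It is not the authors' analysis (the provided do-files implement that), and this simple formula ignores the inflection-point variables and burden-group stratification described above.

```python
# A minimal sketch using the dummy dataset described above (case, time_period, ncons).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_stata("dummy_dataset.dta")

# Model consultation counts over time, letting the trend differ between cases and controls.
model = smf.negativebinomial("ncons ~ time_period * case", data=df).fit()
print(model.summary())
```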
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the second batch of WiFi RSS RTT datasets with LOS conditions we published. Please see https://doi.org/10.5281/zenodo.11558192 for the first release.
We provide three real-world datasets for indoor positioning model selection purposes. The area of interest was divided into discrete grids, and each grid was labelled with the correct ground truth coordinates and the APs that are in LoS from that grid. The datasets contain both WiFi RTT and RSS signal measures, together with ground truth coordinate labels and LOS condition labels, and are well separated so that training points and testing points do not overlap. Please find the datasets in the 'data' folder.
Lecture theatre: This is an entirely LOS scenario with 5 APs. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).
Corridor: This is an entirely NLOS scenario with 4 APs. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).
Office: This is a mixed LOS-NLOS scenario with 5 APs. At least one AP was NLOS for each RP. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).
Collection methodology
The APs utilised were Google WiFi Router AC-1304, the smartphone used to collect the data was Google Pixel 3 with Android 9.
The ground truth coordinates were collected using fixed tile size on the floor and manual post-it note markers.
Only RTT-enabled APs were included in the dataset.
The features of the dataset
The features of the lecture theatre dataset are as follows:
Testbed area: 15 × 14.5 m²
Grid size: 0.6 × 0.6 m²
Number of APs: 5
Number of reference points: 120
Samples per reference point: 60
Number of all data samples: 7,200
Number of training samples: 5,400
Number of testing samples: 1,800
Signal measures: WiFi RTT, WiFi RSS
Note: Entirely LOS
The features of the corridor dataset are as follows:
Testbed area: 35 × 6 m²
Grid size: 0.6 × 0.6 m²
Number of APs: 4
Number of reference points: 114
Samples per reference point: 60
Number of all data samples: 6,840
Number of training samples: 5,130
Number of testing samples: 1,710
Signal measures: WiFi RTT, WiFi RSS
Note: Mixed LOS-NLOS. At least one AP was NLOS for each RP.
The features of the office dataset are as follows:
Testbed area: 18 × 5.5 m²
Grid size: 0.6 × 0.6 m²
Number of APs: 5
Number of reference points: 108
Samples per reference point: 60
Number of all data samples: 6,480
Number of training samples: 4,860
Number of testing samples: 1,620
Signal measures: WiFi RTT, WiFi RSS
Note: Entirely NLOS
Dataset explanation
The columns of the dataset are as follows:
Column 'X': the X coordinate of the sample.
Column 'Y': the Y coordinate of the sample.
Columns 'AP1 RTT(mm)', 'AP2 RTT(mm)', ..., 'AP5 RTT(mm)': the RTT measure from the corresponding AP at a reference point.
Columns 'AP1 RSS(dBm)', 'AP2 RSS(dBm)', ..., 'AP5 RSS(dBm)': the RSS measure from the corresponding AP at a reference point.
Column 'LOS APs': indicates which APs have LOS to this reference point.
Please note:
The RSS value -200 dBm indicates that the AP is too far away from the current reference point and no signals could be heard from it.
The RTT value 100,000 mm indicates that no signal is received from the specific AP.
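The sketch below shows one way to load a dataset and mask those sentinel values in Python; the file name is a placeholder and the column names follow the description above.

```python
# A minimal sketch: replace the "no signal" sentinels with NaN before analysis.
import numpy as np
import pandas as pd

df = pd.read_csv("lecture_theatre.csv")  # placeholder file name

rss_cols = [c for c in df.columns if "RSS" in c]
rtt_cols = [c for c in df.columns if "RTT" in c]

df[rss_cols] = df[rss_cols].replace(-200, np.nan)     # AP out of range
df[rtt_cols] = df[rtt_cols].replace(100000, np.nan)   # no RTT response
print(df[rss_cols + rtt_cols].describe())
```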
Citation request
When using this dataset, please cite the following three items:
Feng, X., Nguyen, K. A., & Zhiyuan, L. (2024). WiFi RSS & RTT dataset with different LOS conditions for indoor positioning [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11558792
@article{feng2024wifi,
  title={A WiFi RSS-RTT indoor positioning system using dynamic model switching algorithm},
  author={Feng, Xu and Nguyen, Khuong An and Luo, Zhiyuan},
  journal={IEEE Journal of Indoor and Seamless Positioning and Navigation},
  year={2024},
  publisher={IEEE}
}

@inproceedings{feng2023dynamic,
  title={A dynamic model switching algorithm for WiFi fingerprinting indoor positioning},
  author={Feng, Xu and Nguyen, Khuong An and Luo, Zhiyuan},
  booktitle={2023 13th International Conference on Indoor Positioning and Indoor Navigation (IPIN)},
  pages={1--6},
  year={2023},
  organization={IEEE}
}
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
By Department of Energy [source]
The Building Energy Data Book (2011) is an invaluable resource for gaining insight into the current state of energy consumption in the buildings sector. This dataset provides comprehensive data on residential, commercial and industrial building energy consumption, construction techniques, building technologies and characteristics. With this resource, you can get an in-depth understanding of how energy is used in various types of buildings - from single family homes to large office complexes - as well as its impact on the environment. The BTO within the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy developed this dataset to provide a wealth of knowledge for researchers, policy makers, engineers and even everyday observers who are interested in learning more about our built environment and its energy usage patterns.
This dataset provides comprehensive information regarding energy consumption in the buildings sector of the United States. It contains a number of key variables which can be used to analyze and explore the relations between energy consumption and building characteristics, technologies, and construction. The data is provided in both CSV format as well as tabular format which can make it helpful for those who prefer to use programs like Excel or other statistical modeling software.
In order to get started with this dataset we've developed a guide outlining how to effectively use it for your research or project needs.
Understand what's included: Before you start analyzing the data, you should read through the provided documentation so that you fully understand what is included in the datasets. You'll want to be aware of any potential limitations or requirements associated with each type of data point so that your results are valid and reliable when drawing conclusions from them.
Clean up any outliers: Take some time upfront to investigate suspicious outliers in your dataset before using it in further analyses; otherwise, they can skew results down the road. They can also complicate statistical modeling, since extreme values can distort fitted models depending on their magnitude. Missing values should be accounted for too, since they are not always obvious at first glance when reviewing a table or a graphical representation, but accurate statistics must still be obtained either way.

Exploratory data analysis: After cleaning up your dataset, do some basic exploration by visualizing summaries such as boxplots, histograms, and scatter plots (a minimal sketch follows after the next step). This gives an initial sense of what trends might exist across regions and variables, which can then inform future predictive models. This step will also highlight any clear discontinuities over time, helping ensure that predictors contribute meaningful signal rather than noise.

Analyze key metrics and observations: Once the exploratory analyses have been carried out on the raw samples, post-processing steps come next, such as analyzing metrics like correlations among explanatory variables, performing significance testing and regression modeling, and imputing missing or outlier values, depending on the specific project needs at hand. Additionally, interpretation efforts based...
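The sketch referenced in the exploratory data analysis step might look like the following in Python; the file name is a placeholder for whichever CSV table from the data book you are working with.

```python
# A minimal EDA sketch: summary statistics plus quick distribution plots.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("buildings_energy_consumption.csv")  # placeholder file name
print(df.describe())        # summary statistics, useful for spotting outliers
df.hist(figsize=(10, 8))    # histogram per numeric column
plt.tight_layout()
plt.show()
```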
- Creating an energy efficiency rating system for buildings - Using the dataset, an organization can develop a metric to rate the energy efficiency of commercial and residential buildings in a standardized way.
- Developing targeted campaigns to raise awareness about energy conservation - Analyzing data from this dataset can help organizations identify areas of high energy consumption and create targeted campaigns and incentives to encourage people to conserve energy in those areas.
- Estimating costs associated with upgrading building technologies - By evaluating various trends in building technologies and their associated costs, decision-makers can determine the most cost-effective option when it comes time to upgrade their structures' energy efficiency...
https://brightdata.com/license
Use our Zillow dataset to collect and analyze data about buying, selling, renting, and financing properties in the United States. The dataset includes over 80 attributes with all major data points about the listing: location, price, listing type, size and number of rooms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A cleaned version of the example dataset is available (in zip file format). Please download it here.
The example dataset was generated in a simulated environment, in which we know the ground-truth motion of all the key points.
Three cameras were set up to record the motion: one external camera (which generates the cam_ext.xxxxxxx.jpeg data) and two cameras attached to the feet (left/right), which generate the cam_r.xxxxxxx.jpeg and cam_l.xxxxxxx.jpeg data. All of them are in the images folder.
Images were generated at 100 fps. You can stack the images together to generate the videos.
The key point annotations are in two formats: 3d_keypoints (the absolute 3D locations of key points in the global coordinate system) and projected_2D_keypoints (the locations of key points in the projected camera planes, specifically xxxxxxx_cam_ext.csv, xxxxxxx_cam_l.csv, and xxxxxxx_cam_r.csv). For instance, the entry "ankle_li, 140.19, 165.56" represents the 2D coordinates of the left inner ankle key point.
All the frames of the camera images and the key point annotations are synchronized. For example, 0001572_cam_l.csv contains the projections at frame 1572 into the left foot camera.
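As a quick illustration, the Python sketch below reads one frame's projected 2D key points; it assumes each CSV row holds a key point name followed by its x and y pixel coordinates, with no header row.

```python
# A minimal sketch for one projected-2D-keypoints file, e.g. 0001572_cam_l.csv.
import csv

keypoints = {}
with open("0001572_cam_l.csv", newline="") as f:
    for name, x, y in csv.reader(f):
        keypoints[name.strip()] = (float(x), float(y))

print(keypoints.get("ankle_li"))  # e.g. (140.19, 165.56), as in the description above
```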
This assignment focuses on continuously tracking the key points through the video, specifically for the feet-attached cameras (cam_l and cam_r), where occlusion happens very often. Starting from the suggested model (CoTracker), you will try to figure out a solution that can handle our example data case.
If you have extra effort left (within the 20 hours), you can also try to figure out how to generate the 3D locations of the key points from the cam_l and cam_r key point estimations. The 3D key point locations are the important information our pipeline needs to estimate the 3D body posture.
The video folder contains example videos of the three camera view angles.
This feature dataset contains the control points used to validate the accuracies of the interpolated water density rasters for the Gulf of Maine. These control points were selected randomly from the water density data points, using Hawth's Create Random Selection Tool. Twenty-five percent of each seasonal bin (for each year and at each depth) were randomly selected and set aside for validation. For example, if there were 1,000 water density data points for the fall (September, October, November) 2003 at 0 meters, then 250 of those points were randomly selected, removed and set aside to assess the accuracy of interpolated surface. The naming convention of the validation point feature class includes the year (or years), the season, and the depth (in meters) it was selected from. So for example, the name: ValidationPoints_1997_2004_Fall_0m would indicate that this point feature class was randomly selected from water density points that were at 0 meters in the fall between 1997-2004. The seasons were defined using the same months as the remote sensing data--namely, Fall = September, October, November; Winter = December, January, February; Spring = March, April, May; and Summer = June, July, August.
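A rough Python sketch of that stratified 25% hold-out is shown below; the file name and column names (year, season, depth_m) are hypothetical stand-ins for the actual water density point attributes.

```python
# A minimal sketch: randomly set aside 25% of points within each year/season/depth bin.
import pandas as pd

points = pd.read_csv("water_density_points.csv")  # hypothetical file name

validation = (
    points.groupby(["year", "season", "depth_m"], group_keys=False)
          .apply(lambda g: g.sample(frac=0.25, random_state=0))
)
training = points.drop(validation.index)
print(len(training), len(validation))
```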
This dataset contains data collected within limestone cedar glades at Stones River National Battlefield (STRI) near Murfreesboro, Tennessee. This dataset contains information on soil microbial metabolic response for soil samples obtained from certain quadrat locations (points) within 12 selected cedar glades. This information derives from substrate utilization profiles based on Biolog EcoPlates (Biolog, Inc., Hayward, CA, USA), which were inoculated with soil slurries containing the entire microbial community present in each soil sample. EcoPlates contain 31 sole-carbon substrates (present in triplicate on each plate) and one blank (control) well. Once the microbial community from a soil sample is inoculated onto the plates, the plates are incubated and absorbance readings are taken at intervals.

For each quadrat location (point), one soil sample was obtained under sterile conditions, using a trowel wiped with methanol and rinsed with distilled water, and was placed into an autoclaved jar with a tight-fitting lid and placed on ice. Soil samples were transported to lab facilities on ice and immediately refrigerated. Within 24 hours after being removed from the field, soil samples were processed for community level physiological profiling (CLPP) using Biolog EcoPlates. First, for each soil sample three measurements were taken of gravimetric soil water content (SWC) using a Mettler Toledo HB43 halogen moisture analyzer (Mettler Toledo, Columbus, OH, USA), and the mean of these three SWC measurements was used to calculate the 10-gram dry weight equivalent (DWE) for each soil sample. For each soil sample, a 10-gram DWE of fresh soil was added to 90 milliliters of sterile buffer solution in a 125-milliliter plastic bottle to make the first dilution. Bottles were agitated on a wrist-action shaker for 20 minutes, and a 10-milliliter aliquot was taken from each sample using sterilized pipette tips and added to 90 milliliters of sterile buffer solution to make the second dilution. The bottle containing the second dilution for each sample was agitated for 10 seconds by hand, poured into a sterile tray, and the second dilution was inoculated directly onto Biolog EcoPlates using a sterilized pipette set to deliver 150 microliters into each well. Each plate was immediately covered, placed in a covered box, and incubated in the dark at 25 degrees Celsius. Catabolism of each carbon substrate produced a proportional color change response (from the color of the inoculant to dark purple) due to the activity of the redox dye tetrazolium violet (present in all wells, including blanks). Plates were read at intervals of 24 hours, 48 hours, 72 hours, 96 hours and 120 hours after inoculation using a Biolog MicroStation plate reader (Biolog, Inc., Hayward, CA, USA) reading absorbance at 590 nanometers.

For each soil sample and at each incubation time point, average well color development (AWCD) was calculated according to the equation:

AWCD = [Σ (C – R)] / n

where C represents the absorbance value of control wells (mean of 3 controls), R is the mean absorbance of the response wells (3 wells per carbon substrate), and n is the number of carbon substrates (31 for EcoPlates). For each soil sample, an incubation curve was constructed using AWCD values from 48 hours to 120 hours, and the area under this incubation curve was calculated. The numeric values contained in the fields of this dataset represent areas under these AWCD incubation curves from 48 hours to 120 hours.
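For reference, the area-under-curve step can be reproduced with a short Python sketch; the AWCD values below are made-up placeholders for one sample's 48-120 hour readings.

```python
# A minimal sketch: trapezoidal area under an AWCD incubation curve from 48 to 120 hours.
import numpy as np

hours = np.array([48, 72, 96, 120])
awcd = np.array([0.21, 0.48, 0.74, 0.92])  # hypothetical AWCD values for one soil sample

print(np.trapz(awcd, hours))  # the kind of value stored in this dataset's fields
```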
Detailed descriptions of the experimental design, field data collection procedures, laboratory procedures, and data analysis are presented in Cartwright (2014).

References:

Cartwright, J. (2014). Soil ecology of a rock outcrop ecosystem: abiotic stresses, soil respiration, and microbial community profiles in limestone cedar glades. Ph.D. dissertation, Tennessee State University.

Cofer, M., Walck, J., & Hidayati, S. (2008). Species richness and exotic species invasion in Middle Tennessee cedar glades in relation to abiotic and biotic factors. The Journal of the Torrey Botanical Society, 135(4), 540-553.

Garland, J., & Mills, A. (1991). Classification and characterization of heterotrophic microbial communities on the basis of patterns of community-level sole-carbon-source utilization. Applied and Environmental Microbiology, 57(8), 2351-2359.

Garland, J. (1997). Analysis and interpretation of community-level physiological profiles in microbial ecology. FEMS Microbiology Ecology, 24, 289-300.

Hackett, C. A., & Griffiths, B. S. (1997). Statistical analysis of the time-course of Biolog substrate utilization. Journal of Microbiological Methods, 30(1), 63-69.

Insam, H. (1997). A new set of substrates proposed for community characterization in environmental samples. In H. Insam & A. Rangger (Eds.), Microbial Communities: Functional versus Structural Approaches (pp. 259-260). New York: Springer.

Preston-Mafham, J., Boddy, L., & Randerson, P. F. (2002). Analysis of microbial community functional diversity using sole-carbon-source utilisation profiles - a critique. FEMS Microbiology Ecology, 42(1), 1-14. doi:10.1111/j.1574-6941.2002.tb00990.x
https://brightdata.com/license
Use our constantly updated Walmart products dataset to get a complete snapshot of new products, categories, pricing, and consumer reviews. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases: identify product inventory gaps and increased demand for certain products, analyze consumer sentiment, and define a pricing strategy by locating similar products and categories among your competitors. The dataset includes all major data points: product, SKU, GTIN, currency, timestamp, price, and more. Get your Walmart dataset today!