100+ datasets found
  1. The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets

    • bridges.monash.edu
    • researchdata.edu.au
    txt
    Updated Oct 15, 2024
    Cite
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie (2024). The banksia plot: a method for visually comparing point estimates and confidence intervals across datasets [Dataset]. http://doi.org/10.26180/25286407.v2
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Monash University
    Authors
    Simon Turner; Amalia Karahalios; Elizabeth Korevaar; Joanne E. McKenzie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion data for the creation of a banksia plot.

    Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses, both within and across datasets.

    Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the differences in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) datasets with widely varying characteristics, while the second example assesses data extraction accuracy, comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs in the accompanying manuscripts.

    Results: In the banksia plot of the statistical method comparison, it was clear that there was no difference, on average, in point estimates, and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data, it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

    Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.

    This collection of files allows the user to create the images used in the companion paper and to amend the code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
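
    For intuition, the centring-and-scaling step described above can be sketched in a few lines of Python. This is an illustration of the transform only, not the authors' code (the dataset itself provides the actual Stata and R scripts), and the (estimate, lower, upper) tuple layout is an assumption:

        # Sketch of the banksia-plot centring and scaling (illustrative only).
        def centre_and_scale(ref, comp):
            """Rescale a (reference, comparator) pair of results so the reference
            point estimate sits at zero and its CI spans one unit.
            Each result is a tuple (point_estimate, ci_lower, ci_upper)."""
            ref_pt, ref_lo, ref_hi = ref
            width = ref_hi - ref_lo  # the reference CI width becomes the unit of scale

            def transform(result):
                pt, lo, hi = result
                return ((pt - ref_pt) / width, (lo - ref_pt) / width, (hi - ref_pt) / width)

            return transform(ref), transform(comp)

        # Reference vs. comparator analysis of the same dataset:
        ref_scaled, comp_scaled = centre_and_scale((1.20, 0.80, 1.60), (1.35, 0.70, 2.00))
        print(ref_scaled)   # (0.0, -0.5, 0.5): centred at zero, CI spanning one
        print(comp_scaled)  # the comparator on the same relative scale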

  2. Interpolated data on bioavailable strontium in the southern Trans-Urals, 2020-2022

    • data.niaid.nih.gov
    Updated Dec 1, 2024
    + more versions
    Cite
    Kiseleva, Daria (2024). Interpolated data on bioavailable strontium in the southern Trans-Urals, 2020-2022 version 3.1 (current) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7370065
    Dataset updated
    Dec 1, 2024
    Dataset provided by
    Epimakhov, Andrey
    Ankusheva, Polina
    Ankushev, Maksim
    Chechushkov, Igor
    Kiseleva, Daria
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ural Mountains
    Description


    The Interpolated Strontium Values dataset Ver. 3.1 presents interpolated strontium isotope data for the southern Trans-Urals, based on data gathered in 2020-2022. The current dataset consists of five sets of files for five interpolations: based on grass, mollusk, soil, and water samples, as well as the average of three of them (excluding the mollusk dataset). Each of the five sets consists of a CSV file and a KML file in which the interpolated values are presented for use with GIS software (ordinary kriging, 5000 m x 5000 m grid). In addition, two GeoTIFF files are provided for each set as a visual reference.

    Average 5000 m interpolated points.kml / csv: these files contain averaged values of all three sample types.

    Grass 5000 m interpolated points.kml / csv: these files contain data interpolated from the grass sample dataset.

    Mollusks 5000 m interpolated points.kml / csv: these files contain data interpolated from the mollusk sample dataset.

    Soil 5000 m interpolated points.kml / csv: these files contain data interpolated from the soil sample dataset.

    Water 5000 m interpolated points.kml / csv: these files contain data interpolated from the water sample dataset.

    The current version is also supplemented with GeoTIFF raster files in which the same interpolated values are color-coded. These files can be added to Google Earth or any GIS software together with the KML files for better interpretation and comparison.

    Averaged 5000 m interpolation raster.tif: this file contains a raster representing the averaged values of all three sample types.

    Grass 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the grass sample dataset.

    Mollusks 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the mollusk sample dataset.

    Soil 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the soil sample dataset.

    Water 5000 m interpolation raster.tif: this file contains a raster representing the data interpolated from the water sample dataset.

    In addition, the cross-validation rasters created during the interpolation process are also provided. They can be used as a visual reference for the interpolation reliability. The grey areas on a raster represent areas where the expected values do not differ from the interpolated values by more than 0.001; the red areas represent areas where the error exceeds 0.001 and, thus, the interpolation is not reliable.

    How to use it?

    The data provided can be used to access interpolated background values of bioavailable strontium in an area of interest. Note that a single value is not a good enough predictor and should never be used as a proxy. Always calculate the mean of 4-6 (or more) nearby values to achieve the best estimate possible, as in the sketch below. Never calculate averages from a single dataset alone; instead, cross-validate by comparing data from all five datasets, and check the cross-validation rasters to make sure that the interpolation is reliable for the area of interest.
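
    As a minimal illustration of that workflow, the sketch below averages the k nearest interpolated values around a point of interest and repeats this across all five interpolations. The CSV column names ("lon", "lat", "value") and the query coordinates are assumptions; check the headers of the actual downloaded files:

        import numpy as np
        import pandas as pd

        def mean_of_nearest(csv_path, lon, lat, k=6):
            # Mean of the k nearest interpolated values; plain Euclidean distance
            # in degrees is a rough proxy over a region of this size.
            df = pd.read_csv(csv_path)
            d = np.hypot(df["lon"] - lon, df["lat"] - lat)
            return df.loc[d.nsmallest(k).index, "value"].mean()

        # Cross-validate by comparing all five interpolations, not just one:
        for name in ["Average", "Grass", "Mollusks", "Soil", "Water"]:
            est = mean_of_nearest(f"{name} 5000 m interpolated points.csv", 60.5, 53.2)
            print(name, round(est, 5))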

    References

    The interpolated datasets are based upon the actual measured values published as follows:

    Epimakhov, Andrey; Kiseleva, Daria; Chechushkov, Igor; Ankushev, Maksim; Ankusheva, Polina (2022): Strontium isotope ratios (87Sr/86Sr) analysis from various sources in the southern Trans-Urals. PANGAEA, https://doi.pangaea.de/10.1594/PANGAEA.950380

    Description of the original dataset of measured strontium isotopic values

    The present dataset contains measurements of bioavailable strontium isotopes (87Sr/86Sr) gathered in the southern Trans-Urals. There are four sample types, namely wormwood (n = 103), leached soil (n = 103), water (n = 101), and freshwater mollusks (n = 80), collected to measure bioavailable strontium isotopes. The analysis of Sr isotopic composition was carried out in the cleanrooms (ISO classes 6 and 7) of the Geoanalitik shared research facilities of the Institute of Geology and Geochemistry, the Ural Branch of the Russian Academy of Sciences (Ekaterinburg). Mollusk shell samples, preliminarily cleaned with acetic acid, as well as vegetation samples, rinsed with deionized water and ashed, were dissolved by open digestion in concentrated HNO3 with the addition of H2O2 on a hotplate at 150°C. Water samples were acidified with concentrated nitric acid and filtered. To obtain aqueous leachates, pre-ground soil samples weighing 1 g were placed into polypropylene containers, 10 ml of ultrapure water was added, and the containers were shaken for 1 hour, after which the samples were filtered through membrane cellulose acetate filters with a pore diameter of 0.2 μm. In all samples, the strontium content was determined by ICP-MS (NexION 300S). Then the sample volume corresponding to an Sr content of 600 ng was evaporated on a hotplate at 120°C, and the precipitate was dissolved in 7M HNO3. Sample solutions were centrifuged at 6000 rpm, and strontium was chromatographically isolated using SR resin (Triskem). The strontium isotopic composition was measured on a Neptune Plus multicollector inductively coupled plasma mass spectrometer (MC-ICP-MS). To correct mass bias, a combination of bracketing and internal normalization according to the exponential law with 88Sr/86Sr = 8.375209 was used. The results were additionally bracketed using the NIST SRM 987 strontium carbonate reference material, using an average deviation from the reference value of 0.710245 for every two samples bracketed between NIST SRM 987 measurements. The long-term reproducibility of the strontium isotopic analysis was evaluated using repeated measurements of NIST SRM 987 during 2020-2022 and yielded 87Sr/86Sr = 0.71025, 2SD = 0.00012 (104 measurements in two replicates). The within-laboratory standard uncertainty (2σ) obtained for SRM 987 was ±0.003%.

  3. LinkedIn Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 17, 2021
    Cite
    Bright Data (2021). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
    Dataset updated
    Dec 17, 2021
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

    Dataset Features

    • Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
    • Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
    • Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and job market dynamics.

    Customizable Subsets for Specific Needs

    Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

    Popular Use Cases

    • Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
    • Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
    • Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
    • Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
    • AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

    Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.

  4. EUCA dataset

    • paperswithcode.com
    Updated Feb 3, 2021
    + more versions
    Cite
    Weina Jin; Jianyu Fan; Diane Gromala; Philippe Pasquier; Ghassan Hamarneh (2021). EUCA dataset Dataset [Dataset]. https://paperswithcode.com/dataset/euca-dataset
    Dataset updated
    Feb 3, 2021
    Authors
    Weina Jin; Jianyu Fan; Diane Gromala; Philippe Pasquier; Ghassan Hamarneh
    Description

    EUCA dataset description

    Associated Paper: EUCA: the End-User-Centered Explainable AI Framework

    Authors: Weina Jin, Jianyu Fan, Diane Gromala, Philippe Pasquier, Ghassan Hamarneh

    Introduction: The EUCA dataset is for modelling personalized or interactive explainable AI. It contains 309 data points of 32 end-users' preferences on 12 forms of explanation (including feature-, example-, and rule-based explanations). The data were collected from a user study with 32 layperson participants in the Greater Vancouver area in 2019-2020. In the user study, the participants (P01-P32) were presented with AI-assisted critical tasks on house price prediction, health status prediction, purchasing a self-driving car, and studying for a biological exam [1]. Within each task and for its given explanation goal [2], the participants selected and ranked the explanatory forms [3] that they saw as most suitable.

    1 EUCA_EndUserXAI_ExplanatoryFormRanking.csv

    Column description:

    • Index - the participant's number
    • Case - the task-explanation goal combination
    • accept to use AI? trust it? - the participant's response to whether they would use and trust the AI given the task and explanation goal
    • require explanation? - the participant's response to whether they request an explanation for the AI
    • 1st, 2nd, 3rd, ... - the explanatory form card selection and ranking
    • cards fulfill requirement? - after the card selection, whether the selected card combination fulfilled the participant's explainability requirement

    2 EUCA_EndUserXAI_demography.csv

    It contains the participants' demographics, including their age, gender, educational background, and their knowledge of and attitudes toward AI.

    EUCA dataset zip file for download

    More Context for EUCA Dataset

    [1] Critical tasks

    There are four tasks. The task labels and their corresponding task titles are:

    house - Selling your house
    car - Buying an autonomous driving vehicle
    health - Personal health decision
    bird - Learning bird species

    Please refer to the EUCA quantitative data analysis report for the storyboard of the tasks and explanation goals presented in the user study.

    [2] Explanation goal

    End-users may have different goals/purposes for checking an explanation from AI. The EUCA dataset includes the following 11 explanation goals, each listed with its [label] in the dataset, full name, and description:

    [trust] Calibrate trust: trust is key to establishing a human-AI decision-making partnership. Since users can easily distrust or overtrust AI, it is important to calibrate the trust to reflect the capabilities of the AI system.

    [safe] Ensure safety: users need to ensure safety of the decision consequences.

    [bias] - Detect bias: users need to ensure the decision is impartial and unbiased.

    [unexpect] Resolve disagreement with AI: the AI prediction is unexpected and there are disagreements between users and AI.

    [expected] - Expected: the AI's prediction is expected and aligns with users' expectations.

    [differentiate] Differentiate similar instances: due to the consequences of wrong decisions, users sometimes need to discern similar instances or outcomes. For example, a doctor differentiates whether the diagnosis is a benign or malignant tumor.

    [learning] Learn: users need to gain knowledge, improve their problem-solving skills, and discover new knowledge.

    [control] Improve: users seek causal factors to control and improve the predicted outcome.

    [communicate] Communicate with stakeholders: many critical decision-making processes involve multiple stakeholders, and users need to discuss the decision with them.

    [report] Generate reports: users need to utilize the explanations to perform particular tasks such as report production. For example, a radiologist generates a medical report on a patient's X-ray image.

    [multi] Trade-off multiple objectives: AI may be optimized on an incomplete objective while the users seek to fulfill multiple objectives in real-world applications. For example, a doctor needs to ensure a treatment plan is effective as well as has acceptable patient adherence. Ethical and legal requirements may also be included as objectives.

    [3] Explanatory form

    The following 12 explanatory forms are end-user-friendly, i.e., no technical knowledge is required for the end-user to interpret the explanation.

    Feature-Based Explanation

    Feature Attribution - fa
    Note: for tasks that have images as input data, feature attribution is denoted by the following two cards: ir - important regions (a.k.a. heat map or saliency map); irc - important regions with their feature contribution percentage

    Feature Shape - fs

    Feature Interaction - fi

    Example-Based Explanation

    Similar Example - se

    Typical Example - te

    Counterfactual Example - ce

    Note: for the counterfactual example, there were two visual variations used in the user study: cet - counterfactual example with a transition from the example to its counterfactual; ceh - counterfactual example with the contrastive feature highlighted

    Rule-Based Explanation

    Rule - rt

    Decision Tree - dt

    Decision Flow - df

    Supplementary Information

    Input, Output, Performance, Dataset, prior (output prediction with the prior distribution of each class in the training set)

    Note: occasionally there is a wild card, which means the participant drew the card by themselves. It is indicated as 'wc'.

    For visual examples of each explanatory form card, please refer to the Explanatory_form_labels.pdf document.

    Link to the details on users' requirements on different explanatory forms

    Code and report for the EUCA quantitative data analysis

    EUCA data analysis code
    EUCA quantitative data analysis report

    EUCA data citation:

    @article{jin2021euca,
      title={EUCA: the End-User-Centered Explainable AI Framework},
      author={Weina Jin and Jianyu Fan and Diane Gromala and Philippe Pasquier and Ghassan Hamarneh},
      year={2021},
      eprint={2102.02437},
      archivePrefix={arXiv},
      primaryClass={cs.HC}
    }

  5. Cosmos-Transfer1-7B-Sample-AV-Data-Example

    • huggingface.co
    Updated Mar 16, 2025
    Cite
    NVIDIA (2025). Cosmos-Transfer1-7B-Sample-AV-Data-Example [Dataset]. https://huggingface.co/datasets/nvidia/Cosmos-Transfer1-7B-Sample-AV-Data-Example
    Dataset updated
    Mar 16, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cosmos-Transfer1-7B-Sample-AV-Data-Example

    Cosmos | Code | Paper | Paper Website

    Dataset Description:

    This dataset contains 10 sample data points intended to help users better utilize our Cosmos-Transfer1-7B-Sample-AV model. It includes HD Map annotations and LiDAR data, with no personally identifiable information such as faces or license plates. This dataset is intended for research and development only.

    Dataset Owner(s):

    NVIDIA

    Dataset Creation… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/Cosmos-Transfer1-7B-Sample-AV-Data-Example.
  6. Zalando Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Apr 17, 2024
    Cite
    Bright Data (2024). Zalando Dataset [Dataset]. https://brightdata.com/products/datasets/zalando
    Dataset updated
    Apr 17, 2024
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Use our Zalando DE & UK products dataset to get a complete snapshot of new products, categories, pricing, and consumer reviews. Depending on your needs, you may purchase the entire dataset or a customized subset. Popular use cases: identify product inventory gaps and increased demand for certain products, analyze consumer sentiment, and define a pricing strategy by locating similar products and categories among your competitors. Beat your eCommerce competitors using a Zalando.de & Zalando.co.uk products dataset to get a complete overview of product pricing, product strategies, and customer reviews. The dataset includes all major data points:

    • Product SKU
    • Currency
    • Timestamp
    • Price
    • Similar products
    • Bought together products
    • Top reviews
    • Rating
    • and more

  7. AI Training Dataset Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Jun 6, 2025
    Cite
    Archive Market Research (2025). AI Training Dataset Market Report [Dataset]. https://www.archivemarketresearch.com/reports/ai-training-dataset-market-5881
    Dataset updated
    Jun 6, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    global
    Variables measured
    Market Size
    Description

    The AI Training Dataset Market was valued at USD 2,124.0 million in 2023 and is projected to reach USD 8,593.38 million by 2032, exhibiting a CAGR of 22.1% during the forecast period. An AI training dataset is a collection of data used to train machine learning models. It typically includes labeled examples, where each data point has an associated output label or target value. The quality and quantity of this data are crucial for the model's performance. A well-curated dataset ensures the model learns relevant features and patterns, enabling it to generalize effectively to new, unseen data. Training datasets can encompass various data types, including text, images, audio, and structured data. The driving forces behind this growth include:
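
    As a quick sanity check of the compound-growth arithmetic (not a statement from the report), the quoted figures are consistent with compounding at about 22.1% over roughly seven years; treating 2025-2032 within the stated forecast window as that horizon is an assumption:

        # CAGR implied by the stated start and end market sizes (USD million).
        start, end, years = 2124.0, 8593.38, 7  # 7-year horizon is an assumption
        cagr = (end / start) ** (1 / years) - 1
        print(f"implied CAGR: {cagr:.1%}")  # -> implied CAGR: 22.1%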

  8. Point Arena, CA Age Group Population Dataset: A Complete Breakdown of Point Arena Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2025 Edition

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Point Arena, CA Age Group Population Dataset: A Complete Breakdown of Point Arena Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/453fb33e-f122-11ef-8c1b-3860777c1fe6/
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Point Arena, California
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For the age groups, we divided the data into roughly 5-year buckets for ages between 0 and 85; for over 85, we aggregated the data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Point Arena population distribution across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Point Arena. The dataset can be utilized to understand the population distribution of Point Arena by age. For example, using this dataset, we can identify the largest age group in Point Arena.

    Key observations

    The largest age group in Point Arena, CA was 35 to 39 years, with a population of 110 (13.24%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Point Arena, CA was 25 to 29 years, with a population of 3 (0.36%).

    Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: The population for the specific age group in Point Arena is shown in this column.
    • % of Total Population: This column displays the population of each age group as a proportion of Point Arena's total population. Please note that the percentages may not sum to exactly 100 due to rounding (see the sketch after this list).
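
    For illustration, the percentage column can be recomputed from the population column with a few lines of pandas; the local filename below is hypothetical:

        import pandas as pd

        df = pd.read_csv("point_arena_age_groups.csv")  # hypothetical local copy
        pct = (df["Population"] / df["Population"].sum() * 100).round(2)
        # Per-row rounding is why the printed percentages may not sum to exactly 100.
        print(pct.sum())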

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Point Arena Population by Age. You can refer to the same here.

  9. US Restaurant POI dataset with metadata

    • datarade.ai
    .csv
    Updated Jul 30, 2022
    Cite
    Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
    Dataset updated
    Jul 30, 2022
    Dataset authored and provided by
    Geolytica
    Area covered
    United States of America
    Description

    Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use and with high-density coverage for all countries of the world.

    This is our process flow:

    Our machine learning systems continuously crawl for new POI data
    Our geoparsing and geocoding systems calculate their geolocations
    Our categorization systems clean up and standardize the datasets
    Our data pipeline API publishes the datasets on our data store
    

    A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

    POI data is in constant flux. Every minute, worldwide, over 200 businesses move, over 600 new businesses open their doors, and over 400 businesses cease to exist. Over 94% of all businesses have a public online presence of some kind, and when a business changes, its website and social media presence change too. We then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.

    We offer our customers perpetual data licenses for any dataset representing this ever-changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data-as-a-Service (DaaS) industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one-time snapshot or via our data update pipeline.

    Customers requiring regularly updated datasets may subscribe to our annual subscription plans. Our data is continuously refreshed, so subscription plans are recommended for those who need the most up-to-date data. The main differentiators between us and the competition are our flexible licensing terms and our data freshness.

    Data samples may be downloaded at https://store.poidata.xyz/us

  10. Commercial Point, OH Age Group Population Dataset: A Complete Breakdown of Commercial Point Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition

    • neilsberg.com
    csv, json
    Updated Jul 24, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Commercial Point, OH Age Group Population Dataset: A Complete Breakdown of Commercial Point Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa853932-4983-11ef-ae5d-3860777c1fe6/
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Commercial Point, Ohio
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For the age groups, we divided the data into roughly 5-year buckets for ages between 0 and 85; for over 85, we aggregated the data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Commercial Point population distribution across 18 age groups. It lists the population in each age group along with each group's percentage of the total population of Commercial Point. The dataset can be utilized to understand the population distribution of Commercial Point by age. For example, using this dataset, we can identify the largest age group in Commercial Point.

    Key observations

    The largest age group in Commercial Point, OH was 5 to 9 years, with a population of 324 (10.68%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Commercial Point, OH was 85 years and over, with a population of 21 (0.69%).

    Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: The population for the specific age group in Commercial Point is shown in this column.
    • % of Total Population: This column displays the population of each age group as a proportion of Commercial Point's total population. Please note that the percentages may not sum to exactly 100 due to rounding.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Commercial Point Population by Age. You can refer to the same here.

  11. Labelled evaluation datasets of AIS Trajectories from Danish Waters for Abnormal Behavior Detection

    • data.dtu.dk
    bin
    Updated Jul 12, 2023
    + more versions
    Cite
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen (2023). Labelled evaluation datasets of AIS Trajectories from Danish Waters for Abnormal Behavior Detection [Dataset]. http://doi.org/10.11583/DTU.21511815.v1
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Kristoffer Vinther Olesen; Line Katrine Harder Clemmensen; Anders Nymark Christensen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This item is part of the collection "AIS Trajectories from Danish Waters for Abnormal Behavior Detection"

    DOI: https://doi.org/10.11583/DTU.c.6287841

    Using deep learning for the detection of maritime abnormal behaviour in spatio-temporal trajectories is a relatively new and promising application. Open access to the Automatic Identification System (AIS) has made large amounts of maritime trajectories publicly available. However, these trajectories are unannotated when it comes to the detection of abnormal behaviour. The lack of annotated datasets for abnormality detection on maritime trajectories makes it difficult to evaluate and compare suggested models quantitatively. With this dataset, we attempt to provide a way for researchers to evaluate and compare performance.
    We have manually labelled trajectories which showcase abnormal behaviour following a collision accident. The annotated dataset consists of 521 data points with 25 abnormal trajectories. The abnormal trajectories cover, among others: colliding vessels, vessels engaged in Search-and-Rescue activities, law enforcement, and commercial maritime traffic forced to deviate from the normal course.

    These datasets consist of labelled trajectories for the purpose of evaluating unsupervised models for the detection of abnormal maritime behaviour. For unlabelled training datasets, please refer to the collection (link in Related publications).

    The dataset is an example of a SAR event and cannot be considered representative of the large population of all SAR events.

    The dataset consists of a total of 521 trajectories, of which 25 are labelled as abnormal. The data were captured on a single day in a specific region. The remaining normal traffic is representative of traffic during the winter season; normal traffic in the ROI has a fairly high seasonality related to fishing and leisure sailing.

    The data are saved using the pickle format for Python. Each dataset is split into 2 files with the naming convention:

    datasetInfo_XXX
    data_XXX

    Files named "data_XXX" contain the extracted trajectories, serialized sequentially one at a time, and must be read as such; please refer to the provided utility functions for examples (a minimal reading sketch is also given below). Files named "datasetInfo_XXX" contain metadata related to the dataset and the indices at which trajectories begin in the "data_XXX" files.

    The data are sequences of maritime trajectories defined by their timestamp, latitude/longitude position, speed, course, and unique ship identifier (MMSI). In addition, the dataset contains metadata related to the creation parameters. The dataset has been limited to a specific time period, specific ship types, and moving AIS navigational statuses, and filtered within a region of interest (ROI). Trajectories were split if they exceeded an upper length limit, and short trajectories were discarded. All values are given as metadata in the dataset and are used in the naming syntax.

    Naming syntax: data_AIS_Custom_STARTDATE_ENDDATE_SHIPTYPES_MINLENGTH_MAXLENGTH_RESAMPLEPERIOD.pkl

    See datasheet for more detailed information and we refer to provided utility functions for examples on how to read and plot the data.
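
    A minimal Python sketch of the sequential-unpickling pattern that the description implies is given below; the dataset's own utility functions remain the authoritative readers, and the filename is a placeholder following the naming syntax above:

        import pickle

        def read_trajectories(path):
            # Trajectories are serialized sequentially, so unpickle until EOF.
            trajectories = []
            with open(path, "rb") as f:
                while True:
                    try:
                        trajectories.append(pickle.load(f))  # one trajectory per load
                    except EOFError:
                        break
            return trajectories

        trajs = read_trajectories("data_XXX.pkl")  # substitute the real filename
        print(len(trajs))  # expect 521 for this labelled evaluation set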

  12. ZCP Dataset - Distorted Sinusoidal Signal

    • data.mendeley.com
    Updated Feb 15, 2022
    + more versions
    Cite
    Venkataramana Veeramsetty (2022). ZCP Dataset - Distorted Sinusoidal Signal [Dataset]. http://doi.org/10.17632/d2hs6zt8gw.1
    Dataset updated
    Feb 15, 2022
    Authors
    Venkataramana Veeramsetty
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Zero-crossing point (ZCP) detection is necessary to establish consistent performance in various power system applications, and machine learning models can be used to detect zero-crossing points. A dataset is required to train and test machine learning models for detecting the zero-crossing point, and these datasets can be helpful to researchers working on the zero-crossing point detection problem. All datasets were created from MATLAB simulations. A total of 28 datasets were developed based on various window sizes (5, 10, 15, 20) and noise levels (10%, 20%, 30%, 40%, 50%, 60%). Similarly, a total of 28 datasets were developed based on the same window sizes and THD levels (10%, 20%, 30%, 40%, 50%, 60%). Also, a total of 36 datasets were prepared based on the same window sizes and combinations of noise (10%, 30%, 60%) and THD (20%, 40%, 60%). Each dataset consists of 4 input features, called slope, intercept, correlation and RMSE, and one output label with the value 0 or 1, where 0 represents the non-zero-crossing-point class and 1 represents the zero-crossing-point class. Dataset information, such as the number of samples and the combinations (window size, noise, THD), is available in the Data Details spreadsheet. These datasets will be useful for faculty, students and researchers who are working on the ZCP problem.
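
    As an illustration of how such a dataset might be consumed (this is not part of the dataset itself), the sketch below fits a baseline scikit-learn classifier on the four stated input features; the filename and exact column names are assumptions:

        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        # Hypothetical filename for one of the window-size/noise combinations.
        df = pd.read_csv("zcp_window5_noise10.csv")
        X = df[["slope", "intercept", "correlation", "RMSE"]]  # stated features
        y = df["label"]  # 1 = zero-crossing point, 0 = non zero-crossing point
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))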

  13. Example Stata syntax and data construction for negative binomial time series regression

    • data.mendeley.com
    Updated Nov 2, 2022
    + more versions
    Cite
    Sarah Price (2022). Example Stata syntax and data construction for negative binomial time series regression [Dataset]. http://doi.org/10.17632/3mj526hgzx.2
    Dataset updated
    Nov 2, 2022
    Authors
    Sarah Price
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).

    The variables contained therein are defined as follows:

    case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).

    patid: a unique patient identifier.

    time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.

    ncons: number of consultations per month.

    period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.

    burden: binary variable denoting membership of one of two multimorbidity burden groups.

    We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).

    Note: In this example, for demonstration purposes we create a dataset for 10 months leading up to diagnosis. In the paper, we analyse 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as number of prescriptions.
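
    The published syntax is Stata, but as a rough Python analogue the same kind of model can be fitted with statsmodels on the sample dataset, using the variables defined above. The formula below is an illustrative specification, not the paper's exact model:

        import pandas as pd
        import statsmodels.formula.api as smf

        df = pd.read_stata("dummy_dataset.dta")  # the sample dataset provided here
        # Negative binomial regression of monthly consultation counts.
        model = smf.negativebinomial("ncons ~ case * time_period + burden", data=df).fit()
        print(model.summary())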

  14. WiFi RSS & RTT dataset with different LOS conditions for indoor positioning

    • data.niaid.nih.gov
    Updated Jun 11, 2024
    Cite
    Nguyen, Khuong An (2024). WiFi RSS & RTT dataset with different LOS conditions for indoor positioning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11558791
    Dataset updated
    Jun 11, 2024
    Dataset provided by
    Nguyen, Khuong An
    Feng, Xu
    Luo, Zhiyuan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the second batch of WiFi RSS & RTT datasets with different LOS conditions that we have published. Please see https://doi.org/10.5281/zenodo.11558192 for the first release.

    We provide three real-world datasets for indoor positioning model selection purposes. The area of interest was divided into discrete grids, and each grid was labelled with its correct ground-truth coordinates and the APs that are LOS from it. The datasets contain both WiFi RTT and RSS signal measures, together with ground-truth coordinate labels and LOS condition labels, and are well separated so that training points and testing points do not overlap. Please find the datasets in the 'data' folder.

    Lecture theatre: This is an entirely LOS scenario with 5 APs. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).

    Corridor: This is an entirely NLOS scenario with 4 APs. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).

    Office: This is a mixed LOS-NLOS scenario with 5 APs. At least one AP was NLOS for each RP. 60 scans of WiFi RTT and RSS signal measures were collected at each reference point (RP).

    Collection methodology

    The APs utilised were Google WiFi Router AC-1304 units; the smartphone used to collect the data was a Google Pixel 3 running Android 9.

    The ground truth coordinates were collected using fixed tile size on the floor and manual post-it note markers.

    Only RTT-enabled APs were included in the dataset.

    The features of the dataset

    The features of the lecture theatre dataset are as follows:

    Testbed area: 15 × 14.5 m2
    Grid size: 0.6 × 0.6 m2
    Number of APs: 5
    Number of reference points: 120
    Samples per reference point: 60
    Number of all data samples: 7,200
    Number of training samples: 5,400
    Number of testing samples: 1,800
    Signal measures: WiFi RTT, WiFi RSS
    Note: Entirely LOS

    The features of the corridor dataset are as follows:

    Testbed area: 35 × 6 m2
    Grid size: 0.6 × 0.6 m2
    Number of APs: 4
    Number of reference points: 114
    Samples per reference point: 60
    Number of all data samples: 6,840
    Number of training samples: 5,130
    Number of testing samples: 1,710
    Signal measures: WiFi RTT, WiFi RSS
    Note: Entirely NLOS

    The features of the office dataset are as follows:

    Testbed area: 18 × 5.5 m2
    Grid size: 0.6 × 0.6 m2
    Number of APs: 5
    Number of reference points: 108
    Samples per reference point: 60
    Number of all data samples: 6,480
    Number of training samples: 4,860
    Number of testing samples: 1,620
    Signal measures: WiFi RTT, WiFi RSS
    Note: Mixed LOS-NLOS. At least one AP was NLOS for each RP.

    Dataset explanation

    The columns of the dataset are as follows:

    Column 'X': the X coordinate of the sample.
    Column 'Y': the Y coordinate of the sample.
    Columns 'AP1 RTT(mm)', 'AP2 RTT(mm)', ..., 'AP5 RTT(mm)': the RTT measure from the corresponding AP at a reference point.
    Columns 'AP1 RSS(dBm)', 'AP2 RSS(dBm)', ..., 'AP5 RSS(dBm)': the RSS measure from the corresponding AP at a reference point.
    Column 'LOS APs': indicates which APs have LOS to this reference point.

    Please note:

    The RSS value -200 dBm indicates that the AP is too far away from the current reference point and no signals could be heard from it.

    The RTT value 100,000 mm indicates that no signal is received from the specific AP.
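
    When loading these files, the two sentinel values are worth masking out before any modelling; the sketch below replaces them with NaN. The filename is a placeholder, and the column names follow the listing above:

        import numpy as np
        import pandas as pd

        df = pd.read_csv("office_train.csv")  # placeholder filename
        rss = [c for c in df.columns if "RSS" in c]
        rtt = [c for c in df.columns if "RTT" in c]
        df[rss] = df[rss].replace(-200, np.nan)     # -200 dBm: AP not heard
        df[rtt] = df[rtt].replace(100_000, np.nan)  # 100,000 mm: no RTT response
        print(df[rss + rtt].notna().mean())         # per-AP coverage across samples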

    Citation request

    When using this dataset, please cite the following three items:

    Feng, X., Nguyen, K. A., & Zhiyuan, L. (2024). WiFi RSS & RTT dataset with different LOS conditions for indoor positioning [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11558792

    @article{feng2024wifi,
      title={A WiFi RSS-RTT indoor positioning system using dynamic model switching algorithm},
      author={Feng, Xu and Nguyen, Khuong An and Luo, Zhiyuan},
      journal={IEEE Journal of Indoor and Seamless Positioning and Navigation},
      year={2024},
      publisher={IEEE}
    }

    @inproceedings{feng2023dynamic,
      title={A dynamic model switching algorithm for WiFi fingerprinting indoor positioning},
      author={Feng, Xu and Nguyen, Khuong An and Luo, Zhiyuan},
      booktitle={2023 13th International Conference on Indoor Positioning and Indoor Navigation (IPIN)},
      pages={1--6},
      year={2023},
      organization={IEEE}
    }

  15. Energy Consumption of United States Over Time

    • kaggle.com
    Updated Dec 14, 2022
    Cite
    The Devastator (2022). Energy Consumption of United States Over Time [Dataset]. https://www.kaggle.com/datasets/thedevastator/unlocking-the-energy-consumption-of-united-state
    Available download format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Energy Consumption of United States Over Time

    Building Energy Data Book

    By Department of Energy [source]

    About this dataset

    The Building Energy Data Book (2011) is an invaluable resource for gaining insight into the current state of energy consumption in the buildings sector. This dataset provides comprehensive data on residential, commercial, and industrial building energy consumption, construction techniques, building technologies, and characteristics. With this resource, you can get an in-depth understanding of how energy is used in various types of buildings - from single family homes to large office complexes - as well as its impact on the environment. The BTO within the U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy developed this dataset to provide a wealth of knowledge for researchers, policy makers, engineers, and everyday observers interested in learning more about our built environment and its energy usage patterns.


    How to use the dataset

    This dataset provides comprehensive information regarding energy consumption in the buildings sector of the United States. It contains a number of key variables that can be used to analyze and explore the relations between energy consumption and building characteristics, technologies, and construction. The data are provided in CSV as well as tabular format, which makes them easy to use in programs like Excel or other statistical modeling software.

    In order to get started with this dataset we've developed a guide outlining how to effectively use it for your research or project needs.

    • Understand what's included: Before you start analyzing the data, read through the provided documentation so that you fully understand what is included in the datasets. Be aware of any limitations or requirements associated with each type of data point so that the conclusions you draw from them are valid and reliable.

    • Clean up any outliers: Investigate suspicious outliers in your dataset before using it in further analyses - otherwise they can skew results, and they can complicate statistical modeling by artificially inflating values and distorting prior distributions. Account for missing values as well; they may not be obvious at first glance when reviewing a table or plot, but accurate statistics depend on handling them.

    • Exploratory data analysis: After cleaning the data, do some basic exploring by visualizing summaries such as boxplots, histograms, and scatter plots. This gives an initial view of the trends that exist across regions and variables, which can inform later predictive models, and it highlights any clear discontinuities over time before predictors turn into noise rather than meaningful signal.

    • Analyze key metrics & observations: Once the exploratory analysis is done, move on to post-processing steps such as analyzing correlations among explanatory variables, significance testing of regression models, and imputing missing or outlier values, depending on the specific needs of your project. A starter sketch for the outlier step follows below.
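
    As a starter for the outlier step, a simple IQR rule in pandas can flag suspicious rows before modelling; the filename and column name are placeholders for whichever Data Book table you load:

        import pandas as pd

        df = pd.read_csv("buildings_energy.csv")  # placeholder filename
        col = "energy_consumption"                # placeholder column
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        print(f"{mask.sum()} potential outliers out of {len(df)} rows")
        print(df.loc[~mask, col].describe())      # summary after dropping them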

    Research Ideas

    • Creating an energy efficiency rating system for buildings - Using the dataset, an organization can develop a metric to rate the energy efficiency of commercial and residential buildings in a standardized way.
    • Developing targeted campaigns to raise awareness about energy conservation - Analyzing data from this dataset can help organizations identify areas of high energy consumption and create targeted campaigns and incentives to encourage people to conserve energy in those areas.
    • Estimating costs associated with upgrading building technologies - By evaluating various trends in building technologies and their associated costs, decision-makers can determine the most cost-effective option when it comes time to upgrade their structures' energy efficiency...
  16. Zillow Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 19, 2022
    Cite
    Bright Data (2022). Zillow Datasets [Dataset]. https://brightdata.com/products/datasets/zillow
    Explore at:
    .json, .csv, .xlsxAvailable download formats
    Dataset updated
    Dec 19, 2022
    Dataset authored and provided by
    Bright Datahttps://brightdata.com/
    License

    https://brightdata.com/licensehttps://brightdata.com/license

    Area covered
    Worldwide
    Description

    Use our Zillow dataset to collect and analyze data about buying, selling, renting, and financing properties in the United States. The dataset includes over 80 attributes covering all major data points about a listing: location, price, listing type, size, and number of rooms.

  17. ML assignment example dataset

    • zenodo.org
    zip
    Updated May 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Huawei Wang; Huawei Wang (2025). ML assignment example dataset [Dataset]. http://doi.org/10.5281/zenodo.15318018
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 1, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Huawei Wang; Huawei Wang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    May 1, 2025
    Description
    1. A cleaned version of the example dataset is available (in zip format). Please download it here.

    2. The example dataset was generated in a simulated environment, in which the ground-truth motion of all the key points is known.

    3. Three cameras were set up to record the motion: one external camera (producing cam_ext.xxxxxxx.jpeg files) and two foot-mounted cameras, left and right (producing cam_l.xxxxxxx.jpeg and cam_r.xxxxxxx.jpeg files). All of them are in the images folder.

    4. Images were generated at 100 fps. You can stack the images together to generate the videos (see the sketch after this list).

    5. The key point annotations come in two formats: 3d_keypoints (absolute 3D locations of the key points in the global coordinate system) and projected_2D_keypoints (locations of the key points in the projected camera planes, specifically xxxxxxx_cam_ext.csv, xxxxxxx_cam_l.csv, and xxxxxxx_cam_r.csv). For instance, the entry "ankle_li, 140.19, 165.56" gives the 2D coordinates of the left inner ankle key point.

    6. All frames are synchronized between the camera images and the key point annotations. For example, 0001572_cam_l.csv contains the projections of frame 1572 into the left-foot camera.

    7. This assignment focuses on continuously tracking the key points through the video, specifically for the foot-mounted cameras (cam_l and cam_r), where occlusion happens very often. Starting from the suggested model (CoTracker), try to develop a solution that can handle our example data.

    8. If you have extra time (within the 20 hours), you can also try to figure out how to generate the 3D locations of the key points from the cam_l and cam_r key point estimates. The 3D key point locations are important inputs for our pipeline's estimation of the 3D body posture.

    9. The video folder contains example videos of the three camera view angles.
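
    As a rough sketch of working with these files (assuming the folder layout and file patterns described above; whether the CSV files carry a header row is an assumption), the frames can be stacked into a video with OpenCV and a 2D key point file read with pandas:

        import glob

        import cv2
        import pandas as pd

        # Stack the left-foot camera frames into a 100 fps video,
        # using the file pattern given in the description.
        frames = sorted(glob.glob("images/cam_l.*.jpeg"))
        height, width = cv2.imread(frames[0]).shape[:2]
        writer = cv2.VideoWriter("cam_l.mp4", cv2.VideoWriter_fourcc(*"mp4v"),
                                 100, (width, height))
        for path in frames:
            writer.write(cv2.imread(path))
        writer.release()

        # Read one frame's projected 2D key points; rows look like
        # "ankle_li, 140.19, 165.56" (assumed to have no header row).
        keypoints = pd.read_csv("projected_2D_keypoints/0001572_cam_l.csv",
                                header=None, names=["keypoint", "x", "y"])
        print(keypoints.head())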

  18. d

    Gulf of Maine - Control Points Used to Validate the Accuracies of the...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated May 22, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (Point of Contact, Custodian) (2025). Gulf of Maine - Control Points Used to Validate the Accuracies of the Interpolated Water Density Rasters [Dataset]. https://catalog.data.gov/dataset/gulf-of-maine-control-points-used-to-validate-the-accuracies-of-the-interpolated-water-density-1
    Explore at:
    Dataset updated
    May 22, 2025
    Dataset provided by
    (Point of Contact, Custodian)
    Area covered
    Gulf of Maine, Maine
    Description

    This feature dataset contains the control points used to validate the accuracies of the interpolated water density rasters for the Gulf of Maine. These control points were selected randomly from the water density data points, using Hawth's Create Random Selection Tool. Twenty-five percent of the points in each seasonal bin (for each year and at each depth) were randomly selected and set aside for validation. For example, if there were 1,000 water density data points for the fall (September, October, November) of 2003 at 0 meters, then 250 of those points were randomly selected, removed, and set aside to assess the accuracy of the interpolated surface. The naming convention of the validation point feature class includes the year (or years), the season, and the depth (in meters) it was selected from. So, for example, the name ValidationPoints_1997_2004_Fall_0m indicates that this point feature class was randomly selected from water density points at 0 meters in the fall between 1997 and 2004. The seasons were defined using the same months as the remote sensing data: Fall = September, October, November; Winter = December, January, February; Spring = March, April, May; and Summer = June, July, August.
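
    For readers reproducing this kind of holdout outside of ArcGIS, a minimal pandas sketch of the same idea follows; the file name density_points.csv and the column names year, season, and depth_m are hypothetical stand-ins for the source point data.

        import pandas as pd

        # One row per water density point; file and column names are
        # hypothetical stand-ins for the source data.
        pts = pd.read_csv("density_points.csv")

        # Randomly set aside 25% of each (year, season, depth) bin for
        # validation, keeping the remaining 75% for interpolation.
        validation = (
            pts.groupby(["year", "season", "depth_m"])
               .sample(frac=0.25, random_state=42)
        )
        training = pts.drop(validation.index)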

  19. d

    Data from: Average Well Color Development (AWCD) data based on Community...

    • catalog.data.gov
    • data.usgs.gov
    • +3more
    Updated Jul 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2024). Average Well Color Development (AWCD) data based on Community Level Physiological Profiling (CLPP) of soil samples from 120 point locations within limestone cedar glades at Stones River National Battlefield near Murfreesboro, Tennessee [Dataset]. https://catalog.data.gov/dataset/average-well-color-development-awcd-data-based-on-community-level-physiological-profiling-
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Murfreesboro, Tennessee
    Description

    This dataset contains data collected within limestone cedar glades at Stones River National Battlefield (STRI) near Murfreesboro, Tennessee. This dataset contains information on soil microbial metabolic response for soil samples obtained from certain quadrat locations (points) within 12 selected cedar glades. This information derives from substrate utilization profiles based on Biolog EcoPlates (Biolog, Inc., Hayward, CA, USA) which were inoculated with soil slurries containing the entire microbial community present in each soil sample. EcoPlates contain 31 sole-carbon substrates (present in triplicate on each plate) and one blank (control) well. Once the microbial community from a soil sample is inoculated onto the plates, the plates are incubated and absorbance readings are taken at intervals.

    For each quadrat location (point), one soil sample was obtained under sterile conditions, using a trowel wiped with methanol and rinsed with distilled water, and was placed into an autoclaved jar with a tight-fitting lid and placed on ice. Soil samples were transported to lab facilities on ice and immediately refrigerated. Within 24 hours after being removed from the field, soil samples were processed for community level physiological profiling (CLPP) using Biolog EcoPlates. First, for each soil sample three measurements were taken of gravimetric soil water content using a Mettler Toledo HB43 halogen moisture analyzer (Mettler Toledo, Columbus, OH, USA), and the mean of these three SWC measurements was used to calculate the 10-gram dry weight equivalent (DWE) for each soil sample. For each soil sample, a 10-gram DWE of fresh soil was added to 90 milliliters of sterile buffer solution in a 125-milliliter plastic bottle to make the first dilution. Bottles were agitated on a wrist-action shaker for 20 minutes, and a 10-milliliter aliquot was taken from each sample using sterilized pipette tips and added to 90 milliliters of sterile buffer solution to make the second dilution. The bottle containing the second dilution for each sample was agitated for 10 seconds by hand and poured into a sterile tray, and the second dilution was inoculated directly onto Biolog EcoPlates using a sterilized pipette set to deliver 150 microliters into each well. Each plate was immediately covered, placed in a covered box, and incubated in the dark at 25 degrees Celsius. Catabolism of each carbon substrate produced a proportional color change response (from the color of the inoculant to dark purple) due to the activity of the redox dye tetrazolium violet (present in all wells, including blanks). Plates were read at intervals of 24, 48, 72, 96, and 120 hours after inoculation using a Biolog MicroStation plate reader (Biolog, Inc., Hayward, CA, USA) reading absorbance at 590 nanometers.

    For each soil sample and at each incubation time point, average well color development (AWCD) was calculated according to the equation:

    AWCD = [Σ (C – R)] / n

    where C represents the absorbance value of control wells (mean of 3 controls), R is the mean absorbance of the response wells (3 wells per carbon substrate), and n is the number of carbon substrates (31 for EcoPlates). For each soil sample, an incubation curve was constructed using AWCD values from 48 hours to 120 hours, and the area under this incubation curve was calculated. The numeric values contained in the fields of this dataset represent areas under these AWCD incubation curves from 48 hours to 120 hours. Detailed descriptions of the experimental design, field data collection procedures, laboratory procedures, and data analysis are presented in Cartwright (2014).

    References:
    Cartwright, J. (2014). Soil ecology of a rock outcrop ecosystem: abiotic stresses, soil respiration, and microbial community profiles in limestone cedar glades. Ph.D. dissertation, Tennessee State University.
    Cofer, M., Walck, J., & Hidayati, S. (2008). Species richness and exotic species invasion in Middle Tennessee cedar glades in relation to abiotic and biotic factors. The Journal of the Torrey Botanical Society, 135(4), 540–553.
    Garland, J., & Mills, A. (1991). Classification and characterization of heterotrophic microbial communities on the basis of patterns of community-level sole-carbon-source utilization. Applied and Environmental Microbiology, 57(8), 2351–2359.
    Garland, J. (1997). Analysis and interpretation of community-level physiological profiles in microbial ecology. FEMS Microbiology Ecology, 24, 289–300.
    Hackett, C. A., & Griffiths, B. S. (1997). Statistical analysis of the time-course of Biolog substrate utilization. Journal of Microbiological Methods, 30(1), 63–69.
    Insam, H. (1997). A new set of substrates proposed for community characterization in environmental samples. In H. Insam & A. Rangger (Eds.), Microbial Communities: Functional versus Structural Approaches (pp. 259–260). New York: Springer.
    Preston-Mafham, J., Boddy, L., & Randerson, P. F. (2002). Analysis of microbial community functional diversity using sole-carbon-source utilisation profiles - a critique. FEMS Microbiology Ecology, 42(1), 1–14. doi:10.1111/j.1574-6941.2002.tb00990.x
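
    A minimal numeric sketch of the AWCD and area-under-curve calculation described above is given below, with made-up absorbance values; the blank correction is computed as response minus control, the convention of Garland and Mills (1991).

        import numpy as np

        # Made-up absorbance readings: one control-well mean per time point
        # and a (time points x 31 substrates) matrix of response-well means.
        hours = np.array([48, 72, 96, 120])
        control = np.array([0.10, 0.11, 0.12, 0.12])
        responses = np.random.default_rng(0).uniform(0.1, 1.5, size=(4, 31))

        # AWCD per time point: sum of blank-corrected absorbances over the
        # 31 substrates, divided by n = 31.
        awcd = (responses - control[:, None]).sum(axis=1) / 31

        # Area under the AWCD incubation curve from 48 h to 120 h
        # (trapezoidal rule, written out to stay version-agnostic).
        auc = (((awcd[1:] + awcd[:-1]) / 2) * np.diff(hours)).sum()
        print(awcd.round(3), round(float(auc), 3))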

  20. Walmart Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 23, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bright Data (2022). Walmart Datasets [Dataset]. https://brightdata.com/products/datasets/walmart
    Explore at:
    .json, .csv, .xlsxAvailable download formats
    Dataset updated
    Dec 23, 2024
    Dataset authored and provided by
    Bright Datahttps://brightdata.com/
    License

    https://brightdata.com/licensehttps://brightdata.com/license

    Area covered
    Worldwide
    Description

    Use our constantly updated Walmart products dataset to get a complete snapshot of new products, categories, pricing, and consumer reviews. You may purchase the entire dataset or a customized subset, depending on your needs. Popular use cases: identify product inventory gaps and increased demand for certain products, analyze consumer sentiment, and define a pricing strategy by locating similar products and categories among your competitors. The dataset includes all major data points: product, SKU, GTIN, currency, timestamp, price, and more. Get your Walmart dataset today!
