8 datasets found
  1. Smartwatch Purchase Data

    • kaggle.com
    Updated Dec 30, 2022
    Cite
    Aayush Chourasiya (2022). Smartwatch Purchase Data [Dataset]. https://www.kaggle.com/datasets/albedo0/smartwatch-purchase-data/versions/2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Aayush Chourasiya
    Description

    Disclaimer: This is artificially generated data, produced by a Python script based on the arbitrary assumptions listed below.

    The data consists of 100,000 examples of training data and 10,000 examples of test data, each representing a user who may or may not buy a smart watch.

    ----- Version 1 -------

    trainingDataV1.csv, testDataV1.csv (or trainingData.csv, testData.csv). The data includes the following features for each user:
    1. age: The age of the user (integer, 18-70)
    2. income: The income of the user (integer, 25,000-200,000)
    3. gender: The gender of the user (string, "male" or "female")
    4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
    5. hour: The hour of the day (integer, 0-23)
    6. weekend: A boolean indicating whether it is the weekend (True or False)

    The data also includes a label for each user indicating whether they are likely to buy a smart watch (string, "yes" or "no"). The label is determined by the following arbitrary conditions, evaluated in order:
    - If the user is divorced and a random number generated by the script is less than 0.4, the label is "no" (i.e., assuming 40% of divorcees are not likely to buy a smart watch).
    - If it is the weekend and a random number generated by the script is less than 1.3, the label is "yes" (intended to make sales 30% more likely on weekends; note that a random number in [0, 1) is always less than 1.3, so this condition fires for every remaining weekend user).
    - If the user is male and under 30 with an income over 75,000, the label is "yes".
    - If the user is female and 30 or over with an income over 100,000, the label is "yes".
    - Otherwise, the label is "no".

    The training data is intended to be used to build and train a classification model, and the test data is intended to be used to evaluate the performance of the trained model.

    The following Python script was used to generate this dataset:

    import random
    import csv
    
    # Set the number of examples to generate
    numExamples = 100000
    
    # Generate the training data
    with open("trainingData.csv", "w", newline="") as csvfile:
      fieldnames = ["age", "income", "gender", "maritalStatus", "hour", "weekend", "buySmartWatch"]
      writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    
      writer.writeheader()
    
      for i in range(numExamples):
        age = random.randint(18, 70)
        income = random.randint(25000, 200000)
        gender = random.choice(["male", "female"])
        maritalStatus = random.choice(["single", "married", "divorced"])
        hour = random.randint(0, 23)
        weekend = random.choice([True, False])
    
        # Randomly assign the label based on some arbitrary conditions
        # assuming 40% of divorcees won't buy a smart watch
        if maritalStatus == "divorced" and random.random() < 0.4:
          buySmartWatch = "no"
        # assuming sales are 30% more likely to occur on weekends.
        # (note: random.random() returns values in [0, 1), so "< 1.3" is
        # always true and every remaining weekend user is labelled "yes")
        elif weekend == True and random.random() < 1.3:
          buySmartWatch = "yes"
        elif gender == "male" and age < 30 and income > 75000:
          buySmartWatch = "yes"
        elif gender == "female" and age >= 30 and income > 100000:
          buySmartWatch = "yes"
        else:
          buySmartWatch = "no"
    
        writer.writerow({
          "age": age,
          "income": income,
          "gender": gender,
          "maritalStatus": maritalStatus,
          "hour": hour,
          "weekend": weekend,
          "buySmartWatch": buySmartWatch
        })
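The labelling logic above can be factored into a standalone function for testing; a sketch mirroring the script's branches (the function name and the `rng` parameter are ours, not part of the original script):

```python
import random

def label_user(age, income, gender, marital_status, weekend, rng=random):
    """Mirror the generation script's arbitrary labelling rules, in order."""
    if marital_status == "divorced" and rng.random() < 0.4:
        return "no"
    # rng.random() returns values in [0, 1), so "< 1.3" is always true:
    # every weekend user reaching this branch is labelled "yes".
    elif weekend and rng.random() < 1.3:
        return "yes"
    elif gender == "male" and age < 30 and income > 75000:
        return "yes"
    elif gender == "female" and age >= 30 and income > 100000:
        return "yes"
    return "no"

print(label_user(25, 80000, "male", "single", weekend=False))  # yes
```

Keeping the rules in one pure function makes the class balance of the generated data easy to audit before training.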
    

    ----- Version 2 -------

    trainingDataV2.csv, testDataV2.csv. The data includes the following features for each user:
    1. age: The age of the user (integer, 18-70)
    2. income: The income of the user (integer, 25,000-200,000)
    3. gender: The gender of the user (string, "male" or "female")
    4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
    5. educationLevel: The education level of the user (string, "high school", "associate's degree", "bachelor's degree", "master's degree", or "doctorate")
    6. occupation: The occupation of the user (string, "tech worker", "manager", "executive", "sales", "customer service", "creative", "manual labor", "healthcare", "education", "government", "unemployed", or "student")
    7. familySize: The number of people in the user's family (integer, 1-5)
    8. fitnessInterest: A boolean indicating whether the user is interested in fitness (True or False)
    9. priorSmartwatchOwnership: A boolean indicating whether the user has owned a smartwatch in the past (True or False)
    10. hour: The hour of the day when the user was surveyed (integer, 0-23)
    11. weekend: A boolean indicating whether the user was surveyed on a weekend (True or False)
    12. buySmartWatch: A boolean indicating whether the user purchased a smartwatch (True or False)

    Python script used to generate the data:

    import random
    import csv
    
    # Set the number of examples to generate
    numExamples = 100000
    
    with open("t...
    
  2. Rescaled Fashion-MNIST dataset

    • zenodo.org
    Updated Jun 27, 2025
    Cite
    Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled Fashion-MNIST dataset [Dataset]. http://doi.org/10.5281/zenodo.15187793
    Explore at:
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
    Time period covered
    Apr 10, 2025
    Description

    Motivation

    The goal of introducing the Rescaled Fashion-MNIST dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

    The Rescaled Fashion-MNIST dataset was introduced in the paper:

    [1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

    with a pre-print available at arXiv:

    [2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

    Importantly, the Rescaled Fashion-MNIST dataset is more challenging than the MNIST Large Scale dataset, introduced in:

    [3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2.

    Access and rights

    The Rescaled Fashion-MNIST dataset is provided on the condition that you provide proper citation for the original Fashion-MNIST dataset:

    [4] Xiao, H., Rasul, K., and Vollgraf, R. (2017) “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms”, arXiv preprint arXiv:1708.07747

    and also for this new rescaled version, using the reference [1] above.

    The dataset is made available on request. If you are interested in trying it out, please make a request in the system below, and we will grant you access as soon as possible.

    The dataset

    The Rescaled Fashion-MNIST dataset is generated by rescaling 28×28 grey-scale images of clothes from the original Fashion-MNIST dataset [4]. The scale variations are up to a factor of 4, and the images are embedded within black images of size 72×72, with the object in the frame always centred. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

    There are 10 different classes in the dataset: “T-shirt/top”, “trouser”, “pullover”, “dress”, “coat”, “sandal”, “shirt”, “sneaker”, “bag” and “ankle boot”. In the dataset, these are represented by integer labels in the range [0, 9].

    The dataset is split into 50 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 50 000 samples from the original Fashion-MNIST training set. The validation dataset, on the other hand, is formed from the final 10 000 images of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original Fashion-MNIST test set.

    The h5 files containing the dataset

    The training dataset file (~2.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

    fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5

    Additionally, for the Rescaled Fashion-MNIST dataset, there are 9 datasets (~415 MB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^(k/4), with k an integer in the range [-4, 4]:

    fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p595.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p707.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p841.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p189.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p414.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p682.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte2p000.h5
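The scale factors encoded in these file names follow the 2^(k/4) progression directly; a quick check (rounded to three decimals, as in the "scte" tags):

```python
# The nine test-set scale factors are 2**(k/4) for k = -4..4,
# matching the "scte0p500" ... "scte2p000" tags in the file names above.
factors = [round(2 ** (k / 4), 3) for k in range(-4, 5)]
print(factors)
# [0.5, 0.595, 0.707, 0.841, 1.0, 1.189, 1.414, 1.682, 2.0]
```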

    These dataset files were used for the experiments presented in Figures 6, 7, 14, 16, 19 and 23 in [1].

    Instructions for loading the data set

    The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
    ('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

    The training dataset can be loaded in Python as:

    import h5py
    import numpy as np

    with h5py.File("fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5", "r") as f:
        x_train = np.array(f["/x_train"], dtype=np.float32)
        x_val = np.array(f["/x_val"], dtype=np.float32)
        x_test = np.array(f["/x_test"], dtype=np.float32)
        y_train = np.array(f["/y_train"], dtype=np.int32)
        y_val = np.array(f["/y_val"], dtype=np.int32)
        y_test = np.array(f["/y_test"], dtype=np.int32)

    We also need to permute the data, since PyTorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

    x_train = np.transpose(x_train, (0, 3, 1, 2))
    x_val = np.transpose(x_val, (0, 3, 1, 2))
    x_test = np.transpose(x_test, (0, 3, 1, 2))

    The test datasets can be loaded in Python as (shown for the scale-0.5 file; substitute the desired scale factor):

    import h5py
    import numpy as np

    with h5py.File("fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5", "r") as f:
        x_test = np.array(f["/x_test"], dtype=np.float32)
        y_test = np.array(f["/y_test"], dtype=np.int32)

    The test datasets can be loaded in Matlab as (again shown for the scale-0.5 file):

    x_test = h5read('fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5', '/x_test');

    The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.
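Since the intensities are stored unnormalised, a typical preprocessing step before feeding the data to a PyTorch model looks as follows (a sketch using NumPy only; the random array stands in for a loaded x_train batch):

```python
import numpy as np

# Synthetic stand-in for x_train as stored on disk:
# [num_samples, x_dim, y_dim, channels], intensities in [0, 255]
x = np.random.randint(0, 256, size=(8, 72, 72, 1)).astype(np.float32)

x = x / 255.0                      # scale intensities to [0, 1]
x = np.transpose(x, (0, 3, 1, 2))  # move channels to PyTorch's second axis

print(x.shape)  # (8, 1, 72, 72)
```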

    There is also a closely related Fashion-MNIST with translations dataset, which in addition to scaling variations also comprises spatial translations of the objects.

  3. Integrated Agent-based Modelling and Simulation of Transportation Demand and...

    • data.niaid.nih.gov
    Updated Jun 19, 2024
    Cite
    Sprei, Frances (2024). Integrated Agent-based Modelling and Simulation of Transportation Demand and Mobility Patterns in Sweden [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10648077
    Explore at:
    Dataset updated
    Jun 19, 2024
    Dataset provided by
    Ghosh, Kaniska
    Dhamal, Swapnil
    Tozluoğlu, Çağlar
    Sprei, Frances
    Liao, Yuan
    Yeh, Sonia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sweden
    Description

    About

    The Synthetic Sweden Mobility (SySMo) model provides a simplified yet statistically realistic microscopic representation of the real population of Sweden. The agents in this synthetic population contain socioeconomic attributes, household characteristics, and corresponding activity plans for an average weekday. This agent-based modelling approach derives the transportation demand from the agents’ planned activities using various transport modes (e.g., car, public transport, bike, and walking).

    This open data repository contains four datasets:

    (1) Synthetic Agents,

    (2) Activity Plans of the Agents,

    (3) Travel Trajectories of the Agents, and

    (4) Road Network (EPSG: 3006)

    (OpenStreetMap data were retrieved on August 28, 2023, from https://download.geofabrik.de/europe.html, and GTFS data were retrieved on September 6, 2023 from https://samtrafiken.se/)

    The database can serve as input to assess the potential impacts of new transportation technologies, infrastructure changes, and policy interventions on the mobility patterns of the Swedish population.

    Methodology

    This dataset contains 10.2 million statistically simulated agents representing the population of Sweden, their socio-economic characteristics, and their activity plans for an average weekday. To prepare the data for the MATSim simulation, we randomly divided all the agents into 10 batches. Each batch's agents were then simulated in MATSim using a multi-modal network combining road networks and public transit data in Sweden, built with the package pt2matsim (https://github.com/matsim-org/pt2matsim).

    The agents' daily activity plans, along with the road network, serve as the primary inputs to the MATSim environment, which performs iterative replanning while aiming for convergence towards optimal activity plans for all agents. The individual mobility trajectories of the agents are then retrieved from the MATSim simulation.

    The activity plans of the individual agents extracted from the MATSim simulation output are then further processed. Agents with a negative utility score and a negative activity time for at least one activity are filtered out as ‘infeasible’. The dataset ‘Synthetic Agents’ contains all synthetic agents regardless of their feasibility (0 = excluded from, 1 = included in, the plans and trajectory datasets). In the other datasets, only agents with feasible activity plans are included.

    The simulation setup adheres to the MATSim 13.0 benchmark scenario, with slight adjustments. The replanning strategy integrates BestScore (60%), TimeAllocationMutator (30%), and ReRoute (10%); the percentages denote the proportion of agents utilizing each strategy. In each iteration of the simulation, the agents adopt these strategies to adjust their activity plans. The "BestScore" strategy retains the plan with the highest score from the previous iteration, selecting the most successful plan an agent has employed up to that point. The "TimeAllocationMutator" modifies the end times of activities by introducing random shifts within a specified range, allowing the exploration of different schedules. The "ReRoute" strategy enables agents to alter their current routes, potentially optimizing travel based on updated information or preferences. These strategies are detailed further in the work of Axhausen et al. (2016), which provides comprehensive insights into their implementation and impact in the context of transport simulation modeling.
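These replanning weights correspond to strategy settings in the MATSim configuration file; a sketch in MATSim's config_v2 XML format (module and parameter names follow MATSim conventions; everything beyond the three stated weights is an assumption):

```xml
<module name="strategy">
  <parameterset type="strategysettings">
    <param name="strategyName" value="BestScore" />
    <param name="weight" value="0.6" />
  </parameterset>
  <parameterset type="strategysettings">
    <param name="strategyName" value="TimeAllocationMutator" />
    <param name="weight" value="0.3" />
  </parameterset>
  <parameterset type="strategysettings">
    <param name="strategyName" value="ReRoute" />
    <param name="weight" value="0.1" />
  </parameterset>
</module>
```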

    Data Description

    (1) Synthetic Agents

    This dataset contains all agents in Sweden and their socioeconomic characteristics.

    The attribute ‘feasibility’ has two categories: feasible agents (73%) and infeasible agents (27%). Infeasible agents have a negative utility score and a negative activity time for at least one activity.

    File name: 1_syn_pop_all.parquet

    Column | Description | Data type | Unit
    PId | Agent ID | Integer | -
    Deso | Zone code of Demographic statistical areas (DeSO)1 | String | -
    kommun | Municipality code | Integer | -
    marital | Marital status (single/couple/child) | String | -
    sex | Gender (0 = Male, 1 = Female) | Integer | -
    age | Age | Integer | -
    HId | A unique identifier for households | Integer | -
    HHtype | Type of household (single/couple/other) | String | -
    HHsize | Number of people living in the household | Integer | -
    num_babies | Number of children less than six years old in the household | Integer | -
    employment | Employment status (0 = Not Employed, 1 = Employed) | Integer | -
    studenthood | Studenthood status (0 = Not Student, 1 = Student) | Integer | -
    income_class | Income class (0 = No Income, 1 = Low Income, 2 = Lower-middle Income, 3 = Upper-middle Income, 4 = High Income) | Integer | -
    num_cars | Number of cars owned by an individual | Integer | -
    HHcars | Number of cars in the household | Integer | -
    feasibility | Status of the individual (1 = feasible, 0 = infeasible) | Integer | -
    1 https://www.scb.se/vara-tjanster/oppna-data/oppna-geodata/deso--demografiska-statistikomraden/
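A minimal sketch of working with this table in pandas (the read_parquet call is shown as a comment since it needs the downloaded file; the tiny stand-in frame and its values below are ours):

```python
import pandas as pd

# In practice: agents = pd.read_parquet("1_syn_pop_all.parquet")
# Tiny stand-in frame with a few of the documented columns:
agents = pd.DataFrame({
    "PId": [1, 2, 3],
    "age": [34, 61, 25],
    "sex": [0, 1, 1],          # 0 = Male, 1 = Female
    "feasibility": [1, 0, 1],  # 1 = feasible, 0 = infeasible
})

# Keep only agents with feasible activity plans, as used in datasets (2)-(3)
feasible = agents[agents["feasibility"] == 1]
print(len(feasible))  # 2
```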

    (2) Activity Plans of the Agents

    The dataset contains the car agents’ (agents that use cars on the simulated day) activity plans for a simulated average weekday.

    File name: 2_plans_i.parquet, i = 0, 1, 2, ..., 8, 9. (10 files in total)

    Column | Description | Data type | Unit
    act_purpose | Activity purpose (work/home/school/other) | String | -
    PId | Agent ID | Integer | -
    act_end | End time of activity (0:00:00 - 23:59:59) | String | hour:minute:second
    act_id | Activity index of each agent | Integer | -
    mode | Transport mode to reach the activity location | String | -
    POINT_X | Coordinate X of activity location (SWEREF99TM) | Float | metre
    POINT_Y | Coordinate Y of activity location (SWEREF99TM) | Float | metre
    dep_time | Departure time (0:00:00 - 23:59:59) | String | hour:minute:second
    score | Utility score of the simulation day as obtained from MATSim | Float | -
    trav_time | Travel time to reach the activity location | String | hour:minute:second
    trav_time_min | Travel time in decimal minutes | Float | minute
    act_time | Activity duration in decimal minutes | Float | minute
    distance | Travel distance between the origin and the destination | Float | km
    speed | Travel speed to reach the activity location | Float | km/h

    (3) Travel Trajectories of the Agents

    This dataset contains the driving trajectories of all the agents on the road network, and the public transit vehicles used by these agents, including buses, ferries, trams, etc. The files are produced by MATSim simulations and organised into 10 *.parquet files (representing different batches of simulation), corresponding to each plan file.

    File name: 3_events_i.parquet, i = 0, 1, 2, ..., 8, 9. (10 files in total)

    Column | Description | Data type | Unit
    time | Time in seconds within a simulation day (0-86399) | Integer | second
    type | Event type defined by MATSim simulation* | String | -
    person | Agent ID | Integer | -
    link | Nearest road link consistent with the road network | String | -
    vehicle | Vehicle ID (identical to person) | Integer | -
    from_node | Start node of the link | Integer | -
    to_node | End node of the link | Integer | -

    * One typical episode of MATSim simulation events: activity ends (actend) -> agent's vehicle enters traffic (vehicle enters traffic) -> agent's vehicle moves from the previous road segment to the next connected one (left link) -> agent's vehicle leaves traffic for an activity (vehicle leaves traffic) -> activity starts (actstart)

    (4) Road Network

    This dataset contains the road network.

    File name: 4_network.shp

    Column | Description | Data type | Unit
    length | The length of the road link | Float | metre
    freespeed | Free speed | Float | km/h
    capacity | Number of vehicles | Integer | -
    permlanes | Number of lanes | Integer | -
    oneway | Whether the segment is one-way (0 = no, 1 = yes) | Integer | -
    modes | Transport mode | String | -
    from_node | Start node of the link | Integer | -
    to_node | End node of the link | Integer | -
    geometry | LINESTRING (SWEREF99TM) | geometry | metre

    Additional Notes

    This research is funded by the RISE Research Institutes of Sweden, the Swedish Research Council for Sustainable Development (Formas, project number 2018-01768), and Transport Area of Advance, Chalmers.

    Contributions

    YL designed the simulation, analyzed the simulation data, and, along with CT, executed the simulation. CT, SD, FS, and SY conceptualized the model (SySMo), with CT and SD further developing the model to produce agents and their activity plans. KG wrote the data document. All authors reviewed, edited, and approved the final document.

  4. Dataset used in the publication entitled "Application of machine learning to...

    • zenodo.org
    • data.niaid.nih.gov
    bin, txt
    Updated Jan 31, 2024
    Cite
    Biaobiao Yang; Biaobiao Yang; Valentin Vassilev-Galindo; Valentin Vassilev-Galindo; Javier Llorca; Javier Llorca (2024). Dataset used in the publication entitled "Application of machine learning to assess the influence of microstructure on twin nucleation in Mg alloys" [Dataset]. http://doi.org/10.5281/zenodo.10225600
    Explore at:
    bin, txt (available download formats)
    Dataset updated
    Jan 31, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Biaobiao Yang; Biaobiao Yang; Valentin Vassilev-Galindo; Valentin Vassilev-Galindo; Javier Llorca; Javier Llorca
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Documentation for the Dataset used in the publication entitled "Application of machine learning to assess the influence of microstructure on twin nucleation in Mg alloys"
    ** These datasets comprise the 2D EBSD data acquired in the Mg-1Al (at.%) alloy and AZ31 Mg alloy, analyzed with MTEX 7.0 software. **
    ** More details about the experimental techniques can be found in the publication "Biaobiao Yang, Valentin Vassilev-Galindo, Javier Llorca, Application of machine learning to assess the influence of microstructure on twin nucleation in Mg alloys. npj Computational Materials, 2024." **

    1. AZ31_ML.xlsx
    - Description: Both twin and grain data were acquired by EBSD from AZ31 Mg sample before and after deformation at the same area
    - Number of grains: 2640 (rows == grains) corresponding to three samples deformed in different orientations: S0, S45, and S90
    - Number of analyzed variables (features): 31 (columns == grain characteristics)

    - Variable description by columns:
    1- (Twinned) - type: boolean
    Description: Indicates if the grain twinned or not after deformation
    0: non-twinned grain
    1: twinned grain
    2- (Orientation) - type: numerical (integer)
    Description: The loading (tensile) direction with respect to the c axis of lattice
    3- (Strain_level) - type: numerical (float)
    Description: The maximum strain level after deformation
    4- (Grain_size) - type: numerical (float)
    Description: The equivalent circle diameter (in micrometers) of the grain before deformation.
    5- (Triple_points) - type: numerical (integer)
    Description: The number of triple points of the grain before deformation
    6- (Near_edge) - type: boolean
    Description: Indicates if the grain is located near the edge of the 2D EBSD or not. This feature was used to filter out from the final dataset the grains near the edge of the sample. Hence, only those entries with Near_edge value of 0 were used to train and test the machine learning models.
    0: not near the EBSD edge
    1: near the EBSD edge
    7-12- (T_SF*) - type: numerical (float)
    Description: The twinning Schmid factor based on the loading condition, orientation of parent grain and twin variants information.
    T_SF1: The highest Schmid factor of extension twinning
    T_SF2: The 2nd highest Schmid factor of extension twinning
    T_SF3: The 3rd highest
    T_SF4: The 4th highest
    T_SF5: The 5th highest
    T_SF6: The lowest Schmid factor of extension twinning
    13-15- (S_SF*) - type: numerical (float)
    Description: The Schmid factor for basal slip based on the loading condition, orientation of parent grain, and slip system information. Only the basal slip system is considered because it is the dominant deformation slip system in Mg during deformation.
    S_SF1: The highest Schmid factor of basal slip
    S_SF2: The second highest (middle) Schmid factor of basal slip
    S_SF3: The lowest Schmid factor of basal slip
    16- (Neighbor_grain_n) - type: numerical (integer)
    Description: The number of neighbors of the grain before deformation.
    17-19- (B-b_m) - type: numerical (float)
    Description: The Luster-Morris geometric compatibility factor (m') between the basal slip systems of the grain and its neighbors. Although there are 3 possible basal slip systems, only the one with the highest Schmid factor was considered to compute m'. Only maximum, minimum, and mean values were included in the dataset.
    (Max_B-b_m): The highest basal - basal m' between the grain and its neighbors
    (Min_B-b_m): The lowest basal - basal m' between the grain and its neighbors
    (Mean_B-b_m): The average basal - basal m' between the grain and its neighbors
    20-22- (B-t_m) - type: numerical (float)
    Description: The Luster-Morris geometric compatibility factor (m') between the 6 extension twin variants of the grain and the basal slip systems of its neighbors. Although there are 3 possible basal slip systems, only the one with the highest Schmid factor was considered to compute m'. However, all 6 twinning variants have been considered, given that slip-induced twinning is a localized process. Only maximum, minimum, and mean values were included in the dataset.
    (Max_B-t_m): The highest basal - twin m' between the grain and its neighbors
    (Min_B-t_m): The lowest basal - twin m' between the grain and its neighbors
    (Mean_B-t_m): The average basal - twin m' between the grain and its neighbors
    23-25- (GB_misang) - type: numerical (float)
    Description: The misorientation angle (in º) between the grain and its neighbors. In fact, disorientation angle is used for the misorientation angle. Only maximum, minimum, and mean values were included in the dataset.
    (Max_GBmisang): The highest GB misorientation angle between the grain and its neighbors
    (Min_GBmisang): The lowest GB misorientation angle between the grain and its neighbors
    (Mean_GBmisang): The average GB misorientation angle between the grain and its neighbors
    26-28- (delta_Gs) - type: numerical (float)
    Description: Grain size difference (in micrometers) between a given grain and its neighbors. The grain size is calculated as the diameter of a circular grain with the same area of the grain. Only maximum, minimum, and mean values were included in the dataset.
    (Max_deltaGs): The highest grain size difference between the grain and its neighbors
    (Min_deltaGs): The smallest grain size difference between the grain and its neighbors
    (Mean_deltaGs): The average grain size difference between the grain and its neighbors
    29-31- (delta_BSF) - type: numerical (float)
    Description: The difference in the basal slip Schmid factor between a given grain and its neighbors. Only the highest basal slip Schmid factor is considered. Only maximum, minimum, and mean values were included in the dataset.
    (Max_deltaBSF): The highest basal SF difference between the grain and its neighbors
    (Min_deltaBSF): The smallest basal SF difference between the grain and its neighbors
    (Mean_deltaBSF): The average basal SF difference between the grain and its neighbors

    2. Mg1Al_ML.xlsx
    - Description: Both twin and grain data were acquired by EBSD from Mg-1Al (at.%) sample before and after deformation at the same area
    - Number of grains: 1496 (rows == grains) corresponding to two true strain levels: ~6%, and ~10%.
    - Number of analyzed variables (features): 31 (columns == grain characteristics)

    - Variable descriptions by columns are the same as those of AZ31_ML.xlsx
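A minimal sketch of preparing these tables for model training (pandas assumed; read_excel requires openpyxl and the downloaded file, so a tiny stand-in frame with invented values is used here; column names follow the description above):

```python
import pandas as pd

# In practice: df = pd.read_excel("AZ31_ML.xlsx")
# Stand-in rows with a few of the 31 documented columns:
df = pd.DataFrame({
    "Twinned":    [1, 0, 1, 0],
    "Near_edge":  [0, 0, 1, 0],
    "Grain_size": [12.5, 8.1, 20.3, 5.7],
    "T_SF1":      [0.48, 0.12, 0.45, 0.30],
})

# As described above, grains near the EBSD edge are filtered out
# before training and testing the machine learning models.
df = df[df["Near_edge"] == 0]
X = df.drop(columns=["Twinned", "Near_edge"])  # features
y = df["Twinned"]                              # binary target
print(X.shape)  # (3, 2)
```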

  5. Three-dimensional dataset of hydrating cement paste (CEM I Ladce, 273 m^2/kg...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 23, 2023
    + more versions
    Cite
    Michal Hlobil; Michal Hlobil; Ivana Kumpová; Ivana Kumpová (2023). Three-dimensional dataset of hydrating cement paste (CEM I Ladce, 273 m^2/kg Blaine, w/c=0.50) in TIFF format [Dataset]. http://doi.org/10.5281/zenodo.7275174
    Explore at:
    zip (available download formats)
    Dataset updated
    Jan 23, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michal Hlobil; Michal Hlobil; Ivana Kumpová; Ivana Kumpová
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A detailed description of this dataset can be found in https://doi.org/10.1016/j.dib.2023.108903.

    This dataset contains a collection of digitized three-dimensional hardened cement paste microstructures obtained from X-ray micro-computed tomography, screened after approx. 1, 2, 3, 4, 7, 14, and 28 days of elapsed hydration at 20˚C in saturated conditions. Each paste specimen had a cylindrical shape (with a diameter of ~1 mm) and was screened at a designated time (as specified in the file name, e.g. “t23hrs”=23 hours of elapsed hydration) and finally saved as an uncompressed and unprocessed *.tif greyscale image data file in 16-bit image depth (as unsigned integers) using a little-endian byte sequence.

    The dataset contains two sets of images:

    - “full-sized” digital images stored in a three-dimensional voxel-based matrix with a fixed size of 1100×1100×1100 voxels, denoted as “CEM_I_Ladce_*” in the file name; each file size amounts to ~2.5 GB and contains the whole screened specimen with a variable voxel size in the range 1.0913 − 1.1174 µm depending on the particular specimen (as specified in the file name, e.g. “1d1174um”=1.1174 µm/voxel)

    - smaller image subvolumes, denoted as Region Of Interest (ROI), extracted from an arbitrary location in the interior of the full-sized specimen, and denoted as “filteredROI_*” in the file name; this cropped ROI has a cubic shape and stores a three-dimensional voxel-based matrix with a fixed size of 500×500×500 µm³ constituted by a variable voxel count (given the fluctuating voxel size for each specimen, see above). Both the exact voxel count (i.e. three-dimensional matrix dimensions) and voxel size are further specified in each file name. A sequence of imaging filters was applied to this ROI to further enhance the contrast among the different microstructural phases; see https://doi.org/10.1016/j.cemconcomp.2022.104798 for details.

    Note that the same dataset stored in *raw format is available from https://doi.org/10.5281/zenodo.7193819
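
    As a rough sketch (not part of the dataset itself), a *.raw volume from the Zenodo mirror could be loaded with NumPy, assuming each file stores the full 1100x1100x1100 volume as unsigned 16-bit little-endian integers as described above; the file name below is hypothetical:

```python
import numpy as np

def load_ct_volume(path, shape=(1100, 1100, 1100)):
    """Load an unprocessed CT volume stored as little-endian
    unsigned 16-bit integers ('<u2'), one value per voxel."""
    vol = np.fromfile(path, dtype="<u2")
    return vol.reshape(shape)

# Hypothetical usage:
# vol = load_ct_volume("CEM_I_Ladce_t23hrs.raw")
# print(vol.shape)  # (1100, 1100, 1100)
```

    The *.tif variants carry the same bit depth and byte order, so a TIFF reader (e.g. the tifffile package) should yield an equivalent array.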

  6. SAFARI 2000 Tree Ring Data, Mongu, Zambia, Dry Season 2000 - Dataset - NASA...

    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • data.nasa.gov
    Updated Mar 20, 2025
    Cite
    nasa.gov (2025). SAFARI 2000 Tree Ring Data, Mongu, Zambia, Dry Season 2000 - Dataset - NASA Open Data Portal [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/safari-2000-tree-ring-data-mongu-zambia-dry-season-2000-0a3a3
    Explore at:
    Dataset updated
    Mar 20, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Area covered
    Mongu, Zambia
    Description

    This data set contains tree ring data from three sites located about 25 km from the meteorological station at Mongu, Zambia. Data from about 50 individual trees are reported. In addition, chronologies (or site mean curves), which better represent common influences (in this study, the climatic signal), were developed for each site from the individual data (Trouet, 2004; Trouet et al., 2001). The series cover a maximum of 46 years, although most do not extend beyond 30 years. The data were collected during the SAFARI 2000 Dry Season Field Campaign of August 2000.

    Ten to 23 samples were taken at each site. Brachystegia bakeriana was sampled at site 1, and Brachystegia spiciformis at sites 2 and 3. The vegetation at all sites had undergone primitive harvesting for subsistence earlier the same year, so samples could be taken from freshly cut trees and no living trees had to be cut. At all sites, samples consisted of full stem discs, taken at breast height (1.3 m) or slightly lower where possible. Growth ring widths were measured to the nearest 0.01 mm using LINTAB equipment and TSAP software (Rinn and Jakel, 1997), with four radii measured per sample disc. Cross-dating and response function analyses were performed using routine dendrochronological techniques. There are two files for each site: one containing integer values representing tree ring widths (raw data) and the other containing standardized values (chronologies) for each year. The data are stored as ASCII tables in comma-separated-value (.csv) format with column headers.
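
    A site chronology of this kind is essentially a year-by-year mean of the standardized individual series. A minimal stdlib sketch, assuming a hypothetical column layout (one column per tree; the actual SAFARI 2000 file headers may differ):

```python
import csv
from io import StringIO

# Hypothetical excerpt of a standardized-values file; column names
# are assumptions for illustration, not the actual file layout.
sample = """year,tree_01,tree_02,tree_03
1998,0.91,1.10,0.95
1999,1.05,0.98,1.12
2000,1.02,1.07,0.99
"""

def site_chronology(csv_text):
    """Average the standardized series across trees, year by year."""
    reader = csv.DictReader(StringIO(csv_text))
    chronology = {}
    for row in reader:
        values = [float(v) for k, v in row.items() if k != "year"]
        chronology[int(row["year"])] = sum(values) / len(values)
    return chronology
```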

  7. Spreadsheet Implementations for Linking Multi-Level Contribution Margin...

    • data.mendeley.com
    Updated Apr 26, 2021
    + more versions
    Cite
    Michael Gutiérrez (2021). Spreadsheet Implementations for Linking Multi-Level Contribution Margin Accounting with Multi-Level Fixed-Charge Problems [Dataset]. http://doi.org/10.17632/s6pswx23yx.4
    Explore at:
    Dataset updated
    Apr 26, 2021
    Authors
    Michael Gutiérrez
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    This site provides the data and spreadsheet implementations for linking multi-level contribution margin accounting as a subsystem in cost accounting with several versions of a multi-level fixed-charge problem (MLFCP), the latter based on the optimization approach in operations research. For the data, plausible fictitious values have been assumed taking into consideration the calculation principles in cost accounting where applicable. They include resource-related data, market-related data, and data from cost accounting. While the deterministic version of the data does not consider uncertainty, the stochastic/robust versions assume probability distributions and rank correlations for part of the data.

    Spreadsheets

    The data and the above-mentioned linkage are implemented in three spreadsheet files, including versions for deterministic optimization, stochastic optimization, and robust optimization:

    • MLFCP deterministic.xlsx
    • MLFCP stochastic.xlsx
    • MLFCP robust.xlsx

    For a detailed description of the spreadsheet implementations and information on the software required to use them, see the associated data article published in Data in Brief. For the conceptual framework, mathematical formulation of the optimization model (MLFCP), findings, and discussion, see the associated research article published in Heliyon. (The links to both articles can be found on this page).

    Big Picture

    Furthermore, an overview (“big picture”) of the data flows between the various worksheets is provided in three main versions which correspond to the deterministic, stochastic, and robust versions of the MLFCP:

    • Overview of data flows - deterministic
    • Overview of data flows - stochastic (with three sub-variants)
    • Overview of data flows - robust

    Within each version/sub-variant of the overview, two file formats (PDF and PNG) are available. These are oversize graphics; please scale up appropriately to see the details.

    (Remark on version numbers and dates: The version numbers reported within the files might be lower than the version number of the entire dataset in case particular files remain unchanged in an update. The same might analogously apply to the dates.)

  8. Geospatial Dataset of GNSS Anomalies and Political Violence Events

    • zenodo.org
    csv
    Updated Jun 14, 2025
    + more versions
    Cite
    Eugene Pik; João S. D. Garcia; Matthew Berra; Timothy Smith; Ibrahim Kocaman (2025). Geospatial Dataset of GNSS Anomalies and Political Violence Events [Dataset]. http://doi.org/10.5281/zenodo.15665065
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Zenodo
    Authors
    Eugene Pik; João S. D. Garcia; Matthew Berra; Timothy Smith; Ibrahim Kocaman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 14, 2025
    Description

    Geospatial Dataset of GNSS Anomalies and Political Violence Events

    Overview

    The Geospatial Dataset of GNSS Anomalies and Political Violence Events is a collection of data that integrates aircraft flight information, GNSS (Global Navigation Satellite System) anomalies, and political violence events from the ACLED (Armed Conflict Location & Event Data Project) database.

    Dataset Files

    The dataset consists of three CSV files:

    1. Daily_GNSS_Anomalies_and_ACLED-2023-V1.csv
      • Description: Contains all grids and dates that had aircraft traffic during 2023.
      • Number of Records: 6,777,228
      • Purpose: Provides a complete view of aircraft movements and associated data, including grids without any GNSS anomalies.
    2. Daily_GNSS_Anomalies_and_ACLED-2023-V2.csv
      • Description: A filtered version of V1, including only the grids and dates where GNSS anomalies (jumps or gaps) were reported.
      • Number of Records: 718,237
      • Purpose: Focuses on areas and times with GNSS anomalies for targeted analysis.
    3. Monthly_GNSS_Anomalies_and_ACLED-2023-V9.csv
      • Description: Contains aggregated monthly data for each grid cell, combining GNSS anomalies and ACLED political violence events. Summarizes aircraft traffic, anomaly counts, and conflict activity at a monthly resolution.
      • Number of Records: 25,770
      • Purpose: Enables temporal trend analysis and spatial correlation studies between GNSS interference and political violence, using reduced data volume suitable for modeling and visualization.

    Data Fields: Daily_GNSS_Anomalies_and_ACLED-2023-V1.csv and Daily_GNSS_Anomalies_and_ACLED-2023-V2.csv

    1. grid_id
      • Description: Unique identifier for a grid cell on Earth measuring 0.5 degrees latitude by 0.5 degrees longitude.
      • Format: String combining latitude and longitude (e.g., -10.0_-36.0).
    2. day
      • Description: Date of the recorded data.
      • Format: YYYY-MM-DD (e.g., 2023-03-28).
    3. geometry
      • Description: Polygon coordinates of the grid cell in Well-Known Text (WKT) format.
      • Format: POLYGON((longitude latitude, ...)) (e.g., POLYGON((-36.0 -10.0, -35.5 -10.0, -35.5 -9.5, -36.0 -9.5, -36.0 -10.0))).
    4. flights
      • Description: Number of aircraft flights that passed through the grid on that day.
      • Format: Integer (e.g., 28).
    5. GPS_jumps
      • Description: Number of reported GNSS "jump" anomalies (possible spoofing incidents) in the grid on that day.
      • Format: Integer (e.g., 1).
    6. GPS_gaps
      • Description: Number of reported GNSS "gap" anomalies, indicating gaps in aircraft routes, in the grid on that day.
      • Format: Integer (e.g., 0).
    7. gaps_density
      • Description: Density of GNSS gaps, calculated as the number of gaps divided by the number of flights.
      • Format: Decimal (e.g., 0).
    8. jumps_density
      • Description: Density of GNSS jumps, calculated as the number of jumps divided by the number of flights.
      • Format: Decimal (e.g., 0.035714286).
    9. event_id_cnty
      • Description: ACLED event ID corresponding to political violence events in the grid on that day.
      • Format: String (e.g., BRA69267).
    10. disorder_type
      • Description: Type of disorder as classified by ACLED (e.g., "Political violence").
      • Format: String.
    11. event_type
      • Description: General category of the event according to ACLED (e.g., "Violence against civilians").
      • Format: String.
    12. sub_event_type
      • Description: Specific subtype of the event as per ACLED classification (e.g., "Attack").
      • Format: String.
    13. acled_count
      • Description: Number of ACLED events in the grid on that day.
      • Format: Integer (e.g., 1).
    14. acled_flag
      • Description: Indicator of ACLED event presence in the grid on that day (0 for no events, 1 for one or more events).
      • Format: Integer (0 or 1).
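
    The grid_id, geometry, and density fields above follow directly from the 0.5-degree grid convention. A short sketch (function names are mine, not part of the dataset) that reproduces the documented formats:

```python
import math

CELL = 0.5  # grid resolution in degrees

def grid_cell(lat, lon):
    """Snap a coordinate to the lower-left corner of its 0.5-degree cell
    and return (grid_id, WKT polygon) in the formats described above."""
    lat0 = math.floor(lat / CELL) * CELL
    lon0 = math.floor(lon / CELL) * CELL
    grid_id = f"{lat0:.1f}_{lon0:.1f}"
    corners = [
        (lon0, lat0),
        (lon0 + CELL, lat0),
        (lon0 + CELL, lat0 + CELL),
        (lon0, lat0 + CELL),
        (lon0, lat0),  # close the ring
    ]
    wkt = "POLYGON((" + ", ".join(f"{x:.1f} {y:.1f}" for x, y in corners) + "))"
    return grid_id, wkt

def density(anomalies, flights):
    """gaps_density / jumps_density: anomaly count per flight."""
    return anomalies / flights if flights else 0.0

# A point inside the example cell from the field descriptions:
gid, wkt = grid_cell(-9.7, -35.8)
# gid -> "-10.0_-36.0"
# wkt -> "POLYGON((-36.0 -10.0, -35.5 -10.0, -35.5 -9.5, -36.0 -9.5, -36.0 -10.0))"
```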

    Data Fields: Monthly_GNSS_Anomalies_and_ACLED-2023-V9.csv

    The file contains monthly aggregated GNSS anomaly and ACLED event data per grid cell. The structure and meaning of each field are detailed below:

    1. grid_id
      • Description: Unique identifier for a grid cell on Earth measuring 0.5° latitude by 0.5° longitude.
      • Format: String combining latitude and longitude (e.g., -0.5_-79.0).
    2. year_month
      • Description: Month and year of the aggregated data.
      • Format: String in Mon-YY format (e.g., Jan-23).
    3. geometry
      • Description: Polygon coordinates of the grid cell in Well-Known Text (WKT) format.
      • Format: POLYGON((longitude latitude, ...))
        (e.g., POLYGON((-79.0 -0.5, -78.5 -0.5, -78.5 0.0, -79.0 0.0, -79.0 -0.5))).
    4. flights
      • Description: Total number of aircraft flights that passed through the grid cell during the month.
      • Format: Integer (e.g., 1230).
    5. GPS_jumps
      • Description: Total number of GNSS "jump" anomalies (possible spoofing events) in the grid cell during the month.
      • Format: Integer (e.g., 13).
    6. GPS_gaps
      • Description: Total number of GNSS "gap" anomalies, indicating interruptions in aircraft routes, during the month.
      • Format: Integer (e.g., 0).
    7. event_id_cnty
      • Description: Semicolon-separated list of ACLED event IDs associated with the grid cell during the month.
      • Format: String (e.g., ECU3151;ECU3158;ECU3150).
    8. disorder_type
      • Description: Semicolon-separated list of disorder types (e.g., "Political violence", "Demonstrations") reported by ACLED in that grid cell during the month.
      • Format: String.
    9. event_type
      • Description: Semicolon-separated list of high-level ACLED event types (e.g., "Riots", "Protests").
      • Format: String.
    10. sub_event_type
      • Description: Semicolon-separated list of detailed subtypes of ACLED events (e.g., "Mob violence", "Armed clash").
      • Format: String.
    11. acled_count
      • Description: Total number of ACLED conflict events in the grid cell during the month.
      • Format: Integer (e.g., 2).
    12. acled_flag
      • Description: Conflict presence indicator: 1 if any ACLED event occurred in the grid cell during the month, otherwise 0.
      • Format: Integer (0 or 1).
    13. gaps_density
      • Description: Monthly density of GNSS gaps, calculated as GPS_gaps / flights.
      • Format: Decimal (e.g., 0.0).
    14. jumps_density
      • Description: Monthly density of GNSS jumps, calculated as GPS_jumps / flights.
      • Format: Decimal (e.g., 0.0106).
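
    The monthly file can in principle be reproduced from the daily records by grouping on grid cell and month. A hedged stdlib sketch (the actual aggregation pipeline is not published here, and the helper name is mine):

```python
from collections import defaultdict
from datetime import datetime

def aggregate_monthly(daily_rows):
    """Roll daily records up to (grid_id, year_month): sum the counts,
    concatenate ACLED event IDs with semicolons, recompute densities."""
    groups = defaultdict(lambda: {"flights": 0, "GPS_jumps": 0,
                                  "GPS_gaps": 0, "event_ids": []})
    for row in daily_rows:
        month = datetime.strptime(row["day"], "%Y-%m-%d").strftime("%b-%y")
        g = groups[(row["grid_id"], month)]
        g["flights"] += row["flights"]
        g["GPS_jumps"] += row["GPS_jumps"]
        g["GPS_gaps"] += row["GPS_gaps"]
        if row.get("event_id_cnty"):
            g["event_ids"].append(row["event_id_cnty"])
    out = []
    for (grid_id, month), g in groups.items():
        flights = g["flights"]
        out.append({
            "grid_id": grid_id,
            "year_month": month,  # Mon-YY, e.g. Jan-23
            "flights": flights,
            "GPS_jumps": g["GPS_jumps"],
            "GPS_gaps": g["GPS_gaps"],
            "event_id_cnty": ";".join(g["event_ids"]),
            "acled_count": len(g["event_ids"]),
            "acled_flag": int(bool(g["event_ids"])),
            "jumps_density": g["GPS_jumps"] / flights if flights else 0.0,
            "gaps_density": g["GPS_gaps"] / flights if flights else 0.0,
        })
    return out
```

    Note that acled_count is approximated here as the number of collected event IDs; the published file may count events differently.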

    Data Sources

    • GNSS Anomalies Data:
      • Calculated from ADS-B (Automatic Dependent Surveillance-Broadcast) messages obtained via the OpenSky Network's Trino database.
      • GNSS anomalies include "jumps" (potential spoofing incidents) and "gaps" (interruptions in aircraft route data).

    • Political Violence Events Data:
      • Sourced from the ACLED database, which provides detailed information on political violence and protest events worldwide.

    Temporal and Spatial Coverage

    • Temporal Coverage:
      • From January 1, 2023, to December 31, 2023.
      • Daily records provide temporal granularity for time-series analysis.
    • Spatial Coverage:
      • Global coverage with grid cells measuring 0.5 degrees latitude by 0.5 degrees longitude.
      • Each grid cell represents an area on Earth's surface, facilitating spatial



Smartwatch Purchase Data

Smartwatch sales prediction: An artificial dataset of 100,000 customer profiles

Description

Disclaimer: This is artificially generated data, produced by a Python script based on the arbitrary assumptions listed below.

The data consists of 100,000 examples of training data and 10,000 examples of test data, each representing a user who may or may not buy a smart watch.

----- Version 1 -------

trainingDataV1.csv, testDataV1.csv (or trainingData.csv, testData.csv)

The data includes the following features for each user:

1. age: The age of the user (integer, 18-70)
2. income: The income of the user (integer, 25,000-200,000)
3. gender: The gender of the user (string, "male" or "female")
4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
5. hour: The hour of the day (integer, 0-23)
6. weekend: A boolean indicating whether it is the weekend (True or False)

The data also includes a label for each user, buySmartWatch, indicating whether they are likely to buy a smart watch (string, "yes" or "no"). The label is determined by the following arbitrary conditions:

- If the user is divorced and a random number generated by the script is less than 0.4, the label is "no" (i.e., assuming 40% of divorcees are not likely to buy a smart watch).
- If it is the weekend and a random number generated by the script is less than 1.3, the label is "yes" (intended to make sales 30% more likely on weekends; note that random.random() always returns values below 1.3, so every weekend user reaching this rule is labeled "yes").
- If the user is male and under 30 with an income over 75,000, the label is "yes".
- If the user is female and 30 or over with an income over 100,000, the label is "yes".
- Otherwise, the label is "no".

The training data is intended to be used to build and train a classification model, and the test data is intended to be used to evaluate the performance of the trained model.

The following Python script was used to generate this dataset:

import random
import csv

# Set the number of examples to generate
numExamples = 100000

# Generate the training data
with open("trainingData.csv", "w", newline="") as csvfile:
  fieldnames = ["age", "income", "gender", "maritalStatus", "hour", "weekend", "buySmartWatch"]
  writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

  writer.writeheader()

  for i in range(numExamples):
    age = random.randint(18, 70)
    income = random.randint(25000, 200000)
    gender = random.choice(["male", "female"])
    maritalStatus = random.choice(["single", "married", "divorced"])
    hour = random.randint(0, 23)
    weekend = random.choice([True, False])

    # Randomly assign the label based on some arbitrary conditions
    # assuming 40% of divorcees won't buy a smart watch
    if maritalStatus == "divorced" and random.random() < 0.4:
      buySmartWatch = "no"
    # assuming sales are 30% more likely to occur on weekends
    # (note: random.random() < 1.3 is always True, so every weekend user
    # reaching this branch is labeled "yes")
    elif weekend and random.random() < 1.3:
      buySmartWatch = "yes"
    elif gender == "male" and age < 30 and income > 75000:
      buySmartWatch = "yes"
    elif gender == "female" and age >= 30 and income > 100000:
      buySmartWatch = "yes"
    else:
      buySmartWatch = "no"

    writer.writerow({
      "age": age,
      "income": income,
      "gender": gender,
      "maritalStatus": maritalStatus,
      "hour": hour,
      "weekend": weekend,
      "buySmartWatch": buySmartWatch
    })

----- Version 2 -------

trainingDataV2.csv, testDataV2.csv

The data includes the following features for each user:

1. age: The age of the user (integer, 18-70)
2. income: The income of the user (integer, 25,000-200,000)
3. gender: The gender of the user (string, "male" or "female")
4. maritalStatus: The marital status of the user (string, "single", "married", or "divorced")
5. educationLevel: The education level of the user (string, "high school", "associate's degree", "bachelor's degree", "master's degree", or "doctorate")
6. occupation: The occupation of the user (string, "tech worker", "manager", "executive", "sales", "customer service", "creative", "manual labor", "healthcare", "education", "government", "unemployed", or "student")
7. familySize: The number of people in the user's family (integer, 1-5)
8. fitnessInterest: A boolean indicating whether the user is interested in fitness (True or False)
9. priorSmartwatchOwnership: A boolean indicating whether the user has owned a smartwatch in the past (True or False)
10. hour: The hour of the day when the user was surveyed (integer, 0-23)
11. weekend: A boolean indicating whether the user was surveyed on a weekend (True or False)
12. buySmartWatch: A boolean indicating whether the user purchased a smartwatch (True or False)

The following Python script was used to generate the data:

import random
import csv

# Set the number of examples to generate
numExamples = 100000

with open("t...