100+ datasets found

Z
ANN development + final testing datasets
data.niaid.nih.gov
resodate.org
+1more
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Authors (2020). ANN development + final testing datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1445865
Explore at:
Dataset updated
Jan 24, 2020
Authors
Authors
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
File name definitions:

'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s

'...v_175_250...' - dataset for velocity range [175, 250] m/s

'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected

'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

Where to find the input (independent) and target (dependent) variable values for each dataset/excel ?

input values in 'IN' sheet

target values in 'TARGET' sheet

Where to find the results from the best ANN model (for each target/output variable and each velocity range)?

open the corresponding excel file and the expected (target) vs ANN (output) results are written in 'TARGET vs OUTPUT' sheet

Check reference below (to be added when the paper is published)

https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
Fused Image dataset for convolutional neural Network-based crack Detection...
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 20, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shanglian Zhou; Shanglian Zhou; Carlos Canchila; Carlos Canchila; Wei Song; Wei Song (2023). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Dataset]. http://doi.org/10.5281/zenodo.6383044
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6383044
Dataset updated
Apr 20, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Shanglian Zhou; Shanglian Zhou; Carlos Canchila; Carlos Canchila; Wei Song; Wei Song
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The “Fused Image dataset for convolutional neural Network-based crack Detection” (FIND) is a large-scale image dataset with pixel-level ground truth crack data for deep learning-based crack segmentation analysis. It features four types of image data including raw intensity image, raw range (i.e., elevation) image, filtered range image, and fused raw image. The FIND dataset consists of 2500 image patches (dimension: 256x256 pixels) and their ground truth crack maps for each of the four data types.

The images contained in this dataset were collected from multiple bridge decks and roadways under real-world conditions. A laser scanning device was adopted for data acquisition such that the captured raw intensity and raw range images have pixel-to-pixel location correspondence (i.e., spatial co-registration feature). The filtered range data were generated by applying frequency domain filtering to eliminate image disturbances (e.g., surface variations, and grooved patterns) from the raw range data [1]. The fused image data were obtained by combining the raw range and raw intensity data to achieve cross-domain feature correlation [2,3]. Please refer to [4] for a comprehensive benchmark study performed using the FIND dataset to investigate the impact from different types of image data on deep convolutional neural network (DCNN) performance.

If you share or use this dataset, please cite [4] and [5] in any relevant documentation.

In addition, an image dataset for crack classification has also been published at [6].

References:

[1] Shanglian Zhou, & Wei Song. (2020). Robust Image-Based Surface Crack Detection Using Range Data. Journal of Computing in Civil Engineering, 34(2), 04019054. https://doi.org/10.1061/(asce)cp.1943-5487.0000873

[2] Shanglian Zhou, & Wei Song. (2021). Crack segmentation through deep convolutional neural networks and heterogeneous image fusion. Automation in Construction, 125. https://doi.org/10.1016/j.autcon.2021.103605

[3] Shanglian Zhou, & Wei Song. (2020). Deep learning–based roadway crack classification with heterogeneous image data fusion. Structural Health Monitoring, 20(3), 1274-1293. https://doi.org/10.1177/1475921720948434

[4] Shanglian Zhou, Carlos Canchila, & Wei Song. (2023). Deep learning-based crack segmentation for civil infrastructure: data types, architectures, and benchmarked performance. Automation in Construction, 146. https://doi.org/10.1016/j.autcon.2022.104678

[5] (This dataset) Shanglian Zhou, Carlos Canchila, & Wei Song. (2022). Fused Image dataset for convolutional neural Network-based crack Detection (FIND) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6383044

[6] Wei Song, & Shanglian Zhou. (2020). Laser-scanned roadway range image dataset (LRRD). Laser-scanned Range Image Dataset from Asphalt and Concrete Roadways for DCNN-based Crack Classification, DesignSafe-CI. https://doi.org/10.17603/ds2-bzv3-nc78

Credit Card Eligibility Data: Determining Factors

kaggle.com

zip

Updated May 18, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Rohit Sharma (2024). Credit Card Eligibility Data: Determining Factors [Dataset]. https://www.kaggle.com/datasets/rohit265/credit-card-eligibility-data-determining-factors

Explore at:

zip(303227 bytes)Available download formats

Dataset updated

May 18, 2024

Authors

Rohit Sharma

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Description of the Credit Card Eligibility Data: Determining Factors

The Credit Card Eligibility Dataset: Determining Factors is a comprehensive collection of variables aimed at understanding the factors that influence an individual's eligibility for a credit card. This dataset encompasses a wide range of demographic, financial, and personal attributes that are commonly considered by financial institutions when assessing an individual's suitability for credit.

Each row in the dataset represents a unique individual, identified by a unique ID, with associated attributes ranging from basic demographic information such as gender and age, to financial indicators like total income and employment status. Additionally, the dataset includes variables related to familial status, housing, education, and occupation, providing a holistic view of the individual's background and circumstances.

Variable	Description
ID	An identifier for each individual (customer).
Gender	The gender of the individual.
Own_car	A binary feature indicating whether the individual owns a car.
Own_property	A binary feature indicating whether the individual owns a property.
Work_phone	A binary feature indicating whether the individual has a work phone.
Phone	A binary feature indicating whether the individual has a phone.
Email	A binary feature indicating whether the individual has provided an email address.
Unemployed	A binary feature indicating whether the individual is unemployed.
Num_children	The number of children the individual has.
Num_family	The total number of family members.
Account_length	The length of the individual's account with a bank or financial institution.
Total_income	The total income of the individual.
Age	The age of the individual.
Years_employed	The number of years the individual has been employed.
Income_type	The type of income (e.g., employed, self-employed, etc.).
Education_type	The education level of the individual.
Family_status	The family status of the individual.
Housing_type	The type of housing the individual lives in.
Occupation_type	The type of occupation the individual is engaged in.
Target	The target variable for the classification task, indicating whether the individual is eligible for a credit card or not (e.g., Yes/No, 1/0).

Researchers, analysts, and financial institutions can leverage this dataset to gain insights into the key factors influencing credit card eligibility and to develop predictive models that assist in automating the credit assessment process. By understanding the relationship between various attributes and credit card eligibility, stakeholders can make more informed decisions, improve risk assessment strategies, and enhance customer targeting and segmentation efforts.

This dataset is valuable for a wide range of applications within the financial industry, including credit risk management, customer relationship management, and marketing analytics. Furthermore, it provides a valuable resource for academic research and educational purposes, enabling students and researchers to explore the intricate dynamics of credit card eligibility determination.

Amphibian metamorphosis assays- biological & histopathological data and...
catalog.data.gov
Updated Jun 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Environmental Protection Agency (2025). Amphibian metamorphosis assays- biological & histopathological data and range finding studies [Dataset]. https://catalog.data.gov/dataset/amphibian-metamorphosis-assays-biological-histopathological-data-and-range-finding-studies
Explore at:
Dataset updated
Jun 15, 2025
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Description
Five chemicals [2-ethylhexyl 4-hydroxybenzoate (2-EHHB), 4-nonylphenol-branched (4-NP), 4-tert-octylphenol (4-OP), benzyl butyl phthalate (BBP) and dibutyl phthalate (DBP) were subjected to a 21-day Amphibian Metamorphosis Assay (AMA) following OCSPP 890.1100 test guidelines. The selected chemicals exhibited estrogenic or androgenic bioactivity in high throughput screening data obtained from US EPA ToxCast models. Xenopus laevis larvae were exposed nominally to each chemical at 3.6, 10.9, 33.0 and 100 µg/L, except 4-NP for which concentrations were 1.8, 5.5, 16.5 and 50 µg/L. Endpoint data (daily or given study day (SD)) collected included: mortality (daily), developmental stage (SD 7 and 21), hind limb length (HLL) (SD 7 and 21), snout-vent length (SVL) (SD 7 and 21), wet body weight (BW) (SD 7 and 21), and thyroid histopathology (SD 21). 4-OP and BBP caused accelerated development compared to controls at the mean measured concentration of 39.8 and 3.5 µg/L, respectively. Normalized HLL was increased on SD 21 for all chemicals except 4-NP. Histopathology revealed mild thyroid follicular cell hypertrophy at all BBP concentrations, while moderate thyroid follicular cell hypertrophy occurred at the 105 µg /L BBP concentration. Evidence of accelerated metamorphic development was also observed histopathologically in BBP-treated frogs at concentrations as low as 3.5 µg/L. Increased BW relative to control occurred for all chemicals except 4-OP. Increase in SVL was observed in larvae exposed to 4-NP, BBP and DBP on SD 21. With the exception of 4-NP, four of the chemicals tested appeared to alter thyroid axis-driven metamorphosis, albeit through different lines of evidence, with BBP and DBP providing the strongest evidence of effects on the thyroid axis. Citation information for this dataset can be found in Data.gov's References section.
Dataset from BIBR 1048 Dose Range Finding Study in Prevention of Venous...
data.niaid.nih.gov
Updated Feb 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Boehringer Ingelheim; Boehringer Ingelheim (2025). Dataset from BIBR 1048 Dose Range Finding Study in Prevention of Venous Thromboembolism in Patients With Primary Elective Total Hip or Knee Replacement Surgery [Dataset]. http://doi.org/10.25934/00003626
Explore at:
Unique identifier
https://doi.org/10.25934/00003626
Dataset updated
Feb 22, 2025
Dataset provided by
Boehringer Ingelheimhttp://boehringer-ingelheim.com/
Authors
Boehringer Ingelheim; Boehringer Ingelheim
Area covered
Italy, Finland, Netherlands, Sweden, Czech Republic, South Africa, Denmark, Belgium, France, Hungary
Variables measured
Bleeding, Transfusion, Blood Disorder, Laboratory Test, Pulmonary Embolism, Thromboembolic Event, Deep Venous Thrombosis, Area Under the Curve (AUC), Maximum Concentration (Cmax )
Description
The primary objective of this study is to establish the dose-response relationship with regard to efficacy and safety of BIBR 1048 (50 mg bis in die(b.i.d), 150 mg b.i.d, 225 mg b.i.d. and 300 mg quaque die(q.d) ) in preventing venous thromboembolism(VTE) in patients undergoing primary elective total hip and knee replacement.
Dataset from An Open-Label, Dose Escalation Study to Assess the Safety,...
data.niaid.nih.gov
Updated Apr 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Biogen; Medical Director (2025). Dataset from An Open-Label, Dose Escalation Study to Assess the Safety, Tolerability and Dose-Range Finding of Multiple Doses of ISIS 396443 Delivered Intrathecally to Patients With Spinal Muscular Atrophy [Dataset]. http://doi.org/10.25934/00002895
Explore at:
Unique identifier
https://doi.org/10.25934/00002895
Dataset updated
Apr 28, 2025
Dataset provided by
Biogenhttp://biogen.com/
Authors
Biogen; Medical Director
Area covered
United States
Variables measured
Adverse Event, Pharmacokinetics, Serious Adverse Event, Area Under the Curve (AUC), Maximum Concentration (Cmax ), Withdrawal Due To Adverse Events, Drug Concentration (pharmacokinetic), Time to Maximum Concentration (Tmax)
Description
This study will test the safety, tolerability, and pharmacokinetics of escalating doses of nusinersen (ISIS 396443) administered into the spinal fluid either two or three times over the duration of the trial, in participants with spinal muscular atrophy (SMA). Four dose levels will be evaluated sequentially. Each dose level will be studied in a cohort of approximately 8 participants, where all participants will receive active drug.
Amazon AWS Recon Data For Finding Origin IP - 93M
kaggle.com
zip
Updated Sep 17, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chirag Artani (2023). Amazon AWS Recon Data For Finding Origin IP - 93M [Dataset]. https://www.kaggle.com/datasets/chiragartani/amazon-aws-asn-cidr-ip-to-hostname-recon-data
Explore at:
zip(225734462 bytes)Available download formats
Dataset updated
Sep 17, 2023
Authors
Chirag Artani
Description
Our mission with this project is to provide an always up-to-date and freely accessible map of the cloud landscape for every major cloud service provider.

We've decided to kick things off with collecting SSL certificate data of AWS EC2 machines, considering the value of this data to security researchers. However, we plan to expand the project to include more data and providers in the near future. Your input and suggestions are incredibly valuable to us, so please don't hesitate to reach out on Twitter or Discord and let us know what areas you think we should prioritize next!

How to find origin IP of any domain or subdomain inside this database?

You can find origin IP for an example: instacart.com, Just search there instacart.com

You can use command as well if you are using linux. Open the dataset using curl or wget and then **cd ** folder now run command: find . -type f -iname "*.csv" -print0 | xargs -0 grep "word"

Like: find . -type f -iname "*.csv" -print0 | xargs -0 grep "instacart.com"

Done, You will see output.

How can SSL certificate data benefit you? The SSL data is organized into CSV files, with the following properties collected for every found certificate:

IP Address Common Name Organization Country Locality Province Subject Alternative DNS Name Subject Alternative IP address Self-signed (boolean)

IP Address Common Name Organization Country Locality Province Subject Alternative DNS Name Subject Alternative IP address Self-signed 1.2.3.4 example.com Example, Inc. US San Francisco California example.com 1.2.3.4 false 5.6.7.8 acme.net Acme, Inc. US Seattle Washington *.acme.net 5.6.7.8 false So what can you do with this data?

Enumerate subdomains of your target domains Search for your target's domain names (e.g. example.com) and find hits in the Common Name and Subject Alternative Name fields of the collected certificates. All IP ranges are scanned daily and the dataset gets updated accordingly so you are very likley to find ephemeral hosts before they are taken down.

Enumerate domains of your target companies Search for your target's company name (e.g. Example, Inc.), find hits in the Organization field, and explore the associated Common Name and Subject Alternative Name fields. The results will probably include subdomains of the domains you're familiar with and if you're in luck you might find new root domains expanding the scope.

Enumerate possible sub-subdomain enumeration target If the certificate is issued for a wildcard (e.g. *.foo.example.com), chances are there are other subdomains you can find by brute-forcing there. And you know how effective of this technique can be. Here are some wordlists to help you with that!

💡 Note: Remeber to monitor the dataset for daily updates to get notified whenever a new asset comes up!

Perform IP lookups Search for an IP address (e.g. 3.122.37.147) to find host names associated with it, and explore the Common Name, Subject Alternative Name, and Organization fields to gain find more information about that address.

Discover origin IP addresses to bypass proxy services When a website is hidden behind security proxy services like Cloudflare, Akamai, Incapsula, and others, it is possible to search for the host name (e.g., example.com) in the dataset. This search may uncover the origin IP address, allowing you to bypass the proxy. We've discussed a similar technique on our blog which you can find here!

Get a fresh dataset of live web servers Each IP address in the dataset corresponds to an HTTPS server running on port 443. You can use this data for large-scale research without needing to spend time collecting it yourself.

Whatever else you can think of If you use this data for a cool project or research, we would love to hear about it!

Additionally, below you will find a detailed explanation of our data collection process and how you can implement the same technique to gather information from your own IP ranges.

TB; DZ (Too big; didn't zoom):

We kick off the workflow with a simple bash script that retrieves AWS's IP ranges. Using a JQ query, we extract the IP ranges of EC2 machines by filtering for .prefixes[] | select(.service=="EC2") | .ip_prefix. Other services are excluded from this workflow since they don't support custom SSL certificates, making their data irrelevant for our dataset.

Then, we use mapcidr to divide the IP ranges obtained in step 1 into smaller ranges, each containing up to 100k hosts (Thanks, ProjectDiscovery team!). This step will be handy in the next step when we run the parallel scanning process.

At the time of writing, the EC2 IP ranges include over 57 million IP addresses, so scanning them all on a single machine would be impractical, which is where our file-splitter node comes into play.

This node iterates through the input from mapcidr and triggers individual jobs for each range. When executing this w...
e
Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6
data.europa.eu
ckan.publishing.service.gov.uk
+2more
html, sparql
Updated Oct 11, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ministry of Housing, Communities and Local Government (2021). Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6 [Dataset]. https://data.europa.eu/data/datasets/subjective-wellbeing-worthwhile-percentage-of-responses-in-range-0-6
Explore at:
html, sparqlAvailable download formats
Dataset updated
Oct 11, 2021
Dataset authored and provided by
Ministry of Housing, Communities and Local Government
License
http://reference.data.gov.uk/id/open-government-licencehttp://reference.data.gov.uk/id/open-government-licence
Description
Percentage of responses in range 0-6 out of 10 (corresponding to 'low wellbeing') for 'Worthwhile' in the First ONS Annual Experimental Subjective Wellbeing survey.

The Office for National Statistics has included the four subjective well-being questions below on the Annual Population Survey (APS), the largest of their household surveys.

Overall, how satisfied are you with your life nowadays?

Overall, to what extent do you feel the things you do in your life are worthwhile?

Overall, how happy did you feel yesterday?

Overall, how anxious did you feel yesterday?

This dataset presents results from the second of these questions, "Overall, to what extent do you feel the things you do in your life are worthwhile?" Respondents answer these questions on an 11 point scale from 0 to 10 where 0 is ‘not at all’ and 10 is ‘completely’. The well-being questions were asked of adults aged 16 and older.

Well-being estimates for each unitary authority or county are derived using data from those respondents who live in that place. Responses are weighted to the estimated population of adults (aged 16 and older) as at end of September 2011.

The data cabinet also makes available the proportion of people in each county and unitary authority that answer with ‘low wellbeing’ values. For the ‘worthwhile’ question answers in the range 0-6 are taken to be low wellbeing.

This dataset contains the percentage of responses in the range 0-6. It also contains the standard error, the sample size and lower and upper confidence limits at the 95% level.

The ONS survey covers the whole of the UK, but this dataset only includes results for counties and unitary authorities in England, for consistency with other statistics available at this website.

At this stage the estimates are considered ‘experimental statistics’, published at an early stage to involve users in their development and to allow feedback. Feedback can be provided to the ONS via this email address.

The APS is a continuous household survey administered by the Office for National Statistics. It covers the UK, with the chief aim of providing between-census estimates of key social and labour market variables at a local area level. Apart from employment and unemployment, the topics covered in the survey include housing, ethnicity, religion, health and education. When a household is surveyed all adults (aged 16+) are asked the four subjective well-being questions.

The 12 month Subjective Well-being APS dataset is a sub-set of the general APS as the well-being questions are only asked of persons aged 16 and above, who gave a personal interview and proxy answers are not accepted. This reduces the size of the achieved sample to approximately 120,000 adult respondents in England.

The original data is available from the ONS website.

Detailed information on the APS and the Subjective Wellbeing dataset is available here.

As well as collecting data on well-being, the Office for National Statistics has published widely on the topic of wellbeing. Papers and further information can be found here.
Z
Dataset of "Lessons Learned from Using Cyber Range to Teach Cybersecurity at...
data-staging.niaid.nih.gov
data.niaid.nih.gov
Updated Mar 28, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mlynek, Petr; Paušová, Šárka; Bouzek, Karel (2025). Dataset of "Lessons Learned from Using Cyber Range to Teach Cybersecurity at Different Levels of Education" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_15100440
Explore at:
Dataset updated
Mar 28, 2025
Dataset provided by
University of Chemistry and Technology
Brno University of Technology
Authors
Mlynek, Petr; Paušová, Šárka; Bouzek, Karel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In today’s modern society, it is difficult, nearly impossible, to work and study effectivelywithout using the internet. With services moving into cyberspace and theever-increasing number of users, new cyber threats are emerging with the potentialto cause devastation to both organizations and individuals. For this reason,it is necessary to educate users regardless of their age, gender, and qualification.This paper addresses the challenges associated with the need for cybersecurityeducation and presents lessons learned from applying an interactive and gamifiedapproach within a cyber range (CR), a controlled environment that enables thedeployment of virtual machines and networks for research, training, and testingpurposes. In our work, we utilized the CR platform to teach cybersecurity at theprimary, secondary, and high school levels of education. Through a series of tests,different approaches, surveys, and feedback collected from students and teachers,we identified their perceptions and critical aspects of CR-based cybersecurity education.We found that gamification positively influences learning, with studentsemphasizing the fun aspect and teachers highlighting engagement and motivation.Both groups value interactivity for developing practical skills and reinforcing theoreticalconcepts. Although scoring encourages competition, some students find itstressful. Similarly, penalizing hints can motivate problem solving, but may alsodeter those needing assistance. These and other findings presented in this papermay be useful for building and further developing cyber ranges to improve theeffectiveness of teaching, learning and training cybersecurity.
GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 -...
data.nasa.gov
Updated Mar 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/glas-icesat-l1b-global-waveform-based-range-corrections-data-hdf5-v034
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
GLAH05 Level-1B waveform parameterization data include output parameters from the waveform characterization procedure and other parameters required to calculate surface slope and relief characteristics. GLAH05 contains parameterizations of both the transmitted and received pulses and other characteristics from which elevation and footprint-scale roughness and slope are calculated. The received pulse characterization uses two implementations of the retracking algorithms: one tuned for ice sheets, called the standard parameterization, used to calculate surface elevation for ice sheets, oceans, and sea ice; and another for land (the alternative parameterization). Each data granule has an associated browse product.
Rescaled Fashion-MNIST dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled Fashion-MNIST dataset [Dataset]. http://doi.org/10.5281/zenodo.15187793
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15187793
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Time period covered
Apr 10, 2025
Description
Motivation

The goal of introducing the Rescaled Fashion-MNIST dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

The Rescaled Fashion-MNIST dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled Fashion-MNIST dataset is more challenging than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2.

Access and rights

The Rescaled Fashion-MNIST dataset is provided on the condition that you provide proper citation for the original Fashion-MNIST dataset:

[4] Xiao, H., Rasul, K., and Vollgraf, R. (2017) “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms”, arXiv preprint arXiv:1708.07747

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled FashionMNIST dataset is generated by rescaling 28×28 gray-scale images of clothes from the original FashionMNIST dataset [4]. The scale variations are up to a factor of 4, and the images are embedded within black images of size 72x72, with the object in the frame always centred. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 different classes in the dataset: “T-shirt/top”, “trouser”, “pullover”, “dress”, “coat”, “sandal”, “shirt”, “sneaker”, “bag” and “ankle boot”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 50 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 50 000 samples from the original Fashion-MNIST training set. The validation dataset, on the other hand, is formed from the final 10 000 images of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original Fashion-MNIST test set.

The h5 files containing the dataset

The training dataset file (~2.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5

Additionally, for the Rescaled FashionMNIST dataset, there are 9 datasets (~415 MB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p595.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p707.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p841.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p189.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p414.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p682.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte2p000.h5

These dataset files were used for the experiments presented in Figures 6, 7, 14, 16, 19 and 23 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.

There is also a closely related Fashion-MNIST with translations dataset, which in addition to scaling variations also comprises spatial translations of the objects.
U
MonkeyPox2022Tweets: MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on...
dataverse-staging.rdmc.unc.edu
txt
Updated Nov 21, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nirmalya Thakur; Nirmalya Thakur (2022). MonkeyPox2022Tweets: MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on the 2022 Monkeypox Outbreak, Findings from Analysis of Tweets, and Open Research Questions [Dataset]. http://doi.org/10.15139/S3/J2O61P
Explore at:
txt(259159), txt(371803), txt(292444), txt(414076), txt(2912925), txt(981076), txt(369283), txt(1973137), txt(1067468), txt(2223684), txt(323482), txt(819880)Available download formats
Unique identifier
https://doi.org/10.15139/S3/J2O61P
Dataset updated
Nov 21, 2022
Dataset provided by
UNC Dataverse
Authors
Nirmalya Thakur; Nirmalya Thakur
License
https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.15139/S3/J2O61Phttps://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.15139/S3/J2O61P
Description
Please cite the following paper when using this dataset: N. Thakur, “MonkeyPox2022Tweets: A large-scale Twitter dataset on the 2022 Monkeypox outbreak, findings from analysis of Tweets, and open research questions,” Infect. Dis. Rep., vol. 14, no. 6, pp. 855–883, 2022, DOI: https://doi.org/10.3390/idr14060087. Abstract The mining of Tweets to develop datasets on recent issues, global challenges, pandemics, virus outbreaks, emerging technologies, and trending matters has been of significant interest to the scientific community in the recent past, as such datasets serve as a rich data resource for the investigation of different research questions. Furthermore, the virus outbreaks of the past, such as COVID-19, Ebola, Zika virus, and flu, just to name a few, were associated with various works related to the analysis of the multimodal components of Tweets to infer the different characteristics of conversations on Twitter related to these respective outbreaks. The ongoing outbreak of the monkeypox virus, declared a Global Public Health Emergency (GPHE) by the World Health Organization (WHO), has resulted in a surge of conversations about this outbreak on Twitter, which is resulting in the generation of tremendous amounts of Big Data. There has been no prior work in this field thus far that has focused on mining such conversations to develop a Twitter dataset. Therefore, this work presents an open-access dataset of 571,831 Tweets about monkeypox that have been posted on Twitter since the first detected case of this outbreak on May 7, 2022. The dataset complies with the privacy policy, developer agreement, and guidelines for content redistribution of Twitter, as well as with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) principles for scientific data management. Data Description The dataset consists of a total of 571,831 Tweet IDs of the same number of tweets about monkeypox that were posted on Twitter from 7th May 2022 to 11th November (the most recent date at the time of uploading the most recent version of the dataset). The Tweet IDs are presented in 12 different .txt files based on the timelines of the associated tweets. The following represents the details of these dataset files. Filename: TweetIDs_Part1.txt (No. of Tweet IDs: 13926, Date Range of the associated Tweet IDs: May 7, 2022, to May 21, 2022) Filename: TweetIDs_Part2.txt (No. of Tweet IDs: 17705, Date Range of the associated Tweet IDs: May 21, 2022, to May 27, 2022) Filename: TweetIDs_Part3.txt (No. of Tweet IDs: 17585, Date Range of the associated Tweet IDs: May 27, 2022, to June 5, 2022) Filename: TweetIDs_Part4.txt (No. of Tweet IDs: 19718, Date Range of the associated Tweet IDs: June 5, 2022, to June 11, 2022) Filename: TweetIDs_Part5.txt (No. of Tweet IDs: 46718, Date Range of the associated Tweet IDs: June 12, 2022, to June 30, 2022) Filename: TweetIDs_Part6.txt (No. of Tweet IDs: 138711, Date Range of the associated Tweet IDs: July 1, 2022, to July 23, 2022) Filename: TweetIDs_Part7.txt (No. of Tweet IDs: 105890, Date Range of the associated Tweet IDs: July 24, 2022, to July 31, 2022) Filename: TweetIDs_Part8.txt (No. of Tweet IDs: 93959, Date Range of the associated Tweet IDs: August 1, 2022, to August 9, 2022) Filename: TweetIDs_Part9.txt (No. of Tweet IDs: 50832, Date Range of the associated Tweet IDs: August 10, 2022, to August 24, 2022) Filename: TweetIDs_Part10.txt (No. of Tweet IDs: 39042, Date Range of the associated Tweet IDs: August 25, 2022, to September 19, 2022) Filename: TweetIDs_Part11.txt (No. of Tweet IDs: 12341, Date Range of the associated Tweet IDs: September 20, 2022, to October 9, 2022) Filename: TweetIDs_Part12.txt (No. of Tweet IDs: 15404, Date Range of the associated Tweet IDs: October 10, 2022, to November 11, 2022) Please note: The dataset contains only Tweet IDs in compliance with the terms and conditions mentioned in the privacy policy, developer agreement, and guidelines for content redistribution of Twitter. The Tweet IDs need to be hydrated to be used. For hydrating this dataset, the Hydrator application (link to download the application: https://github.com/DocNow/hydrator/releases and link to a step-by-step tutorial: https://towardsdatascience.com/learn-how-to-easily-hydrate-tweets-a0f393ed340e#:~:text=Hydrating%20Tweets) may be used.
Rescaled CIFAR-10 dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled CIFAR-10 dataset [Dataset]. http://doi.org/10.5281/zenodo.15188748
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15188748
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Description
Motivation

The goal of introducing the Rescaled CIFAR-10 dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

The Rescaled CIFAR-10 dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled CIFAR-10 dataset contains substantially more natural textures and patterns than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2

and is therefore significantly more challenging.

Access and rights

The Rescaled CIFAR-10 dataset is provided on the condition that you provide proper citation for the original CIFAR-10 dataset:

[4] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep., University of Toronto.

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled CIFAR-10 dataset is generated by rescaling 32×32 RGB images of animals and vehicles from the original CIFAR-10 dataset [4]. The scale variations are up to a factor of 4. In order to have all test images have the same resolution, mirror extension is used to extend the images to size 64x64. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 distinct classes in the dataset: “airplane”, “automobile”, “bird”, “cat”, “deer”, “dog”, “frog”, “horse”, “ship” and “truck”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 40 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 40 000 samples from the original CIFAR-10 training set. The validation dataset, on the other hand, is formed from the final 10 000 image batch of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original CIFAR-10 test set.

The h5 files containing the dataset

The training dataset file (~5.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

cifar10_with_scale_variations_tr40000_vl10000_te10000_outsize64-64_scte1p000_scte1p000.h5

Additionally, for the Rescaled CIFAR-10 dataset, there are 9 datasets (~1 GB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

cifar10_with_scale_variations_te10000_outsize64-64_scte0p500.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p595.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p707.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p841.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p000.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p189.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p414.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p682.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte2p000.h5

These dataset files were used for the experiments presented in Figures 9, 10, 15, 16, 20 and 24 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.
Southern Long-Toed Salamander Range - CWHR A003B [ds2844]
data.ca.gov
data.cnra.ca.gov
+5more
Updated Oct 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
California Department of Fish and Wildlife (2025). Southern Long-Toed Salamander Range - CWHR A003B [ds2844] [Dataset]. https://data.ca.gov/dataset/southern-long-toed-salamander-range-cwhr-a003b-ds2844
Explore at:
geojson, zip, arcgis geoservices rest api, kml, csv, html, ashxAvailable download formats
Dataset updated
Oct 27, 2025
Dataset authored and provided by
California Department of Fish and Wildlifehttps://wildlife.ca.gov/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CWHR species range datasets represent the maximum current geographic extent of each species within California. Ranges were originally delineated at a scale of 1:5,000,000 by species-level experts more than 30 years ago and have gradually been revised at a scale of 1:1,000,000. Species occurrence data are used in defining species ranges, but range polygons may extend beyond the limits of extant occurrence data for a particular species. When drawing range boundaries, CDFW seeks to err on the side of commission rather than omission. This means that CDFW may include areas within a range based on expert knowledge or other available information, despite an absence of confirmed occurrences, which may be due to a lack of survey effort. The degree to which a range polygon is extended beyond occurrence data will vary among species, depending upon each species’ vagility, dispersal patterns, and other ecological and life history factors. The boundary line of a range polygon is drawn with consideration of these factors and is aligned with standardized boundaries including watersheds (NHD), ecoregions (USDA), or other ecologically meaningful delineations such as elevation contour lines. While CWHR ranges are meant to represent the current range, once an area has been designated as part of a species’ range in CWHR, it will remain part of the range even if there have been no documented occurrences within recent decades. An area is not removed from the range polygon unless experts indicate that it has not been occupied for a number of years after repeated surveys or is deemed no longer suitable and unlikely to be recolonized. It is important to note that range polygons typically contain areas in which a species is not expected to be found due to the patchy configuration of suitable habitat within a species’ range. In this regard, range polygons are coarse generalizations of where a species may be found. This data is available for download from the CDFW website: https://www.wildlife.ca.gov/Data/CWHR.
The following data sources were collated for the purposes of range mapping and species habitat modeling by RADMAP. Each focal taxon’s location data was extracted (when applicable) from the following list of sources. BIOS datasets are bracketed with their “ds” numbers and can be located on CDFW’s BIOS viewer: https://wildlife.ca.gov/Data/BIOS.
California Natural Diversity Database,
Terrestrial Species Monitoring [ds2826],
North American Bat Monitoring Data Portal,
VertNet,
Breeding Bird Survey,
Wildlife Insights,
eBird,
iNaturalist,
other available CDFW or partner data.
Z
Wrist-mounted IMU data towards the investigation of free-living human eating...
data.niaid.nih.gov
Updated Jun 20, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kyritsis, Konstantinos; Diou, Christos; Delopoulos, Anastasios (2022). Wrist-mounted IMU data towards the investigation of free-living human eating behavior - the Free-living Food Intake Cycle (FreeFIC) dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4420038
Explore at:
Dataset updated
Jun 20, 2022
Dataset provided by
Aristotle University of Thessaloniki
Harokopio University of Athens
Authors
Kyritsis, Konstantinos; Diou, Christos; Delopoulos, Anastasios
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction

The Free-living Food Intake Cycle (FreeFIC) dataset was created by the Multimedia Understanding Group towards the investigation of in-the-wild eating behavior. This is achieved by recording the subjects’ meals as a small part part of their everyday life, unscripted, activities. The FreeFIC dataset contains the (3D) acceleration and orientation velocity signals ((6) DoF) from (22) in-the-wild sessions provided by (12) unique subjects. All sessions were recorded using a commercial smartwatch ((6) using the Huawei Watch 2™ and the MobVoi TicWatch™ for the rest) while the participants performed their everyday activities. In addition, FreeFIC also contains the start and end moments of each meal session as reported by the participants.

Description

FreeFIC includes (22) in-the-wild sessions that belong to (12) unique subjects. Participants were instructed to wear the smartwatch to the hand of their preference well ahead before any meal and continue to wear it throughout the day until the battery is depleted. In addition, we followed a self-report labeling model, meaning that the ground truth is provided from the participant by documenting the start and end moments of their meals to the best of their abilities as well as the hand they wear the smartwatch on. The total duration of the (22) recordings sums up to (112.71) hours, with a mean duration of (5.12) hours. Additional data statistics can be obtained by executing the provided python script stats_dataset.py. Furthermore, the accompanying python script viz_dataset.py will visualize the IMU signals and ground truth intervals for each of the recordings. Information on how to execute the Python scripts can be found below.

The script(s) and the pickle file must be located in the same directory.

Tested with Python 3.6.4

Requirements: Numpy, Pickle and Matplotlib

Calculate and echo dataset statistics

$ python stats_dataset.py

Visualize signals and ground truth

$ python viz_dataset.py

FreeFIC is also tightly related to Food Intake Cycle (FIC), a dataset we created in order to investigate the in-meal eating behavior. More information about FIC can be found here and here.

Publications

If you plan to use the FreeFIC dataset or any of the resources found in this page, please cite our work:

@article{kyritsis2020data,
title={A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches},
author={Kyritsis, Konstantinos and Diou, Christos and Delopoulos, Anastasios},
journal={IEEE Journal of Biomedical and Health Informatics}, year={2020},
publisher={IEEE}}

@inproceedings{kyritsis2017automated, title={Detecting Meals In the Wild Using the Inertial Data of a Typical Smartwatch}, author={Kyritsis, Konstantinos and Diou, Christos and Delopoulos, Anastasios}, booktitle={2019 41th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)},
year={2019}, organization={IEEE}}

Technical details

We provide the FreeFIC dataset as a pickle. The file can be loaded using Python in the following way:

import pickle as pkl import numpy as np

with open('./FreeFIC_FreeFIC-heldout.pkl','rb') as fh: dataset = pkl.load(fh)

The dataset variable in the snipet above is a dictionary with (5) keys. Namely:

'subject_id'

'session_id'

'signals_raw'

'signals_proc'

'meal_gt'

The contents under a specific key can be obtained by:

sub = dataset['subject_id'] # for the subject id ses = dataset['session_id'] # for the session id raw = dataset['signals_raw'] # for the raw IMU signals proc = dataset['signals_proc'] # for the processed IMU signals gt = dataset['meal_gt'] # for the meal ground truth

The sub, ses, raw, proc and gt variables in the snipet above are lists with a length equal to (22). Elements across all lists are aligned; e.g., the (3)rd element of the list under the 'session_id' key corresponds to the (3)rd element of the list under the 'signals_proc' key.

sub: list Each element of the sub list is a scalar (integer) that corresponds to the unique identifier of the subject that can take the following values: ([1, 2, 3, 4, 13, 14, 15, 16, 17, 18, 19, 20]). It should be emphasized that the subjects with ids (15, 16, 17, 18, 19) and (20) belong to the held-out part of the FreeFIC dataset (more information can be found in ( )the publication titled "A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches" by Kyritsis et al). Moreover, the subject identifier in FreeFIC is in-line with the subject identifier in the FIC dataset (more info here and here); i.e., FIC’s subject with id equal to (2) is the same person as FreeFIC’s subject with id equal to (2).

ses: list Each element of this list is a scalar (integer) that corresponds to the unique identifier of the session that can range between (1) and (5). It should be noted that not all subjects have the same number of sessions.

raw: list Each element of this list is dictionary with the 'acc' and 'gyr' keys. The data under the 'acc' key is a (N_{acc} \times 4) numpy.ndarray that contains the timestamps in seconds (first column) and the (3D) raw accelerometer measurements in (g) (second, third and forth columns - representing the (x, y ) and (z) axis, respectively). The data under the 'gyr' key is a (N_{gyr} \times 4) numpy.ndarray that contains the timestamps in seconds (first column) and the (3D) raw gyroscope measurements in ({degrees}/{second})(second, third and forth columns - representing the (x, y ) and (z) axis, respectively). All sensor streams are transformed in such a way that reflects all participants wearing the smartwatch at the same hand with the same orientation, thusly achieving data uniformity. This transformation is in par with the signals in the FIC dataset (more info here and here). Finally, the length of the raw accelerometer and gyroscope numpy.ndarrays is different ((N_{acc} eq N_{gyr})). This behavior is predictable and is caused by the Android platform.

proc: list Each element of this list is an (M\times7) numpy.ndarray that contains the timestamps, (3D) accelerometer and gyroscope measurements for each meal. Specifically, the first column contains the timestamps in seconds, the second, third and forth columns contain the (x,y) and (z) accelerometer values in (g) and the fifth, sixth and seventh columns contain the (x,y) and (z) gyroscope values in ({degrees}/{second}). Unlike elements in the raw list, processed measurements (in the proc list) have a constant sampling rate of (100) Hz and the accelerometer/gyroscope measurements are aligned with each other. In addition, all sensor streams are transformed in such a way that reflects all participants wearing the smartwatch at the same hand with the same orientation, thusly achieving data uniformity. This transformation is in par with the signals in the FIC dataset (more info here and here). No other preprocessing is performed on the data; e.g., the acceleration component due to the Earth's gravitational field is present at the processed acceleration measurements. The potential researcher can consult the article "A Data Driven End-to-end Approach for In-the-wild Monitoring of Eating Behavior Using Smartwatches" by Kyritsis et al. on how to further preprocess the IMU signals (i.e., smooth and remove the gravitational component).

meal_gt: list Each element of this list is a (K\times2) matrix. Each row represents the meal intervals for the specific in-the-wild session. The first column contains the timestamps of the meal start moments whereas the second one the timestamps of the meal end moments. All timestamps are in seconds. The number of meals (K) varies across recordings (e.g., a recording exist where a participant consumed two meals).

Ethics and funding

Informed consent, including permission for third-party access to anonymised data, was obtained from all subjects prior to their engagement in the study. The work has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No 727688 - BigO: Big data against childhood obesity.

Contact

Any inquiries regarding the FreeFIC dataset should be addressed to:

Dr. Konstantinos KYRITSIS

Multimedia Understanding Group (MUG) Department of Electrical & Computer Engineering Aristotle University of Thessaloniki University Campus, Building C, 3rd floor Thessaloniki, Greece, GR54124

Tel: +30 2310 996359, 996365 Fax: +30 2310 996398 E-mail: kokirits [at] mug [dot] ee [dot] auth [dot] gr
N
Median Household Income Variation by Family Size in Grass Range, MT:...
neilsberg.com
csv, json
Updated Jan 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). Median Household Income Variation by Family Size in Grass Range, MT: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1af70bab-73fd-11ee-949f-3860777c1fe6/
Explore at:
json, csvAvailable download formats
Dataset updated
Jan 11, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Montana, Grass Range
Variables measured
Household size, Median Household Income
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median household incomes for various household sizes in Grass Range, MT, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

Key observations

Of the 7 household sizes (1 person to 7-or-more person households) reported by the census bureau, only 2-person households were found in Grass Range. The coefficient of variation (CV) is 37.75%. This high CV indicates high relative variability, suggesting that the incomes vary significantly across different sizes of households.

In the most recent year, 2021, The smallest household size for which the bureau reported a median household income was 2-person households, with an income of $77,015. Additionally, the Census Bureau did not report a median household income for larger household sizes.

https://i.neilsberg.com/ch/grass-range-mt-median-household-income-by-household-size.jpeg" alt="Grass Range, MT median household income, by household size (in 2022 inflation-adjusted dollars)">

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Household Sizes:

1-person households

2-person households

3-person households

4-person households

5-person households

6-person households

7-or-more-person households

Variables / Data Columns

Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).

Median Household Income: Median household income, in 2022 inflation-adjusted dollars for the specific household size.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Grass Range median household income. You can refer the same here
housing
kaggle.com
zip
Updated Sep 22, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HappyRautela (2023). housing [Dataset]. https://www.kaggle.com/datasets/happyrautela/housing
Explore at:
zip(809785 bytes)Available download formats
Dataset updated
Sep 22, 2023
Authors
HappyRautela
Description
The exercise after this contains questions that are based on the housing dataset.

How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173

How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161

How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92

What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000

For instance, if the ‘price’ column consists of outliers, how can you make the data clean and remove the redundancies? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.

What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features

If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.

If the metrics show that the p-value for the grade column is 0.092, what all inferences can we make about the grade column? a. Significant in presence of other variables. b. Highly significant in presence of other variables c. insignificance in presence of other variables d. None of the above

If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
B
Data from: A comprehensive analysis of autocorrelation and bias in home...
datasetcatalog.nlm.nih.gov
borealisdata.ca
+1more
Updated May 19, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Schabo, Dana G.; Ullmann, Wiebke; de Paula Cunha, Rogerio; Markham, A. Catherine; Alberts, Susan C.; Selva, Nuria; Koch, Flávia; Ali, Abdullahi H.; Zwijacz-Kozica, Tomasz; Thompson, Peter; Sergiel, Agnieszka; Mueller, Thomas; Dekker, Jasja; Ramalho, Emiliano E.; Patterson, Bruce D.; Morato, Ronaldo G.; Farwig, Nina; da Silva, Marina X.; LaPoint, Scott; Beyer, Dean; Medici, Emilia Patricia; Goheen, Jacob R.; Noonan, Michael J.; Olson, Kirk A.; Jeltsch, Florian; Belant, Jerrold L.; Fichtel, Claudia; Fleming, Christen H.; Akre, Tom S.; Ford, Adam T.; Nathan, Ran; Böhning-Gaese, Katrin; Fagan, William F.; Blaum, Niels; Tucker, Marlee A.; Antunes, Pamela C.; Drescher-Lehman, Jonathan; Rosner, Sascha; Calabrese, Justin M.; Paviolo, Agustin; Cullen Jr. , Laury; Fischer, Christina; Spiegel, Orr; Altmann, Jeanne; Zięba, Filip; Oliveira-Santos, Luiz Gustavo R.; Kappeler, Peter M.; Kauffman, Matthew; Janssen, René (2021). Data from: A comprehensive analysis of autocorrelation and bias in home range estimation [Dataset]. http://doi.org/10.5683/SP2/OAJTAO
Explore at:
Unique identifier
https://doi.org/10.5683/SP2/OAJTAO
Dataset updated
May 19, 2021
Authors
Schabo, Dana G.; Ullmann, Wiebke; de Paula Cunha, Rogerio; Markham, A. Catherine; Alberts, Susan C.; Selva, Nuria; Koch, Flávia; Ali, Abdullahi H.; Zwijacz-Kozica, Tomasz; Thompson, Peter; Sergiel, Agnieszka; Mueller, Thomas; Dekker, Jasja; Ramalho, Emiliano E.; Patterson, Bruce D.; Morato, Ronaldo G.; Farwig, Nina; da Silva, Marina X.; LaPoint, Scott; Beyer, Dean; Medici, Emilia Patricia; Goheen, Jacob R.; Noonan, Michael J.; Olson, Kirk A.; Jeltsch, Florian; Belant, Jerrold L.; Fichtel, Claudia; Fleming, Christen H.; Akre, Tom S.; Ford, Adam T.; Nathan, Ran; Böhning-Gaese, Katrin; Fagan, William F.; Blaum, Niels; Tucker, Marlee A.; Antunes, Pamela C.; Drescher-Lehman, Jonathan; Rosner, Sascha; Calabrese, Justin M.; Paviolo, Agustin; Cullen Jr. , Laury; Fischer, Christina; Spiegel, Orr; Altmann, Jeanne; Zięba, Filip; Oliveira-Santos, Luiz Gustavo R.; Kappeler, Peter M.; Kauffman, Matthew; Janssen, René
Description
AbstractHome range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive dataset of GPS locations from 369 individuals representing 27 species distributed across 5 continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated-Gaussian reference function (AKDE), Silverman's rule of thumb, and least squares cross-validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half-sample cross-validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation ($\hat{N}_\mathrm{area}$) to quantify the information content of each dataset. We found that AKDE 95\% area estimates were larger than conventional IID-based estimates by a mean factor of 2. The median number of cross-validated locations included in the holdout sets by AKDE 95\% (or 50\%) estimates was 95.3\% (or 50.1\%), confirming the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing $\hat{N}_\mathrm{area}$. To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small $\hat{N}_\mathrm{area}$. While 72\% of the 369 empirical datasets had \textgreater1000 total observations, only 4\% had an $\hat{N}_\mathrm{area}$ \textgreater1000, where 30\% had an $\hat{N}_\mathrm{area}$ \textless30. In this frequently encountered scenario of small $\hat{N}_\mathrm{area}$, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.
f
Aligning marine species range data to better serve science and conservation
plos.figshare.com
tiff
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Casey C. O'Hara; Jamie C. Afflerbach; Courtney Scarborough; Kristin Kaschner; Benjamin S. Halpern (2023). Aligning marine species range data to better serve science and conservation [Dataset]. http://doi.org/10.1371/journal.pone.0175739
Explore at:
tiffAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0175739
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Casey C. O'Hara; Jamie C. Afflerbach; Courtney Scarborough; Kristin Kaschner; Benjamin S. Halpern
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Species distribution data provide the foundation for a wide range of ecological research studies and conservation management decisions. Two major efforts to provide marine species distributions at a global scale are the International Union for Conservation of Nature (IUCN), which provides expert-generated range maps that outline the complete extent of a species' distribution; and AquaMaps, which provides model-generated species distribution maps that predict areas occupied by the species. Together these databases represent 24,586 species (93.1% within AquaMaps, 16.4% within IUCN), with only 2,330 shared species. Differences in intent and methodology can result in very different predictions of species distributions, which bear important implications for scientists and decision makers who rely upon these datasets when conducting research or informing conservation policy and management actions. Comparing distributions for the small subset of species with maps in both datasets, we found that AquaMaps and IUCN range maps show strong agreement for many well-studied species, but our analysis highlights several key examples in which introduced errors drive differences in predicted species ranges. In particular, we find that IUCN maps greatly overpredict coral presence into unsuitably deep waters, and we show that some AquaMaps computer-generated default maps (only 5.7% of which have been reviewed by experts) can produce odd discontinuities at the extremes of a species’ predicted range. We illustrate the scientific and management implications of these tradeoffs by repeating a global analysis of gaps in coverage of marine protected areas, and find significantly different results depending on how the two datasets are used. By highlighting tradeoffs between the two datasets, we hope to encourage increased collaboration between taxa experts and large scale species distribution modeling efforts to further improve these foundational datasets, helping to better inform science and policy recommendations around understanding, managing, and protecting marine biodiversity.
Path loss at 5G high frequency range in South Asia
kaggle.com
Updated Apr 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
S M MEHEDI ZAMAN (2023). Path loss at 5G high frequency range in South Asia [Dataset]. https://www.kaggle.com/datasets/smmehedizaman/path-loss-at-5g-high-frequency-range-in-south-asia
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 25, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
S M MEHEDI ZAMAN
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
South Asia, Asia
Description
This dataset has been generated using NYUSIM 3.0 mm-Wave channel simulator software, which takes into account atmospheric data such as rain rate, humidity, barometric pressure, and temperature. The input data was collected over the course of a year in South Asia. As a result, the dataset provides an accurate representation of the seasonal variations in mm-wave channel characteristics in these areas. The dataset includes a total of 2835 records, each of which contains T-R Separation Distance (m), Time Delay (ns), Received Power (dBm), Phase (rad), Azimuth AoD (degree), Elevation AoD (degree), Azimuth AoA (degree), Elevation, AoA (degree), RMS Delay Spread (ns), Season, Frequency and Path Loss (dB). Four main seasons have been considered in this dataset: Spring, Summer, Fall, and Winter. Each season is subdivided into three parts (i.e., low, medium, and high), to accurately include the atmospheric variations in a season. To simulate the path loss, realistic Tx and Rx height, NLoS environment, and mean human blockage attenuation effects have been taken into consideration. The data has been preprocessed and normalized to ensure consistency and ease of use. Researchers in the field of mm-wave communications and networking can use this dataset to study the impact of atmospheric conditions on mm-wave channel characteristics and develop more accurate models for predicting channel behavior. The dataset can also be used to evaluate the performance of different communication protocols and signal processing techniques under varying weather conditions. Note that while the data was collected specifically in South Asia region, the high correlation between the weather patterns in this region and other areas means that the dataset may also be applicable to other regions with similar atmospheric conditions.

Acknowledgements The paper in which the dataset was proposed is available on: https://ieeexplore.ieee.org/abstract/document/10307972

Citation

If you use this dataset, please cite the following paper:

Rashed Hasan Ratul, S. M. Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, and Mirza Muntasir Nishat, “Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks,” 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023, pp. 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10307972

BibTeX ```bibtex @inproceedings{Ratul2023Atmospheric, author = {Ratul, Rashed Hasan and Zaman, S. M. Mehedi and Chowdhury, Hasib Arman and Sagor, Md. Zayed Hassan and Kawser, Mohammad Tawhid and Nishat, Mirza Muntasir}, title = {Atmospheric Influence on the Path Loss at High Frequencies for Deployment of {5G} Cellular Communication Networks}, booktitle = {2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)}, year = {2023}, pages = {1--6}, doi = {10.1109/ICCCNT56998.2023.10307972}, keywords = {Wireless communication; Fluctuations; Rain; 5G mobile communication; Atmospheric modeling; Simulation; Predictive models; 5G-NR; mm-wave propagation; path loss; atmospheric influence; NYUSIM; ML} }

Facebook

Twitter

Click to copy link

Link copied

Cite

Authors (2020). ANN development + final testing datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1445865

ANN development + final testing datasets

Explore at:

Dataset updated

Jan 24, 2020

Authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

File name definitions:

'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s

'...v_175_250...' - dataset for velocity range [175, 250] m/s

'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected

'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

Where to find the input (independent) and target (dependent) variable values for each dataset/excel ?

input values in 'IN' sheet

target values in 'TARGET' sheet

Where to find the results from the best ANN model (for each target/output variable and each velocity range)?

open the corresponding excel file and the expected (target) vs ANN (output) results are written in 'TARGET vs OUTPUT' sheet

Check reference below (to be added when the paper is published)

https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams

Clear search

Close search

Google apps

Main menu

ANN development + final testing datasets

Fused Image dataset for convolutional neural Network-based crack Detection...

Credit Card Eligibility Data: Determining Factors

Amphibian metamorphosis assays- biological & histopathological data and...

Dataset from BIBR 1048 Dose Range Finding Study in Prevention of Venous...

Dataset from An Open-Label, Dose Escalation Study to Assess the Safety,...

Amazon AWS Recon Data For Finding Origin IP - 93M

How to find origin IP of any domain or subdomain inside this database?

Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6

Dataset of "Lessons Learned from Using Cyber Range to Teach Cybersecurity at...

GLAS/ICESat L1B Global Waveform-based Range Corrections Data (HDF5) V034 -...

Rescaled Fashion-MNIST dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

MonkeyPox2022Tweets: MonkeyPox2022Tweets: A Large-Scale Twitter Dataset on...

Rescaled CIFAR-10 dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

Southern Long-Toed Salamander Range - CWHR A003B [ds2844]

Wrist-mounted IMU data towards the investigation of free-living human eating...

The script(s) and the pickle file must be located in the same directory.

Tested with Python 3.6.4

Requirements: Numpy, Pickle and Matplotlib

Calculate and echo dataset statistics

Visualize signals and ground truth

Median Household Income Variation by Family Size in Grass Range, MT:...

About this dataset

Content

Inspiration

Recommended for further research

housing

Data from: A comprehensive analysis of autocorrelation and bias in home...

Aligning marine species range data to better serve science and conservation

Path loss at 5G high frequency range in South Asia

Citation

ANN development + final testing datasets