100+ datasets found

Path loss at 5G high frequency range in South Asia
kaggle.com
Updated Apr 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
S M MEHEDI ZAMAN (2023). Path loss at 5G high frequency range in South Asia [Dataset]. https://www.kaggle.com/datasets/smmehedizaman/path-loss-at-5g-high-frequency-range-in-south-asia
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 25, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
S M MEHEDI ZAMAN
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
South Asia, Asia
Description
This dataset has been generated using NYUSIM 3.0 mm-Wave channel simulator software, which takes into account atmospheric data such as rain rate, humidity, barometric pressure, and temperature. The input data was collected over the course of a year in South Asia. As a result, the dataset provides an accurate representation of the seasonal variations in mm-wave channel characteristics in these areas. The dataset includes a total of 2835 records, each of which contains T-R Separation Distance (m), Time Delay (ns), Received Power (dBm), Phase (rad), Azimuth AoD (degree), Elevation AoD (degree), Azimuth AoA (degree), Elevation, AoA (degree), RMS Delay Spread (ns), Season, Frequency and Path Loss (dB). Four main seasons have been considered in this dataset: Spring, Summer, Fall, and Winter. Each season is subdivided into three parts (i.e., low, medium, and high), to accurately include the atmospheric variations in a season. To simulate the path loss, realistic Tx and Rx height, NLoS environment, and mean human blockage attenuation effects have been taken into consideration. The data has been preprocessed and normalized to ensure consistency and ease of use. Researchers in the field of mm-wave communications and networking can use this dataset to study the impact of atmospheric conditions on mm-wave channel characteristics and develop more accurate models for predicting channel behavior. The dataset can also be used to evaluate the performance of different communication protocols and signal processing techniques under varying weather conditions. Note that while the data was collected specifically in South Asia region, the high correlation between the weather patterns in this region and other areas means that the dataset may also be applicable to other regions with similar atmospheric conditions.

Acknowledgements The paper in which the dataset was proposed is available on: https://ieeexplore.ieee.org/abstract/document/10307972

Citation

If you use this dataset, please cite the following paper:

Rashed Hasan Ratul, S. M. Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, and Mirza Muntasir Nishat, “Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks,” 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023, pp. 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10307972

BibTeX ```bibtex @inproceedings{Ratul2023Atmospheric, author = {Ratul, Rashed Hasan and Zaman, S. M. Mehedi and Chowdhury, Hasib Arman and Sagor, Md. Zayed Hassan and Kawser, Mohammad Tawhid and Nishat, Mirza Muntasir}, title = {Atmospheric Influence on the Path Loss at High Frequencies for Deployment of {5G} Cellular Communication Networks}, booktitle = {2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)}, year = {2023}, pages = {1--6}, doi = {10.1109/ICCCNT56998.2023.10307972}, keywords = {Wireless communication; Fluctuations; Rain; 5G mobile communication; Atmospheric modeling; Simulation; Predictive models; 5G-NR; mm-wave propagation; path loss; atmospheric influence; NYUSIM; ML} }
N
South Range, MI Population Breakdown by Gender and Age Dataset: Male and...
neilsberg.com
csv, json
Updated Feb 24, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). South Range, MI Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e200fba9-f25d-11ef-8c1b-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Feb 24, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Michigan, South Range
Variables measured
Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be utilized to understand the population distribution of South Range by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in South Range. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for South Range.

Key observations

Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Scope of gender :

Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.

Variables / Data Columns

Age Group: This column displays the age group for the South Range population analysis. Total expected values are 18 and are define above in the age groups section.

Population (Male): The male population in the South Range is shown in the following column.

Population (Female): The female population in the South Range is shown in the following column.

Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in South Range for each age group.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for South Range Population by Gender. You can refer the same here
e
Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6
data.europa.eu
ckan.publishing.service.gov.uk
+2more
html, sparql
Updated Oct 11, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ministry of Housing, Communities and Local Government (2021). Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6 [Dataset]. https://data.europa.eu/data/datasets/subjective-wellbeing-worthwhile-percentage-of-responses-in-range-0-6
Explore at:
html, sparqlAvailable download formats
Dataset updated
Oct 11, 2021
Dataset authored and provided by
Ministry of Housing, Communities and Local Government
License
http://reference.data.gov.uk/id/open-government-licencehttp://reference.data.gov.uk/id/open-government-licence
Description
Percentage of responses in range 0-6 out of 10 (corresponding to 'low wellbeing') for 'Worthwhile' in the First ONS Annual Experimental Subjective Wellbeing survey.

The Office for National Statistics has included the four subjective well-being questions below on the Annual Population Survey (APS), the largest of their household surveys.

Overall, how satisfied are you with your life nowadays?

Overall, to what extent do you feel the things you do in your life are worthwhile?

Overall, how happy did you feel yesterday?

Overall, how anxious did you feel yesterday?

This dataset presents results from the second of these questions, "Overall, to what extent do you feel the things you do in your life are worthwhile?" Respondents answer these questions on an 11 point scale from 0 to 10 where 0 is ‘not at all’ and 10 is ‘completely’. The well-being questions were asked of adults aged 16 and older.

Well-being estimates for each unitary authority or county are derived using data from those respondents who live in that place. Responses are weighted to the estimated population of adults (aged 16 and older) as at end of September 2011.

The data cabinet also makes available the proportion of people in each county and unitary authority that answer with ‘low wellbeing’ values. For the ‘worthwhile’ question answers in the range 0-6 are taken to be low wellbeing.

This dataset contains the percentage of responses in the range 0-6. It also contains the standard error, the sample size and lower and upper confidence limits at the 95% level.

The ONS survey covers the whole of the UK, but this dataset only includes results for counties and unitary authorities in England, for consistency with other statistics available at this website.

At this stage the estimates are considered ‘experimental statistics’, published at an early stage to involve users in their development and to allow feedback. Feedback can be provided to the ONS via this email address.

The APS is a continuous household survey administered by the Office for National Statistics. It covers the UK, with the chief aim of providing between-census estimates of key social and labour market variables at a local area level. Apart from employment and unemployment, the topics covered in the survey include housing, ethnicity, religion, health and education. When a household is surveyed all adults (aged 16+) are asked the four subjective well-being questions.

The 12 month Subjective Well-being APS dataset is a sub-set of the general APS as the well-being questions are only asked of persons aged 16 and above, who gave a personal interview and proxy answers are not accepted. This reduces the size of the achieved sample to approximately 120,000 adult respondents in England.

The original data is available from the ONS website.

Detailed information on the APS and the Subjective Wellbeing dataset is available here.

As well as collecting data on well-being, the Office for National Statistics has published widely on the topic of wellbeing. Papers and further information can be found here.
Point Cloud Mnist 2D
kaggle.com
zip
Updated Feb 12, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cristian Garcia (2020). Point Cloud Mnist 2D [Dataset]. https://www.kaggle.com/datasets/cristiangarcia/pointcloudmnist2d/discussion
Explore at:
zip(34176926 bytes)Available download formats
Dataset updated
Feb 12, 2020
Authors
Cristian Garcia
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Point Cloud MNIST 2D

This is a simple dataset for getting started with Machine Learning for point cloud data. It take the original MNIST and converts each of the non-zero pixels into points in a 2D space. The idea is to classify each collection of point (rather than images) to the same label as in the MNIST. The source for generating this dataset can be found in this repository: cgarciae/point-cloud-mnist-2D

Format

There are 2 files: train.csv and test.csv. Each file has the columns

label,x0,y0,v0,x1,y1,v1,...,x350,y350,v350

where

label contains the target label in the range [0, 9]

x{i} contain the x position of the pixel/point as viewed in a Cartesian plane in the range [-1, 27].

y{i} contain the y position of the pixel/point as viewed in a Cartesian plane in the range [-1, 27].

v{i} contain the value of the pixel in the range [-1, 255].

Padding

The maximum number of point found on a image was 351, images with less points where padded to this length using the following values:

x{i} = -1

y{i} = -1

v{i} = -1

Subsamples

To make the challenge more interesting you can also try to solve the problem using a subset of points, e.g. the first N. Here are some visualizations of the dataset using different amounts of points:

50

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2Fbbf5393884480e3d24772344e079c898%2F50.png?generation=1579911143877077&alt=media" alt="50">

100

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F5a83f6f5f7c5791e3c1c8e9eba2d052b%2F100.png?generation=1579911238988368&alt=media" alt="100">

200

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F202098ed0da35c41ae45dfc32e865972%2F200.png?generation=1579911264286372&alt=media" alt="200">

351

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F5c733566f8d689c5e0fd300440d04da2%2Fmax.png?generation=1579911289750248&alt=media" alt="">

Distribution

This histogram of the distribution the number of points per image in the dataset can give you a general idea of how difficult each variation can be.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F158444%2F9eb3b463f77a887dae83a7af0eb08c7d%2Flengths.png?generation=1579911380397412&alt=media" alt="">
Car Driving Distance Range Dataset
kaggle.com
zip
Updated Jan 30, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aditya Mahimkar (2022). Car Driving Distance Range Dataset [Dataset]. https://www.kaggle.com/datasets/adityamahimkar/car-driving-distance-range-dataset
Explore at:
zip(7622 bytes)Available download formats
Dataset updated
Jan 30, 2022
Authors
Aditya Mahimkar
Description
What’s in a tank of fuel?

The cars with the best (and worst) driving range in Australia is included in the dataset.

Content

The data is divided into two files: - PETROL.csv - DIESEL.csv

Both the datasets contain the same type of columns and one can combine the two by just adding the is_petrol_diesel column. Dataset Description is as follows: - MAKE: car company - MODEL: car model - TYPE: car type - CYL: number of cylinders - ENGINE L: engine capacity in Litres - FUEL TANK L: fuel tank capacity - CONS. L/100km: fuel consumption per 100 km RANGE km: the distance range of the car

Acknowledgements

The data is been collected from drive.com.au. A detailed and nice article has been published on site which can help while analyzing the data.
Datasets of Oxygen concentration measurements in ICU boxes at Faculty...
figshare.com
txt
Updated Oct 5, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Martin Pies; Radovan Hajovsky; Radek Martinek; Katerina Barnova; Jan Velicka (2023). Datasets of Oxygen concentration measurements in ICU boxes at Faculty Hospital in Ostrava, Czech Republic [Dataset]. http://doi.org/10.6084/m9.figshare.24248842.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.24248842.v1
Dataset updated
Oct 5, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Martin Pies; Radovan Hajovsky; Radek Martinek; Katerina Barnova; Jan Velicka
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ostrava, Czechia
Description
This dataset contains measured data from five sensor modules designed for monitoring the oxygen concentration in the air in a hospital environment, especially in rooms where oxygen therapy can potentially occurs. This data is crucial from a safety point of view, as a higher oxygen concentration can increase the risk of fire development.Sensor modules were placed at various locations of the Ostrava University Hospital in the Czech Republic. Sensor modules 1 to 4 were located in intensive care units (ICUs), while sensor module 5 was located in the nurses' office as a reference measurement point. The data was collected between January 28, 2021, and October 2023, providing a comprehensive data set from different seasons and periods.The dataset contains information on atmospheric oxygen concentration, including outage data (data gaps) caused by various factors such as sensor technical problems or maintenance. Importantly, erroneous measurements were identified and removed from the dataset without replacement, ensuring data quality and reliability.The dataset contains a summary record of all measured data (file iqrf_fno_o2_0x27f_dataset.csv), where the most important columns are:Ts – time stampNode – sensor module numberRSSI – signal strength of the sensor moduleTemperature – air temperature in the monitored roomO2 – oxygen concentration in the monitored roomVbatt – sensor module battery voltageThe other columns are irrelevant and are for debugging purposes only.For easier use of the records, datasets from individual sensor modules were also generated.The recording also includes an electronic scheme of the sensor module. More detailed information about the firmware for the module is available at the workplace of the author's collective.This dataset can potentially be used for air quality analysis in hospital environments and for monitoring oxygen concentration in oxygen therapy rooms. It can also serve as a basis for the development of predictive models or systems for automatic monitoring and warning of potentially dangerous situations associated with oxygen concentration in hospitals.
TIGER/Line Shapefile, 2023, County, Pinal County, AZ, Address Range-Feature...
catalog.data.gov
datasets.ai
Updated Aug 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division, Geospatial Products Branch (Point of Contact) (2025). TIGER/Line Shapefile, 2023, County, Pinal County, AZ, Address Range-Feature Name Relationship File [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2023-county-pinal-county-az-address-range-feature-name-relationship-file
Explore at:
Dataset updated
Aug 10, 2025
Dataset provided by
United States Census Bureauhttp://census.gov/
United States Department of Commercehttp://commerce.gov/
Area covered
Pinal County, Arizona
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national filewith no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independentdata set, or they can be combined to cover the entire nation. The Address Range / Feature Name Relationship File (ADDRFN.dbf) contains a record for each address range / linear feature name relationship. The purpose of this relationship file is to identify all street names associated with each address range. An edge can have several feature names; an address range located on an edge can be associated with one or any combination of the available feature names (an address range can be linked to multiple feature names). The address range is identified by the address range identifier (ARID) attribute that can be used to link to the Address Ranges Relationship File (ADDR.dbf). The linear feature name is identified by the linear feature identifier (LINEARID) attribute that can be used to link to the Feature Names Relationship File (FEATNAMES.dbf).
TIGER/Line Shapefile, 2023, County, Grant County, WA, Address Range-Feature...
catalog.data.gov
s.cnmilf.com
Updated Aug 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division, Geospatial Products Branch (Point of Contact) (2025). TIGER/Line Shapefile, 2023, County, Grant County, WA, Address Range-Feature Name Relationship File [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2023-county-grant-county-wa-address-range-feature-name-relationship-file
Explore at:
Dataset updated
Aug 10, 2025
Dataset provided by
United States Census Bureauhttp://census.gov/
Area covered
Grant County, Washington
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national filewith no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independentdata set, or they can be combined to cover the entire nation. The Address Range / Feature Name Relationship File (ADDRFN.dbf) contains a record for each address range / linear feature name relationship. The purpose of this relationship file is to identify all street names associated with each address range. An edge can have several feature names; an address range located on an edge can be associated with one or any combination of the available feature names (an address range can be linked to multiple feature names). The address range is identified by the address range identifier (ARID) attribute that can be used to link to the Address Ranges Relationship File (ADDR.dbf). The linear feature name is identified by the linear feature identifier (LINEARID) attribute that can be used to link to the Feature Names Relationship File (FEATNAMES.dbf).
Rescaled Fashion-MNIST dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled Fashion-MNIST dataset [Dataset]. http://doi.org/10.5281/zenodo.15187793
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15187793
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Time period covered
Apr 10, 2025
Description
Motivation

The goal of introducing the Rescaled Fashion-MNIST dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

The Rescaled Fashion-MNIST dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled Fashion-MNIST dataset is more challenging than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2.

Access and rights

The Rescaled Fashion-MNIST dataset is provided on the condition that you provide proper citation for the original Fashion-MNIST dataset:

[4] Xiao, H., Rasul, K., and Vollgraf, R. (2017) “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms”, arXiv preprint arXiv:1708.07747

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled FashionMNIST dataset is generated by rescaling 28×28 gray-scale images of clothes from the original FashionMNIST dataset [4]. The scale variations are up to a factor of 4, and the images are embedded within black images of size 72x72, with the object in the frame always centred. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 different classes in the dataset: “T-shirt/top”, “trouser”, “pullover”, “dress”, “coat”, “sandal”, “shirt”, “sneaker”, “bag” and “ankle boot”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 50 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 50 000 samples from the original Fashion-MNIST training set. The validation dataset, on the other hand, is formed from the final 10 000 images of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original Fashion-MNIST test set.

The h5 files containing the dataset

The training dataset file (~2.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5

Additionally, for the Rescaled FashionMNIST dataset, there are 9 datasets (~415 MB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p595.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p707.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p841.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p189.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p414.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p682.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte2p000.h5

These dataset files were used for the experiments presented in Figures 6, 7, 14, 16, 19 and 23 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.

There is also a closely related Fashion-MNIST with translations dataset, which in addition to scaling variations also comprises spatial translations of the objects.
TIGER/Line Shapefile, Current, County, Clinton County, NY, Address...
catalog.data.gov
gimi9.com
Updated Aug 8, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). TIGER/Line Shapefile, Current, County, Clinton County, NY, Address Range-Feature Name Relationship File [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-current-county-clinton-county-ny-address-range-feature-name-relationship-f
Explore at:
Dataset updated
Aug 8, 2025
Dataset provided by
United States Census Bureauhttp://census.gov/
Area covered
Clinton County, New York
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Range/Feature Name Relationship File contains a record for each address range/linear feature name relationship. The purpose of this relationship file is to identify all street names associated with each address range. An edge can have several feature names; an address range located on an edge can be associated with one or any combination of the available feature names (an address range can be linked to multiple feature names). The address range is identified by the address range identifier (ARID) attribute that can be used to link to the Address Range Relationship File (addr.dbf). The linear feature name is identified by the linear feature identifier (LINEARID) attribute which can be used to link to the Feature Names Relationship File (featnames.dbf).
Crypto, Web3 and Blockchain Jobs
kaggle.com
zip
Updated Nov 16, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Crypto, Web3 and Blockchain Jobs [Dataset]. https://www.kaggle.com/datasets/thedevastator/the-evolving-blockchain-cryptocurrency-job-marke
Explore at:
zip(427314 bytes)Available download formats
Dataset updated
Nov 16, 2022
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Crypto, Web3 and Blockchain Jobs

Scraped active crypto jobs listed on cryptojobslist.com

About this dataset

Our crypto job market dataset contains data on job postings in the blockchain/cryptocurrency industry from 75 different websites. The data spans from January 1st, 2018 to December 31st, 2019.

This dataset provides a unique opportunity to understand the trends and dynamics of the burgeoningcrypto job market. It includes information on job postings from a wide range of companies, spanning startups to established enterprises. The data includes job titles, salary ranges, tags, and the date the job was posted.

This dataset can help answer important questions about the crypto job market, such as: - What types of jobs are most popular in the industry? - What skills are most in demand? - What are typical salaries for different positions?

How to use the dataset

The data in this dataset can be used to analyze the trends in the blockchain/cryptocurrency job market. The data includes information on job postings from 75 different websites, spanning from January 1st, 2018 to December 31st, 2019.

The data can be used to track the number of job postings over time, as well as the average salary for each position. Additionally, the tags column can be used to identify which skills are most in demand by employers

Research Ideas

Identify trends in the types of jobs being posted in the blockchain/cryptocurrency industry.

Study which companies are hiring the most in the blockchain/cryptocurrency industry.

Acknowledgements

The dataset was scraped from here, and here. And was originally posted here

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: companies.csv | Column name | Description | |:------------------------|:-------------------------------------------------------------| | Crunchbase Rank | The rank of the company on Crunchbase. (Integer) | | Company Name | The name of the company. (String) | | Total Funding | The total amount of funding the company has raised. (String) | | Number of Employees | The number of employees the company has. (Integer) |

File: all_jobs.csv | Column name | Description | |:------------------|:-------------------------------------------| | Company Name | The name of the company. (String) | | Job Link | A link to the job posting. (String) | | Job Location | The location of the job. (String) | | Job Title | The title of the job. (String) | | Salary Range | The salary range for the job. (String) | | Tags | The tags associated with the job. (String) | | Posted Before | The date the job was posted. (Date) |

File: Aave.csv | Column name | Description | |:-----------------|:-------------------------------------------| | Company Name | The name of the company. (String) | | Job Title | The title of the job. (String) | | Salary Range | The salary range for the job. (String) | | Tags | The tags associated with the job. (String) |

File: Alchemy.csv | Column name | Description | |:-----------------|:-------------------------------------------| | Company Name | The name of the company. (String) | | Job Title | The title of the job. (String) | | Salary Range | The salary range for the job. (String) | | Tags | The tags associated with the job. (String) |

File: Amun 21 Shares.csv | Column name | Description | |:-----------------|:-------------------------------------------| | Company Name | The name of the company. (String) | | Job Title | The title of the job. (String) | | Salary Range | The salary range for the job. (String) | | Tags | The tags associated with the job. (String) |

File: Anchorage Digital.csv | Column name | Description | |:-----------------|:-------------------------------------------| | **Company N...
Pre and Post-Exercise Heart Rate Analysis
kaggle.com
zip
Updated Sep 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdullah M Almutairi (2024). Pre and Post-Exercise Heart Rate Analysis [Dataset]. https://www.kaggle.com/datasets/abdullahmalmutairi/pre-and-post-exercise-heart-rate-analysis
Explore at:
zip(3857 bytes)Available download formats
Dataset updated
Sep 29, 2024
Authors
Abdullah M Almutairi
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Overview:

This dataset contains simulated (hypothetical) but almost realistic (based on AI) data related to sleep, heart rate, and exercise habits of 500 individuals. It includes both pre-exercise and post-exercise resting heart rates, allowing for analyses such as a dependent t-test (Paired Sample t-test) to observe changes in heart rate after an exercise program. The dataset also includes additional health-related variables, such as age, hours of sleep per night, and exercise frequency.

The data is designed for tasks involving hypothesis testing, health analytics, or even machine learning applications that predict changes in heart rate based on personal attributes and exercise behavior. It can be used to understand the relationships between exercise frequency, sleep, and changes in heart rate.

File: Filename: heart_rate_data.csv File Format: CSV

- Features (Columns):

Age: Description: The age of the individual. Type: Integer Range: 18-60 years Relevance: Age is an important factor in determining heart rate and the effects of exercise.

Sleep Hours: Description: The average number of hours the individual sleeps per night. Type: Float Range: 3.0 - 10.0 hours Relevance: Sleep is a crucial health metric that can impact heart rate and exercise recovery.

Exercise Frequency (Days/Week): Description: The number of days per week the individual engages in physical exercise. Type: Integer Range: 1-7 days/week Relevance: More frequent exercise may lead to greater heart rate improvements and better cardiovascular health.

Resting Heart Rate Before: Description: The individual’s resting heart rate measured before beginning a 6-week exercise program. Type: Integer Range: 50 - 100 bpm (beats per minute) Relevance: This is a key health indicator, providing a baseline measurement for the individual’s heart rate.

Resting Heart Rate After: Description: The individual’s resting heart rate measured after completing the 6-week exercise program. Type: Integer Range: 45 - 95 bpm (lower than the "Resting Heart Rate Before" due to the effects of exercise). Relevance: This variable is essential for understanding how exercise affects heart rate over time, and it can be used to perform a dependent t-test analysis.

Max Heart Rate During Exercise: Description: The maximum heart rate the individual reached during exercise sessions. Type: Integer Range: 120 - 190 bpm Relevance: This metric helps in understanding cardiovascular strain during exercise and can be linked to exercise frequency or fitness levels.

Potential Uses: Dependent T-Test Analysis: The dataset is particularly suited for a dependent (paired) t-test where you compare the resting heart rate before and after the exercise program for each individual.

Exploratory Data Analysis (EDA):Investigate relationships between sleep, exercise frequency, and changes in heart rate. Potential analyses include correlations between sleep hours and resting heart rate improvement, or regression analyses to predict heart rate after exercise.

Machine Learning: Use the dataset for predictive modeling, and build a beginner regression model to predict post-exercise heart rate using age, sleep, and exercise frequency as features.

Health and Fitness Insights: This dataset can be useful for studying how different factors like sleep and age influence heart rate changes and overall cardiovascular health.

License: Choose an appropriate open license, such as:

CC BY 4.0 (Attribution 4.0 International).

Inspiration for Kaggle Users: How does exercise frequency influence the reduction in resting heart rate? Is there a relationship between sleep and heart rate improvements post-exercise? Can we predict the post-exercise heart rate using other health variables? How do age and exercise frequency interact to affect heart rate?

Acknowledgments: This is a simulated dataset for educational purposes, generated to demonstrate statistical and machine learning applications in the field of health analytics.
Rescaled CIFAR-10 dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled CIFAR-10 dataset [Dataset]. http://doi.org/10.5281/zenodo.15188748
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15188748
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Description
Motivation

The goal of introducing the Rescaled CIFAR-10 dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

The Rescaled CIFAR-10 dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled CIFAR-10 dataset contains substantially more natural textures and patterns than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2

and is therefore significantly more challenging.

Access and rights

The Rescaled CIFAR-10 dataset is provided on the condition that you provide proper citation for the original CIFAR-10 dataset:

[4] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep., University of Toronto.

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled CIFAR-10 dataset is generated by rescaling 32×32 RGB images of animals and vehicles from the original CIFAR-10 dataset [4]. The scale variations are up to a factor of 4. In order to have all test images have the same resolution, mirror extension is used to extend the images to size 64x64. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 distinct classes in the dataset: “airplane”, “automobile”, “bird”, “cat”, “deer”, “dog”, “frog”, “horse”, “ship” and “truck”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 40 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 40 000 samples from the original CIFAR-10 training set. The validation dataset, on the other hand, is formed from the final 10 000 image batch of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original CIFAR-10 test set.

The h5 files containing the dataset

The training dataset file (~5.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

cifar10_with_scale_variations_tr40000_vl10000_te10000_outsize64-64_scte1p000_scte1p000.h5

Additionally, for the Rescaled CIFAR-10 dataset, there are 9 datasets (~1 GB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

cifar10_with_scale_variations_te10000_outsize64-64_scte0p500.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p595.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p707.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p841.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p000.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p189.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p414.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p682.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte2p000.h5

These dataset files were used for the experiments presented in Figures 9, 10, 15, 16, 20 and 24 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.
R
Finder Close Range Dataset
universe.roboflow.com
zip
Updated Oct 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MyObj (2024). Finder Close Range Dataset [Dataset]. https://universe.roboflow.com/myobj-1lyiw/finder-close-range
Explore at:
zipAvailable download formats
Dataset updated
Oct 21, 2024
Dataset authored and provided by
MyObj
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Banana Bounding Boxes
Description
Finder Close Range

## Overview Finder Close Range is a dataset for object detection tasks - it contains Banana annotations for 252 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
z
mmWave-based Fitness Activity Recognition Dataset
zenodo.org
png, zip
Updated Jul 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen; Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen (2024). mmWave-based Fitness Activity Recognition Dataset [Dataset]. http://doi.org/10.5281/zenodo.7793613
Explore at:
zip, pngAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7793613
Dataset updated
Jul 12, 2024
Dataset provided by
Zenodo
Authors
Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen; Yucheng Xie; Xiaonan Guo; Yan Wang; Jerry Cheng; Yingying Chen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description:
This mmWave Datasets are used for fitness activity identification. This dataset (FA Dataset) contains 14 common fitness daily activities. The data are captured by the mmWave radar TI-AWR1642. The dataset can be used by fellow researchers to reproduce the original work or to further explore other machine-learning problems in the domain of mmWave signals.
Format: .png format
Section 1: Device Configuration
A commodity mmWave radar TI AWR1642, which integrates a 2 × 4 antenna array. The detailed information of it can be found at https://www.ti.com/product/AWR1642#:~:text=The%20AWR1642%20is%20an%20ideal,of%2076%20to%2081%20GHz.
A TI DCA1000EVM data capture card is used to collect data from the mmWave device and send data to a laptop. The detailed information can be found at https://www.ti.com/tool/DCA1000EVM?keyMatch=DCA1000EVM.
mmWave radar work at the frequency in the range of 77~81GHz. The sampling rate is fixed at 100 frames per second and each frame has 17 chirps.
Section 2: Data Format
We provide our mmWave data in heatmaps for this dataset. The data file is in the png format. The details are shown in the following:
14 activities are included in the FA Dataset.
2 participants are included in the FA Dataset.
FA_d_p_i_u_j.png:
d represents the date to collect the fitness data.
p represents the environment to collect the fitness data.
i represents fitness activity type index
u represents user id
j represents sample index
Example:
FA_20220101_lab_1_2_3 represents the 3rd data sample of user 2 of activity 1 collected in the lab
Section 3: Experimental Setup
We place the mmWave device on a table with a height of 60cm.
The participants are asked to perform fitness activity in front of a mmWave device with a distance of 2m.
The data are collected at an lab with a size of (5.0m×3.0m).
Section 4: Data Description
We develop a spatial-temporal heatmap to integrates multiple activity features, including the range of movement, velocity, and time duration of each activity repetition.

We first derive the Doppler-range map of the users' activity by calculating Range-FFT and Doppler-FFT. Then, we generate the spatial-temporal heatmap by accumulating the velocity of every distance in every Doppler-range map together. Next, we normalize the derived velocity information and present the velocity-distance relationship in time dimension. In this way, we transfer the original instantaneous velocity-distance relationship to a more comprehensive spatial-temporal heatmap which describes the process of a whole activity.

As shown in Figure attached, in each spatial-temporal heatmap, the horizontal axis represents the time duration of an activity repetition while the vertical axis represents the range of movement. The velocity is represented by color.

We create 14 zip files to store the the dataset. There are 14 zip files starting with "FA", each contains repetitions from the same fitness activity.
14 common daily activities and their corresponding files
File Name Activity Type File Name Activity Type
FA1 Crunches FA8 Squats
FA2 Elbow plank and reach FA9 Burpees
FA3 Leg raise FA10 Chest squeezes
FA4 Lunges FA11 High knees
FA5 Mountain climber FA12 Side leg raise
FA6 Punches FA13 Side to side chops
FA7 Push ups FA14 Turning kicks

Section 5: Raw Data and Data Processing Algorithms
We also provide the mmWave raw data (.mat format) stored in the same zip file corresponding to the heatmap datasets. Each .mat file can store one set of activity repetitions (e.g., 4 repetations) from a same user.
For example: FA_d_p_i_u_j.mat:
d represents the data to collect the data.
p represents the environment to collect the data.
i represents the activity type index
u represents the user id
j represents the set index
We plan to provide the data processing algorithms (heatmap_generation.py) to load the mmWave raw data and generate the corresponding heatmap data.
Section 6: Citations
If your paper is related to our works, please cite our papers as follows.
https://ieeexplore.ieee.org/document/9868878/
Xie, Yucheng, Ruizhe Jiang, Xiaonan Guo, Yan Wang, Jerry Cheng, and Yingying Chen. "mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave." In 2022 International Conference on Computer Communications and Networks (ICCCN), pp. 1-10. IEEE, 2022.
Bibtex:
@inproceedings{xie2022mmfit,
title={mmFit: Low-Effort Personalized Fitness Monitoring Using Millimeter Wave},
author={Xie, Yucheng and Jiang, Ruizhe and Guo, Xiaonan and Wang, Yan and Cheng, Jerry and Chen, Yingying},
booktitle={2022 International Conference on Computer Communications and Networks (ICCCN)},
pages={1--10},
year={2022},
organization={IEEE}
}
Web-Harvested Image and Caption Dataset
kaggle.com
zip
Updated Dec 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Web-Harvested Image and Caption Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/web-harvested-image-and-caption-dataset
Explore at:
zip(233254845 bytes)Available download formats
Dataset updated
Dec 6, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description

Web-Harvested Image and Caption Dataset

Web-Harvested Image and Caption Dataset

By conceptual_captions (From Huggingface) [source]

About this dataset

The Conceptual Captions dataset, hosted on Kaggle, is a comprehensive and expansive collection of web-harvested images and their corresponding captions. With a staggering total of approximately 3.3 million images, this dataset offers a rich resource for training and evaluating image captioning models.

Unlike other image caption datasets, the unique feature of Conceptual Captions lies in the diverse range of styles represented in its captions. These captions are sourced from the web, specifically extracted from the Alt-text HTML attribute associated with web images. This approach ensures that the dataset encompasses a broad variety of textual descriptions that accurately reflect real-world usage scenarios.

To guarantee the quality and reliability of these captions, an elaborate automatic pipeline has been developed for extracting, filtering, and transforming each image/caption pair. The goal behind this diligent curation process is to provide clean, informative, fluent, and learnable captions that effectively describe their corresponding images.

The dataset itself consists of two primary components: train.csv and validation.csv files. The train.csv file comprises an extensive collection of over 3.3 million web-harvested images along with their respective carefully curated captions. Each image is accompanied by its unique URL to allow easy retrieval during model training.

On the other hand, validation.csv contains approximately 100,000 image URLs paired with their corresponding informative captions. This subset serves as an invaluable resource for validating and evaluating model performance after training on the larger train.csv set.

Researchers and data scientists can leverage this remarkable Conceptual Captions dataset to develop state-of-the-art computer vision models focused on tasks such as image understanding, natural language processing (NLP), multimodal learning techniques combining visual features with textual context comprehension – among others.

By providing such an extensive array of high-quality images coupled with richly descriptive captions acquired from various sources across the internet landscape through a meticulous curation process - Conceptual Captions empowers professionals working in fields like artificial intelligence (AI), machine learning, computer vision, and natural language processing to explore new frontiers in visual understanding and textual comprehension

How to use the dataset

Title: How to Use the Conceptual Captions Dataset for Web-Harvested Image and Caption Analysis

Introduction: The Conceptual Captions dataset is an extensive collection of web-harvested images, each accompanied by a caption. This guide aims to help you understand and effectively utilize this dataset for various applications, such as image captioning, natural language processing, computer vision tasks, and more. Let's dive into the details!

Step 1: Acquiring the Dataset

Step 2: Exploring the Dataset Files After downloading the dataset files ('train.csv' and 'validation.csv'), you'll find that each file consists of multiple columns containing valuable information:

a) 'caption': This column holds captions associated with each image. It provides textual descriptions that can be used in various NLP tasks. b) 'image_url': This column contains URLs pointing to individual images in the dataset.

Step 3: Understanding Dataset Structure The Conceptual Captions dataset follows a tabular format where each row represents an image/caption pair. Combining knowledge from both train.csv and validation.csv files will give you access to a diverse range of approximately 3.4 million paired examples.

Step 4: Preprocessing Considerations Due to its web-harvested nature, it is recommended to perform certain preprocessing steps on this dataset before utilizing it for your specific task(s). Some considerations include:

a) Text Cleaning: Perform basic text cleaning techniques such as removing special characters or applying sentence tokenization. b) Filtering: Depending on your application, you may need to apply specific filters to remove captions that are irrelevant, inaccurate, or noisy. c) Language Preprocessing: Consider using techniques like lemmatization or stemming if it suits your task.

Step 5: Training and Evaluation Once you have preprocessed the dataset as per your requirements, it's time to train your models! The Conceptual Captions dataset can be used for a range of tasks such as image captioni...
N
South Range, MI Annual Population and Growth Analysis Dataset: A...
neilsberg.com
csv, json
Updated Jul 30, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2024). South Range, MI Annual Population and Growth Analysis Dataset: A Comprehensive Overview of Population Changes and Yearly Growth Rates in South Range from 2000 to 2023 // 2024 Edition [Dataset]. https://www.neilsberg.com/insights/south-range-mi-population-by-year/
Explore at:
json, csvAvailable download formats
Dataset updated
Jul 30, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Michigan, South Range
Variables measured
Annual Population Growth Rate, Population Between 2000 and 2023, Annual Population Growth Rate Percent
Measurement technique
The data presented in this dataset is derived from the 20 years data of U.S. Census Bureau Population Estimates Program (PEP) 2000 - 2023. To measure the variables, namely (a) population and (b) population change in ( absolute and as a percentage ), we initially analyzed and tabulated the data for each of the years between 2000 and 2023. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the South Range population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of South Range across the last two decades. For example, using this dataset, we can identify if the population is declining or increasing. If there is a change, when the population peaked, or if it is still growing and has not reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.

Key observations

In 2023, the population of South Range was 741, a 0.27% decrease year-by-year from 2022. Previously, in 2022, South Range population was 743, an increase of 0.13% compared to a population of 742 in 2021. Over the last 20 plus years, between 2000 and 2023, population of South Range increased by 17. In this period, the peak population was 760 in the year 2010. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).

Content

When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).

Data Coverage:

From 2000 to 2023

Variables / Data Columns

Year: This column displays the data year (Measured annually and for years 2000 to 2023)

Population: The population for the specific year for the South Range is shown in this column.

Year on Year Change: This column displays the change in South Range population for each year compared to the previous year.

Change in Percent: This column displays the year on year change as a percentage. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for South Range Population by Year. You can refer the same here
f
Differences in dissolved oxygen (DO) diel ranges between impoundment and...
datasetcatalog.nlm.nih.gov
plos.figshare.com
+1more
Updated Nov 17, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nislow, Keith H.; Abbott, Katherine M.; Zaidel, Peter A.; Roy, Allison H.; Houle, Kristopher M. (2022). Differences in dissolved oxygen (DO) diel ranges between impoundment and downstream reaches and upstream references before and after dam removal, termed impoundment or downstream range effect. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000270511
Explore at:
Dataset updated
Nov 17, 2022
Authors
Nislow, Keith H.; Abbott, Katherine M.; Zaidel, Peter A.; Roy, Allison H.; Houle, Kristopher M.
Description
Range effect is the average difference between diel ranges of impacted and upstream reaches. Positive values indicate a higher diel DO range relative to upstream.
Southern Long-Toed Salamander Range - CWHR A003B [ds2844]
data.ca.gov
data.cnra.ca.gov
+5more
Updated Oct 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
California Department of Fish and Wildlife (2025). Southern Long-Toed Salamander Range - CWHR A003B [ds2844] [Dataset]. https://data.ca.gov/dataset/southern-long-toed-salamander-range-cwhr-a003b-ds2844
Explore at:
geojson, zip, arcgis geoservices rest api, kml, csv, html, ashxAvailable download formats
Dataset updated
Oct 27, 2025
Dataset authored and provided by
California Department of Fish and Wildlifehttps://wildlife.ca.gov/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
CWHR species range datasets represent the maximum current geographic extent of each species within California. Ranges were originally delineated at a scale of 1:5,000,000 by species-level experts more than 30 years ago and have gradually been revised at a scale of 1:1,000,000. Species occurrence data are used in defining species ranges, but range polygons may extend beyond the limits of extant occurrence data for a particular species. When drawing range boundaries, CDFW seeks to err on the side of commission rather than omission. This means that CDFW may include areas within a range based on expert knowledge or other available information, despite an absence of confirmed occurrences, which may be due to a lack of survey effort. The degree to which a range polygon is extended beyond occurrence data will vary among species, depending upon each species’ vagility, dispersal patterns, and other ecological and life history factors. The boundary line of a range polygon is drawn with consideration of these factors and is aligned with standardized boundaries including watersheds (NHD), ecoregions (USDA), or other ecologically meaningful delineations such as elevation contour lines. While CWHR ranges are meant to represent the current range, once an area has been designated as part of a species’ range in CWHR, it will remain part of the range even if there have been no documented occurrences within recent decades. An area is not removed from the range polygon unless experts indicate that it has not been occupied for a number of years after repeated surveys or is deemed no longer suitable and unlikely to be recolonized. It is important to note that range polygons typically contain areas in which a species is not expected to be found due to the patchy configuration of suitable habitat within a species’ range. In this regard, range polygons are coarse generalizations of where a species may be found. This data is available for download from the CDFW website: https://www.wildlife.ca.gov/Data/CWHR.
The following data sources were collated for the purposes of range mapping and species habitat modeling by RADMAP. Each focal taxon’s location data was extracted (when applicable) from the following list of sources. BIOS datasets are bracketed with their “ds” numbers and can be located on CDFW’s BIOS viewer: https://wildlife.ca.gov/Data/BIOS.
California Natural Diversity Database,
Terrestrial Species Monitoring [ds2826],
North American Bat Monitoring Data Portal,
VertNet,
Breeding Bird Survey,
Wildlife Insights,
eBird,
iNaturalist,
other available CDFW or partner data.
Blunt-Nosed Leopard Lizard Range - CWHR R019 [ds1726]
data-cdfw.opendata.arcgis.com
data.cnra.ca.gov
+4more
Updated Oct 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
California Department of Fish and Wildlife (2025). Blunt-Nosed Leopard Lizard Range - CWHR R019 [ds1726] [Dataset]. https://data-cdfw.opendata.arcgis.com/datasets/CDFW::blunt-nosed-leopard-lizard-range-cwhr-r019-ds1726
Explore at:
Dataset updated
Oct 22, 2025
Dataset authored and provided by
California Department of Fish and Wildlifehttps://wildlife.ca.gov/
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
CWHR species range datasets represent the maximum current geographic extent of each species within California. Ranges were originally delineated at a scale of 1:5,000,000 by species-level experts more than 30 years ago and have gradually been revised at a scale of 1:1,000,000. Species occurrence data are used in defining species ranges, but range polygons may extend beyond the limits of extant occurrence data for a particular species. When drawing range boundaries, CDFW seeks to err on the side of commission rather than omission. This means that CDFW may include areas within a range based on expert knowledge or other available information, despite an absence of confirmed occurrences, which may be due to a lack of survey effort. The degree to which a range polygon is extended beyond occurrence data will vary among species, depending upon each species’ vagility, dispersal patterns, and other ecological and life history factors. The boundary line of a range polygon is drawn with consideration of these factors and is aligned with standardized boundaries including watersheds (NHD), ecoregions (USDA), or other ecologically meaningful delineations such as elevation contour lines. While CWHR ranges are meant to represent the current range, once an area has been designated as part of a species’ range in CWHR, it will remain part of the range even if there have been no documented occurrences within recent decades. An area is not removed from the range polygon unless experts indicate that it has not been occupied for a number of years after repeated surveys or is deemed no longer suitable and unlikely to be recolonized. It is important to note that range polygons typically contain areas in which a species is not expected to be found due to the patchy configuration of suitable habitat within a species’ range. In this regard, range polygons are coarse generalizations of where a species may be found. This data is available for download from the CDFW website: https://www.wildlife.ca.gov/Data/CWHR. The following data sources were collated for the purposes of range mapping and species habitat modeling by RADMAP. Each focal taxon’s location data was extracted (when applicable) from the following list of sources. BIOS datasets are bracketed with their “ds” numbers and can be located on CDFW’s BIOS viewer: https://wildlife.ca.gov/Data/BIOS. California Natural Diversity Database, Terrestrial Species Monitoring [ds2826], North American Bat Monitoring Data Portal, VertNet, Breeding Bird Survey, Wildlife Insights, eBird, iNaturalist, other available CDFW or partner data.

Facebook

Twitter

Click to copy link

Link copied

Cite

S M MEHEDI ZAMAN (2023). Path loss at 5G high frequency range in South Asia [Dataset]. https://www.kaggle.com/datasets/smmehedizaman/path-loss-at-5g-high-frequency-range-in-south-asia

Path loss at 5G high frequency range in South Asia

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Apr 25, 2023

Dataset provided by

Kagglehttp://kaggle.com/

Authors

S M MEHEDI ZAMAN

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Area covered

South Asia, Asia

Description

This dataset has been generated using NYUSIM 3.0 mm-Wave channel simulator software, which takes into account atmospheric data such as rain rate, humidity, barometric pressure, and temperature. The input data was collected over the course of a year in South Asia. As a result, the dataset provides an accurate representation of the seasonal variations in mm-wave channel characteristics in these areas. The dataset includes a total of 2835 records, each of which contains T-R Separation Distance (m), Time Delay (ns), Received Power (dBm), Phase (rad), Azimuth AoD (degree), Elevation AoD (degree), Azimuth AoA (degree), Elevation, AoA (degree), RMS Delay Spread (ns), Season, Frequency and Path Loss (dB). Four main seasons have been considered in this dataset: Spring, Summer, Fall, and Winter. Each season is subdivided into three parts (i.e., low, medium, and high), to accurately include the atmospheric variations in a season. To simulate the path loss, realistic Tx and Rx height, NLoS environment, and mean human blockage attenuation effects have been taken into consideration. The data has been preprocessed and normalized to ensure consistency and ease of use. Researchers in the field of mm-wave communications and networking can use this dataset to study the impact of atmospheric conditions on mm-wave channel characteristics and develop more accurate models for predicting channel behavior. The dataset can also be used to evaluate the performance of different communication protocols and signal processing techniques under varying weather conditions. Note that while the data was collected specifically in South Asia region, the high correlation between the weather patterns in this region and other areas means that the dataset may also be applicable to other regions with similar atmospheric conditions.

Acknowledgements The paper in which the dataset was proposed is available on: https://ieeexplore.ieee.org/abstract/document/10307972

Citation

If you use this dataset, please cite the following paper:

Rashed Hasan Ratul, S. M. Mehedi Zaman, Hasib Arman Chowdhury, Md. Zayed Hassan Sagor, Mohammad Tawhid Kawser, and Mirza Muntasir Nishat, “Atmospheric Influence on the Path Loss at High Frequencies for Deployment of 5G Cellular Communication Networks,” 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 2023, pp. 1–6. https://doi.org/10.1109/ICCCNT56998.2023.10307972

BibTeX ```bibtex @inproceedings{Ratul2023Atmospheric, author = {Ratul, Rashed Hasan and Zaman, S. M. Mehedi and Chowdhury, Hasib Arman and Sagor, Md. Zayed Hassan and Kawser, Mohammad Tawhid and Nishat, Mirza Muntasir}, title = {Atmospheric Influence on the Path Loss at High Frequencies for Deployment of {5G} Cellular Communication Networks}, booktitle = {2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT)}, year = {2023}, pages = {1--6}, doi = {10.1109/ICCCNT56998.2023.10307972}, keywords = {Wireless communication; Fluctuations; Rain; 5G mobile communication; Atmospheric modeling; Simulation; Predictive models; 5G-NR; mm-wave propagation; path loss; atmospheric influence; NYUSIM; ML} }

Clear search

Close search

Google apps

Main menu

Path loss at 5G high frequency range in South Asia

Citation

South Range, MI Population Breakdown by Gender and Age Dataset: Male and...

About this dataset

Content

Inspiration

Recommended for further research

Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6

Point Cloud Mnist 2D

Point Cloud MNIST 2D

Format

Padding

Subsamples

50

100

200

351

Distribution

Car Driving Distance Range Dataset

What’s in a tank of fuel?

Content

Acknowledgements

Datasets of Oxygen concentration measurements in ICU boxes at Faculty...

TIGER/Line Shapefile, 2023, County, Pinal County, AZ, Address Range-Feature...

TIGER/Line Shapefile, 2023, County, Grant County, WA, Address Range-Feature...

Rescaled Fashion-MNIST dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

TIGER/Line Shapefile, Current, County, Clinton County, NY, Address...

Crypto, Web3 and Blockchain Jobs

Crypto, Web3 and Blockchain Jobs

Scraped active crypto jobs listed on cryptojobslist.com

About this dataset

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Pre and Post-Exercise Heart Rate Analysis

Rescaled CIFAR-10 dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

Finder Close Range Dataset

Finder Close Range

mmWave-based Fitness Activity Recognition Dataset

Web-Harvested Image and Caption Dataset

Web-Harvested Image and Caption Dataset

Web-Harvested Image and Caption Dataset

About this dataset

How to use the dataset

South Range, MI Annual Population and Growth Analysis Dataset: A...

About this dataset

Content

Inspiration

Recommended for further research

Differences in dissolved oxygen (DO) diel ranges between impoundment and...

Southern Long-Toed Salamander Range - CWHR A003B [ds2844]

Blunt-Nosed Leopard Lizard Range - CWHR R019 [ds1726]

Path loss at 5G high frequency range in South Asia

Citation