(1) Vector, human, and non-human host natural death rates were estimated as 1/individual longevity. The range of variation given is that of longevity (i.e. the inverse of the death rate parameter defined in the model), as those are the raw data found in the literature (see sections ‘Vector local growth rate’ and ‘Human and non-human hosts natural death rates’ in Text S1). (2) Death rates were calculated as the sum of the natural death rate of human or non-human hosts and the additional mortality imposed by the pathogen on infectious and ‘recovered’ individuals (as calculated in section ‘Human and non-human hosts mortality induced by the pathogen’ in Text S1).
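As a hedged restatement of the two definitions above (the symbols are ours, not the source's): with individual longevity L and pathogen-induced extra mortality α,

```latex
\mu_{\text{natural}} = \frac{1}{L},
\qquad
\mu_{\text{total}} = \mu_{\text{natural}} + \alpha
```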
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Range Features shapefile contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county-equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side, relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. This shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that this shapefile includes all unsuppressed address ranges, whereas the All Lines shapefile (edges.shp) only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line shapefiles are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be used to understand the population distribution of South Range by gender and age. For example, it can be used to identify the largest age group for both men and women in South Range, to see how the gender ratio changes from birth to the most senior age group, and to examine the male-to-female ratio within each age group.
Key observations
Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for South Range Population by Gender.
All parameters are assumed non-negative. S(0), …, I2(0), and R(0) define the initial population sizes. Dashes are used when values are arbitrarily chosen from some range.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The project lead for the collection of this data was Carrington Hilson. Elk (9 adult females) were captured and equipped with GPS collars (Lotek Iridium) transmitting data from 2023-2024. The Potter-Redwood Valley herd does not migrate between traditional summer and winter seasonal ranges. Therefore, annual home ranges were modeled using year-round data to demarcate high use areas in lieu of modeling the specific winter ranges commonly seen in other ungulate analyses in California. GPS locations were fixed at 6.5 hour intervals in the dataset. To improve the quality of the data set, all points with DOP values greater than 5 and those points visually assessed as a bad fix by the analyst were removed. The methodology used for this migration analysis allowed for the mapping of the herd's home range. Brownian bridge movement models (BBMMs; Sawyer et al. 2009) were constructed with GPS collar data from 8 elk, including 15 annual home range sequences, location, date, time, and average location error as inputs in Migration Mapper. BBMMs were produced at a spatial resolution of 50 m using a sequential fix interval of less than 27 hours and a fixed motion variance of 1000. Home range is visualized as the 50th percentile contour (high use) and the 99th percentile contour of the year-round utilization distribution. Home range designations for this herd may expand with a larger sample.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
Baseline results
You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.
Other results
Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website.
Dataset layout
Python / Matlab versions
I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.
The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary:
```python
def unpickle(file):
    import cPickle
    with open(file, 'rb') as fo:
        dict = cPickle.load(fo)
    return dict
```
And a python3 version:
```python
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
```
Loaded in this way, each of the batch files contains a dictionary with the following elements:
data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.
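As an illustrative sketch (not part of the original page), the python3 unpickle above can be combined with NumPy to recover image-shaped arrays; the path below is a placeholder for wherever the extracted archive lives:

```python
import numpy as np

batch = unpickle("cifar-10-batches-py/data_batch_1")  # placeholder path
data = batch[b"data"]       # uint8 array of shape (10000, 3072); keys are bytes
labels = batch[b"labels"]   # list of 10000 ints in 0-9

# Each row is channel-major (1024 R, then 1024 G, then 1024 B values, row-major
# within each channel), so reshape to (N, 3, 32, 32) and put channels last.
images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
print(images.shape)         # (10000, 32, 32, 3)
```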
The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:
label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.
Binary version
The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows:
<1 x label><3072 x pixel>
...
<1 x label><3072 x pixel>
In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.
Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.
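A similarly hedged sketch for the binary version, reading the 3073-byte rows described above with NumPy (the file path is again a placeholder):

```python
import numpy as np

raw = np.fromfile("cifar-10-batches-bin/data_batch_1.bin", dtype=np.uint8)
rows = raw.reshape(-1, 3073)   # each row: <1 x label><3072 x pixel>
labels = rows[:, 0]            # first byte of each row is the label (0-9)
images = rows[:, 1:].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
```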
There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.
The CIFAR-100 dataset
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File name definitions:
'...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s
'...v_175_250...' - dataset for velocity range [175, 250] m/s
'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected
'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart
Where to find the input (independent) and target (dependent) variable values for each dataset/Excel file?
input values in 'IN' sheet
target values in 'TARGET' sheet
Where to find the results from the best ANN model (for each target/output variable and each velocity range)?
Open the corresponding Excel file; the expected (target) vs. ANN (output) results are written in the 'TARGET vs OUTPUT' sheet
Check reference below (to be added when the paper is published)
https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams
This data set consists of digital hydraulic conductivity values for the alluvial and terrace deposits along the Beaver-North Canadian River from the panhandle to Canton Lake in northwestern Oklahoma. Ground water in 830 square miles of the Quaternary-age alluvial and terrace aquifer is an important source of water for irrigation, industrial, municipal, stock, and domestic supplies. The aquifer consists of poorly sorted, fine to coarse, unconsolidated quartz sand with minor amounts of clay, silt, and basal gravel. The hydraulically connected alluvial and terrace deposits unconformably overlie the Tertiary-age Ogallala Formation and Permian-age formations. Six zones of ranges of hydraulic conductivity values for the alluvial and terrace deposits reported in a ground-water modeling report are used in this data set. The hydraulic conductivity values range from 0 to 160 feet per day, and average 59 feet per day. The features in the data set representing aquifer boundaries along geological contacts were extracted from a published digital surficial geology data set based on a scale of 1:250,000. The geographic limits of the aquifer and zones representing ranges of hydraulic conductivity values were digitized from folded paper maps, at a scale of 1:250,000 from a ground-water modeling report. Ground-water flow models are numerical representations that simplify and aggregate natural systems. Models are not unique; different combinations of aquifer characteristics may produce similar results. Therefore, values of hydraulic conductivity used in the model and presented in this data set are not precise, but are within a reasonable range when compared to independently collected data.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The dataset contains 2000 rows of house-related data, representing various features that could influence house prices. Below, we discuss key aspects of the dataset, which include its structure, the choice of features, and potential use cases for analysis.
The dataset is designed to capture essential attributes for predicting house prices, including:
Area: Square footage of the house, which is generally one of the most important predictors of price.
Bedrooms & Bathrooms: The number of rooms in a house significantly affects its value. Homes with more rooms tend to be priced higher.
Floors: The number of floors in a house could indicate a larger, more luxurious home, potentially raising its price.
Year Built: The age of the house can affect its condition and value. Newly built houses are generally more expensive than older ones.
Location: Houses in desirable locations such as downtown or urban areas tend to be priced higher than those in suburban or rural areas.
Condition: The current condition of the house is critical, as well-maintained houses (in 'Excellent' or 'Good' condition) will attract higher prices compared to houses in 'Fair' or 'Poor' condition.
Garage: Availability of a garage can increase the price due to added convenience and space.
Price: The target variable, representing the sale price of the house, used to train machine learning models to predict house prices based on the other features.

Area Distribution: The area of the houses in the dataset ranges from 500 to 5000 square feet, which allows analysis across different types of homes, from smaller apartments to larger luxury houses.
Bedrooms and Bathrooms: The number of bedrooms varies from 1 to 5, and bathrooms from 1 to 4. This variance enables analysis of homes with different sizes and layouts.
Floors: Houses in the dataset have between 1 and 3 floors. This feature could be useful for identifying the influence of multi-level homes on house prices.
Year Built: The dataset contains houses built from 1900 to 2023, giving a wide range of house ages to analyze the effects of new vs. older construction.
Location: There is a mix of urban, suburban, downtown, and rural locations. Urban and downtown homes may command higher prices due to proximity to amenities.
Condition: Houses are labeled as 'Excellent', 'Good', 'Fair', or 'Poor'. This feature helps model the price differences based on the current state of the house.
Price Distribution: Prices range between $50,000 and $1,000,000, offering a broad spectrum of property values. This range makes the dataset appropriate for predicting a wide variety of housing prices, from affordable homes to luxury properties.
3. Correlation Between Features
A key area of interest is the relationship between various features and house price:
Area and Price: Typically, a strong positive correlation is expected between the size of the house (Area) and its price. Larger homes are likely to be more expensive.
Location and Price: Location is another major factor. Houses in urban or downtown areas may show a higher price on average compared to suburban and rural locations.
Condition and Price: The condition of the house should show a positive correlation with price. Houses in better condition should be priced higher, as they require less maintenance and repair.
Year Built and Price: Newer houses might command a higher price due to better construction standards, modern amenities, and less wear-and-tear, but some older homes in good condition may retain historical value.
Garage and Price: A house with a garage may be more expensive than one without, as it provides extra storage or parking space.
The dataset is well-suited for various machine learning and data analysis applications, including:
House Price Prediction: Using regression techniques, this dataset can be used to build a model to predict house prices based on the available features (see the sketch after this list).
Feature Importance Analysis: By using techniques such as feature importance ranking, data scientists can determine which features (e.g., location, area, or condition) have the greatest impact on house prices.
Clustering: Clustering techniques like k-means could help identify patterns in the data, such as grouping houses into segments based on their characteristics (e.g., luxury homes, affordable homes).
Market Segmentation: The dataset can be used to perform segmentation by location, price range, or house type to analyze trends in specific sub-markets, like luxury vs. affordable housing.
Time-Based Analysis: By studying how house prices vary with the year built or the age of the house, analysts can derive insights into the trends of older vs. newer homes.
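A minimal sketch of the house price prediction use case with scikit-learn; the CSV name and column names (Area, Bedrooms, Bathrooms, Floors, YearBuilt, Location, Condition, Garage, Price) are placeholders inferred from the feature list above, not confirmed file headers:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("house_prices.csv")  # placeholder file name

numeric = ["Area", "Bedrooms", "Bathrooms", "Floors", "YearBuilt"]
categorical = ["Location", "Condition", "Garage"]

model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", "passthrough", numeric),                            # keep numeric features as-is
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),  # encode categories
    ])),
    ("reg", LinearRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["Price"], test_size=0.2, random_state=0)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```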
The following exercise contains questions based on the housing dataset.
How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173
How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161
How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92
What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000
For instance, if the ‘price’ column contains outliers, how can you clean the data and remove them? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.
What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features
If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.
If the metrics show that the p-value for the grade column is 0.092, what inference can we make about the grade column? a. Significant in the presence of other variables b. Highly significant in the presence of other variables c. Insignificant in the presence of other variables d. None of the above
If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
This dataset was created from images of hand signs; the hand landmarks detected in each image form the attributes of the dataset. It contains all 21 landmarks, each with its (x, y, z) coordinates, and 5 classes (1, 2, 3, 4, 5).
You can also add more classes to your dataset by running the following code; make sure to create an empty DataFrame (or append to the existing dataset) and set the file path correctly.
```python
import os

import cv2
import mediapipe as mp
import pandas as pd

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

rows = []  # one dict of landmark coordinates per image

# Images are expected under data/1/ ... data/5/, one folder per class label.
with mp_hands.Hands(static_image_mode=True,   # True: independent images, not a video stream
                    max_num_hands=1,
                    min_detection_confidence=0.8) as hands:
    for t in range(1, 6):
        path = 'data/' + str(t) + '/'
        for name in os.listdir(path):
            image = cv2.imread(path + name)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = hands.process(image)
            if not results.multi_hand_landmarks:
                continue
            for hand_landmarks in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(image=image,
                                       landmark_list=hand_landmarks,
                                       connections=mp_hands.HAND_CONNECTIONS)
                a = {'label': t}
                for i in range(21):
                    lm = hand_landmarks.landmark[i]
                    for axis, value in zip(('x', 'y', 'z'), (lm.x, lm.y, lm.z)):
                        a[mp_hands.HandLandmark(i).name + '_' + axis] = value
                rows.append(a)

# DataFrame.append was removed from pandas; build the frame from the collected
# rows instead (concatenate with an existing DataFrame if you are appending).
df = pd.DataFrame(rows)
```
Open Government Licence: http://reference.data.gov.uk/id/open-government-licence
Percentage of responses in range 0-6 out of 10 (corresponding to 'low wellbeing') for 'Worthwhile' in the First ONS Annual Experimental Subjective Wellbeing survey.
The Office for National Statistics has included the four subjective well-being questions below on the Annual Population Survey (APS), the largest of their household surveys.
This dataset presents results from the second of these questions, "Overall, to what extent do you feel the things you do in your life are worthwhile?" Respondents answer these questions on an 11 point scale from 0 to 10 where 0 is ‘not at all’ and 10 is ‘completely’. The well-being questions were asked of adults aged 16 and older.
Well-being estimates for each unitary authority or county are derived using data from those respondents who live in that place. Responses are weighted to the estimated population of adults (aged 16 and older) as at end of September 2011.
The data cabinet also makes available the proportion of people in each county and unitary authority that answer with ‘low wellbeing’ values. For the ‘worthwhile’ question answers in the range 0-6 are taken to be low wellbeing.
This dataset contains the percentage of responses in the range 0-6. It also contains the standard error, the sample size and lower and upper confidence limits at the 95% level.
The ONS survey covers the whole of the UK, but this dataset only includes results for counties and unitary authorities in England, for consistency with other statistics available at this website.
At this stage the estimates are considered ‘experimental statistics’, published at an early stage to involve users in their development and to allow feedback. Feedback can be provided to the ONS via this email address.
The APS is a continuous household survey administered by the Office for National Statistics. It covers the UK, with the chief aim of providing between-census estimates of key social and labour market variables at a local area level. Apart from employment and unemployment, the topics covered in the survey include housing, ethnicity, religion, health and education. When a household is surveyed all adults (aged 16+) are asked the four subjective well-being questions.
The 12 month Subjective Well-being APS dataset is a sub-set of the general APS as the well-being questions are only asked of persons aged 16 and above, who gave a personal interview and proxy answers are not accepted. This reduces the size of the achieved sample to approximately 120,000 adult respondents in England.
The original data is available from the ONS website.
Detailed information on the APS and the Subjective Wellbeing dataset is available here.
As well as collecting data on well-being, the Office for National Statistics has published widely on the topic of wellbeing. Papers and further information can be found here.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Tables of areas in which a hazard of a given type, under a given scenario, causes a rise in water whose height falls within a fixed range of values. Spatial dataset produced by the GIS of the Territory at Important Flood Risk (TRI) of... and mapped for reporting purposes under the European Floods Directive. European Directive 2007/60/EC of 23 October 2007 on the assessment and management of flood risks (OJ L 288, 06-11-2007, p. 27) shapes the flood prevention strategy in Europe. It requires the production of flood risk management plans to reduce the negative consequences of flooding on human health, the environment, cultural heritage and economic activity. The objectives and implementation requirements are set out in the Law of 12 July 2010 on the National Commitment for the Environment (LENE) and the Decree of 2 March 2011. In this context, the primary objective of flood and flood risk mapping for the TRIs is to contribute, by homogenising and objectifying knowledge of flood exposure, to the development of flood risk management plans. This dataset is used to produce flood surface maps and flood risk maps that represent flood hazards and issues at an appropriate scale. Their objective is to provide quantitative evidence to further assess the vulnerability of a territory for the three levels of flood probability (high, medium, low).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains the characteristics of patients diagnosed with cancer. The dataset contains a unique ID for each patient, the type of cancer (diagnosis), the visual characteristics of the cancer and the average values of these characteristics.
There are also several categorical features where patients in the dataset are labeled with numerical values. You can examine them in the Chart area.
Other features contain specific ranges of average values of the features of the cancer image:
Each of these features is mapped to a table containing the number of values in a given range; you can examine these in the Chart tables.
Each sample contains the patient's unique ID, the cancer diagnosis and the average values of the cancer's visual characteristics.
Such a dataset can be used to train or test models and algorithms used to make cancer diagnoses. Understanding and analyzing the dataset can contribute to improving diagnosis based on these cancer-related visual features.
Logistic Regression: This algorithm can be used effectively for binary classification problems. In this dataset, logistic regression may be an appropriate choice since there are "Malignant" and "Benign" classes. It can be used to predict cancer type from the visual features in the dataset (see the sketch after this list).
K-Nearest Neighbors (KNN): KNN classifies an example by looking at the k closest examples around it. This algorithm assumes that patients with similar characteristics tend to have similar types of cancer. KNN can be used for cancer diagnosis by taking into account neighborhood relationships in the data set.
Support Vector Machines (SVM): SVM is effective for classification tasks, especially for two-class problems. Focusing on the clear separation of classes in the dataset, SVM is a powerful algorithm that can be used for cancer diagnosis.
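A minimal sketch of the logistic-regression use case described above, using scikit-learn's built-in copy of the Wisconsin diagnostic data as a stand-in for the CSV distributed here (column names in the CSV may differ):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the features, then fit a logistic-regression classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
```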
K-NN Project: https://www.kaggle.com/code/erdemtaha/prediction-cancer-data-with-k-nn-95
Logistic Regression: https://www.kaggle.com/code/erdemtaha/cancer-prediction-96-5-with-logistic-regression
This is a copy of content that has been adapted for educational purposes and published to reach more people. You can access the original source from the link below; please do not forget to support that dataset.
🔗 https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data
This database can also be accessed via the UW CS ftp server: 🔗 ftp.cs.wisc.edu cd math-prog/cpo-dataset/machine-learn/WDBC/
It can also be found at the UCI Machine Learning Repository: 🔗 https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
If you have some questions or curiosities about the data or studies, you can contact me as you wish from the links below 😊
LinkedIn: https://www.linkedin.com/in/erdem-taha-sokullu/
Mail: erdemtahasokullu@gmail.com
Github: https://github.com/Prometheussx
Kaggle: https://www.kaggle.com/erdemtaha
This data has a CC BY-NC-SA 4.0 license. You can review the license rules at the link below.
License Link: https://creativecommons.org/licenses/by-nc-sa/4.0/
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset comprises 953 synthetically generated entries detailing various traditional Marathi ornaments. It is designed to provide a structured collection of common features associated with these unique pieces of jewelry, often worn in Maharashtra, India.
Purpose: The primary purpose of this dataset is to serve as a foundational resource for:
Educational Projects: Students and enthusiasts can use it to learn about data handling, analysis, and visualization.
Machine Learning Exploration: Researchers can explore classification or regression tasks, for instance, predicting the type of ornament based on its physical properties or vice-versa.
Jewelry Domain Studies: Individuals interested in traditional Indian jewelry can gain insights into the typical characteristics of these ornaments.
Data Generation Practice: It can serve as an example for understanding how synthetic datasets can be created for specific domains.
Content & Generation: The dataset was created programmatically by defining plausible ranges and distributions for each feature based on general knowledge of these ornaments. While synthetic, the values aim to reflect realistic characteristics for each ornament type, acknowledging that actual jewelry pieces will have unique variations. For example (a toy generation sketch follows this list):
Weight, Length/Height, Width: Ranges were set to represent typical sizes and weights.
Number of Components/Units & Stones/Pearls: These features vary significantly based on the ornament's intricate design, from single-unit pieces like 'Nath' to multi-component necklaces like 'Thushi' or 'Mohan Mala'.
Carat Weight of Stones: Applied only to ornaments that typically feature stones or pearls.
Gold Purity: Reflects common gold purities used in Indian jewelry (e.g., 20K, 21K, 22K, 23K, 24K). Silver purity (e.g., 80-95%) is assigned for 'Jodvi'.
Primary Material: Predominantly 'Gold' for most ornaments, with 'Silver' for 'Jodvi'.
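As a toy sketch of the kind of programmatic generation described above (the ranges, the class subset, and the exact rules here are illustrative assumptions, not the actual generator behind marathi_ornaments_dataset.csv):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
classes = ["Nath", "Thushi", "Patlya"]   # subset of the 17 ornament classes

rows = []
for _ in range(10):
    n_stones = int(rng.integers(0, 8))
    rows.append({
        "Ornament Class": str(rng.choice(classes)),
        "Weight (grams)": round(float(rng.uniform(5, 150)), 2),
        "Length/Height (cm)": round(float(rng.uniform(2, 40)), 1),
        "Width (cm)": round(float(rng.uniform(0.5, 5)), 1),
        "Number of Components/Units": int(rng.integers(1, 200)),
        "Number of Stones/Pearls": n_stones,
        # Carat weight is 0.0 whenever there are no stones, as described above.
        "Carat Weight of Stones": 0.0 if n_stones == 0 else round(float(rng.uniform(0.05, 1.0)), 2),
        "Gold Purity (Karat)": int(rng.choice([20, 21, 22, 23, 24])),
    })

df = pd.DataFrame(rows)
print(df.head())
```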
This dataset offers a starting point for analyses where real-world data might be scarce or difficult to collect.
File Information
File Name: marathi_ornaments_dataset.csv
Number of Rows: 953
Number of Columns: 8
Approximate File Size: ~60 KB (will vary slightly based on exact content and line endings)
Column Descriptor
Here's a detailed description for each column in the marathi_ornaments_dataset.csv file:
Ornament Class
Description: The traditional Marathi name of the jewelry item. This is the categorical target variable representing different types of ornaments.
Data Type: String (Categorical)
Possible Values: Nath, Thushi, Kolhapuri Saaj, Mohan Mala, Laxmi Haar, Tanmani, Chinchpeti, Bakuli Haar, Surya Haar, Bugadi, Kudya, Bajuband, Tode, Patlya, Mangalsutra, Jodvi, Kambarpatta
Weight (grams)
Description: The approximate weight of the ornament in grams.
Data Type: Float
Units: grams (g)
Range: Varies significantly by ornament type (e.g., Nath would be lighter, Laxmi Haar or Kambarpatta would be heavier).
Length/Height (cm)
Description: The approximate length (for necklaces, bracelets) or height (for earrings, nose rings) of the ornament in centimeters.
Data Type: Float
Units: centimeters (cm)
Range: Varies by ornament type.
Width (cm)
Description: The approximate width of the ornament in centimeters.
Data Type: Float
Units: centimeters (cm)
Range: Varies by ornament type and design.
Number of Components/Units
Description: The total count of distinct, often repeated, design elements or units that make up the ornament. For intricate necklaces, this can be high.
Data Type: Integer
Range: 1 to ~1000 (especially for fine 'Thushi' beads).
Number of Stones/Pearls
Description: The count of stones (e.g., diamonds, rubies, emeralds) or pearls embedded in or attached to the ornament.
Data Type: Integer
Range: 0 to ~50 (many traditional designs have no stones, some have many).
Carat Weight of Stones
Description: The total approximate carat weight of all stones present in the ornament. This value is 0.0 if Number of Stones/Pearls is 0.
Data Type: Float
Units: Carats (ct)
Range: 0.0 to ~1.0 (or higher for very elaborate pieces).
Gold Purity (Karat)
Description: The purity of the primary gold material used, expressed in Karats. For 'Jodvi', which are traditionally silver, this represents silver purity as a percentage (even though labeled 'Gold Purity (Karat)' for consistency in column headers).
Data Type: Integer
Units: Karat (K) for gold, Percentage (%) for silver (for Jodvi).
Possible Values: 20, 21, 22, 23, 24 for Gold. 80 to 95 for Silver (specifically for Jodvi).
Primary Material
Des...
This dataset contains additional "small" habitat cores that had a minimum size of 1 female marten home range (300 ha) but were too small to meet the minimum size threshold of 5 female home ranges (1500 ha) used to define cores in the Primary Model. This dataset also contains the habitat cores from the Primary Model (i.e. cores ≥1500 ha). The description following this paragraph is adapted from the metadata description for developing cores in the Primary Model. These methods are identical to those used in developing cores in the Primary Model, with one exception: the minimum habitat core size parameter used in the Core Mapper tool was set to 300 ha instead of 1500 ha. It should be noted that a single core in this dataset actually slightly exceeded the 1500 ha threshold for its final area calculation but was not present in the Primary Model set of habitat cores. We determined that this was because the "1500 ha cutoff" in the tool was applied before the core was expanded by 977 m to fill in interior holes and then subsequently trimmed back (in the Core Mapper tool, this is controlled by the "Expand cores by this CWD value" and "Trim back expanded cores" parameters).

We derived the habitat cores using a tool within Gnarly Landscape Utilities called Core Mapper (Shirk and McRae 2015). To develop a Habitat Surface for input into Core Mapper, we started by assigning each 30 m pixel on the modeled landscape a habitat value equal to its GNN OGSI (range = 0-100). In areas with serpentine soils that support habitat potentially suitable for coastal marten (see report for details), we assigned a minimum habitat value of 31, which is equivalent to the 33rd percentile of OGSI 80 pixels in the marten's historical range. Pixels with higher OGSI retained their normal habitat value. Our intention was to allow the modified serpentine pixels to be more easily incorporated into habitat cores if there were higher value OGSI pixels in the vicinity, but not to have them form the entire basis of a core. We also excluded pixels with a habitat value <1.0 from inclusion in habitat cores. We then used a moving window to calculate the average habitat value within a 977 m radius around each pixel (derived from the estimated average size of a female marten's home range of 300 ha). Pixels with an average habitat value ≥36.0 were then incorporated into habitat cores.

After conducting a sensitivity analysis by running a set of Core Mapper trials using a broad range of habitat values, we chose ≥36.0 as the average habitat value because it is the median OGSI of pixels within the marten's historical range classified by the GNN as "OGSI 80" (Davis et al. 2015). It generated a set of habitat cores that were not overly generous (depicting most of the landscape as habitat core) or strict (only mapping cores in a few locations with very high OGSI such as Redwood State and National Parks) (see Appendix 3 of the referenced report for more details, including example maps from our sensitivity analysis). We then set Core Mapper to expand the habitat cores by 977 cost-weighted meters, a step intended to consolidate smaller cores that were probably relatively close together from a marten's perspective. This was followed by a "trimming" step that removed pixels from the expansion that did not meet the moving-window average, so the net result was rather small changes in the size of the habitat cores, but many individual isolated pixels with a habitat value of 0 were filled in.
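A conceptual sketch (not the Core Mapper / Gnarly Landscape Utilities implementation) of the moving-window step described above, assuming `ogsi` is a 2-D NumPy array of habitat values (0-100) on a 30 m grid; the serpentine adjustment is omitted:

```python
import numpy as np
from scipy import ndimage

def circular_mean(values, radius_m=977.0, cell_m=30.0):
    """Mean of `values` within a circular window of radius_m around each pixel."""
    r = int(round(radius_m / cell_m))                      # ~33 pixels at 30 m
    yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
    kernel = ((xx ** 2 + yy ** 2) <= r ** 2).astype(float)
    kernel /= kernel.sum()                                 # normalize to a mean filter
    return ndimage.convolve(values, kernel, mode="nearest")

ogsi = np.random.uniform(0, 100, size=(200, 200))          # placeholder raster
window_mean = circular_mean(ogsi)

# Pixels whose 977 m window mean is >= 36.0 become core candidates; pixels with
# habitat value < 1.0 are excluded from cores regardless of the window mean.
core_mask = (window_mean >= 36.0) & (ogsi >= 1.0)
```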
This is an abbreviated and incomplete description of the dataset. Please refer to the spatial metadata for a more thorough description of the methods used to produce this dataset, and a discussion of any assumptions or caveats that should be taken into consideration.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
By US Open Data Portal, data.gov [source]
This Kaggle dataset showcases the groundbreaking research undertaken by the GRACEnet program, which is attempting to better understand and minimize greenhouse gas (GHG) emissions from agro-ecosystems in order to create a healthier world for all. Through multi-location field studies that utilize standardized protocols – combined with models, producers, and policy makers – GRACEnet seeks to: typify existing production practices, maximize C sequestration, minimize net GHG emissions, and meet sustainable production goals. This Kaggle dataset allows us to evaluate the impact of different management systems on factors such as carbon dioxide and nitrous oxide emissions, C sequestration levels, crop/forest yield levels – plus additional environmental effects like air quality etc. With this data we can start getting an idea of the ways that agricultural policies may be influencing our planet's ever-evolving climate dilemma
For more datasets, click here.
Step 1: Familiarize yourself with the columns in this dataset. In particular, pay attention to Spreadsheet tab description (a brief description of each spreadsheet tab), Element or value display name (the name of each element or value being measured), Description (a detailed description), Data type (the type of data being measured), Unit (the unit of measurement for the data), Calculation (the calculation used to determine a value or percentage), Format (the format required for submitting values), and Low Value and High Value (the range of acceptable entries).
Step 2: Familiarize yourself with any additional information related to calculations. Most calculations made use of accepted best estimates based on standard protocols defined by GRACEnet. Every calculation was described in detail and included post-processing steps such as quality assurance/quality control changes and measurement uncertainty assessment, as available sources permitted. Relevant calculations were discussed collaboratively between all participating partners at every level where they felt it necessary, and all terms were rigorously reviewed before the partners agreed on any decision. A range was established when several assumptions were needed, or when there was a high possibility that samples might fall outside the previously accepted ranges associated with standard protocol conditions set up at GRACEnet Headquarters laboratories due to external factors such as soil type or climate.
Step 3: Determine what types of operations are allowed within each spreadsheet tab (.csv file). For example, on some tabs operations like adding an entire row may be permitted, but using formulas is not, since non-standard manipulations often introduce errors into an analysis. Users are therefore encouraged only to add new rows/columns where it fits their specific analysis. Operations such as filling blank cells with zeros, or deleting rows/columns made redundant by the standard filtering process already applied to other tabs, should be avoided: such non-standard changes create unverified extra noise that can bias results later during robustness testing and self-verification, producing erroneous output. Such actions might also result in additional FET values due to the API's specially crafted Excel documents when selecting the two-way combo box.
- Analyzing and comparing the environmental benefits of different agricultural management practices, such as crop yields and carbon sequestration rates.
- Developing an app or other mobile platform to help farmers find management practices that maximize carbon sequestration and minimize GHG emissions in their area, based on their specific soil condition and climate data.
- Building an AI-driven model to predict net greenhouse gas emissions and C sequestration from potential weekly/monthly production plans across different regions in the world, based on optimal allocation of resources such as fertilizers, equipment, water etc
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the ...
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html
The dataset is a combination of Million Playlist Dataset and Spotify API.
The SQLite database is in .db format, with one table extracted. The following are all the columns in this table (a query sketch follows the column list).
- track_uri (TEXT PRIMARY KEY): Unique identifier used by Spotify for songs.
- track_name (TEXT): Song name.
- artist_name (TEXT): Artist name.
- artist_uri (TEXT): Unique identifier used by Spotify for artists.
- album_name (TEXT): Album name
- album_uri (TEXT): Unique identifier used by Spotify for albums.
- duration_ms (INTEGER): Duration of the song.
- danceability (REAL): Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
- energy (REAL): Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
- key (INTEGER): The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
- loudness (REAL): The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 db.
- mode (INTEGER): Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor is 0.
- speechiness (REAL): Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
- acousticness (REAL): A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
- instrumentalness (REAL): Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
- liveness (REAL): Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
- valence (REAL): A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
- tempo (REAL): The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
- type (TEXT): The object type.
- id (TEXT): The Spotify ID for the track.
- uri (TEXT): The Spotify URI for the track.
- track_href (TEXT): A link to the Web API endpoint providing full details of the track.
- analysis_url (TEXT): A URL to access the full audio analysis of this track. An access token is required to access this data.
- fduration_ms (INTEGER): The duration of the track in milliseconds.
- time_signature (INTEGER): An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7 indicating time signatures of "3/4", to "7/4".
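A minimal sketch for querying the .db file with Python's standard library; the table name ("tracks") and file name are assumptions, since the description does not state them, so check sqlite_master first:

```python
import sqlite3

con = sqlite3.connect("spotify.db")  # placeholder file name

# Discover the actual table name(s) before querying.
print(con.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())

# Example query against an assumed table name "tracks".
rows = con.execute(
    "SELECT track_name, artist_name, tempo, valence "
    "FROM tracks WHERE danceability > 0.8 ORDER BY valence DESC LIMIT 10"
).fetchall()
for row in rows:
    print(row)
con.close()
```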
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description
This dataset contains a simulated collection of 10,000 patient records designed to explore hypertension management in resource-constrained settings. It provides comprehensive data for analyzing blood pressure control rates, associated risk factors, and complications. The dataset is ideal for predictive modelling, risk analysis, and treatment optimization, offering insights into demographic, clinical, and treatment-related variables.
Dataset Structure
Dataset Volume
• Size: 10,000 records.
• Features: 19 variables, categorized into Sociodemographic, Clinical, Complications, and Treatment/Control groups.
Variables and Categories
A. Sociodemographic Variables
1. Age:
• Continuous variable in years.
• Range: 18–80 years.
• Mean ± SD: 49.37 ± 12.81.
2. Sex:
• Categorical variable.
• Values: Male, Female.
3. Education:
• Categorical variable.
• Values: No Education, Primary, Secondary, Higher Secondary, Graduate, Post-Graduate, Madrasa.
4. Occupation:
• Categorical variable.
• Values: Service, Business, Agriculture, Retired, Unemployed, Housewife.
5. Monthly Income:
• Categorical variable in Bangladeshi Taka.
• Values: <5000, 5001–10000, 10001–15000, >15000.
6. Residence:
• Categorical variable.
• Values: Urban, Sub-urban, Rural.
B. Clinical Variables
7. Systolic BP:
• Continuous variable in mmHg.
• Range: 100–200 mmHg.
• Mean ± SD: 140 ± 15 mmHg.
8. Diastolic BP:
• Continuous variable in mmHg.
• Range: 60–120 mmHg.
• Mean ± SD: 90 ± 10 mmHg.
9. Elevated Creatinine:
• Binary variable (≥ 1.4 mg/dL).
• Values: Yes, No.
10. Diabetes Mellitus:
• Binary variable.
• Values: Yes, No.
11. Family History of CVD:
• Binary variable.
• Values: Yes, No.
12. Elevated Cholesterol:
• Binary variable (≥ 200 mg/dL).
• Values: Yes, No.
13. Smoking:
• Binary variable.
• Values: Yes, No.
C. Complications
14. LVH (Left Ventricular Hypertrophy):
• Binary variable (ECG diagnosis).
• Values: Yes, No.
15. IHD (Ischemic Heart Disease):
• Binary variable.
• Values: Yes, No.
16. CVD (Cerebrovascular Disease):
• Binary variable.
• Values: Yes, No.
17. Retinopathy:
• Binary variable.
• Values: Yes, No.
D. Treatment and Control
18. Treatment:
• Categorical variable indicating therapy type.
• Values: Single Drug, Combination Drugs.
19. Control Status:
• Binary variable.
• Values: Controlled, Uncontrolled.
Dataset Applications
1. Predictive Modeling:
• Develop models to predict blood pressure control status using demographic and clinical data (see the sketch after this list).
2. Risk Analysis:
• Identify significant factors influencing hypertension control and complications.
3. Severity Scoring:
• Quantify hypertension severity for patient risk stratification.
4. Complications Prediction:
• Forecast complications like IHD, LVH, and CVD for early intervention.
5. Treatment Guidance:
• Analyze therapy efficacy to recommend optimal treatment strategies.
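A hedged sketch of application 1 (predicting control status); the file name is a placeholder and the column names are guesses based on the variable list above, not confirmed headers:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("hypertension.csv")                     # placeholder file name
y = (df["Control Status"] == "Controlled").astype(int)   # 1 = controlled
# One-hot encode the categorical columns; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns=["Control Status"]), drop_first=True)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("5-fold CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```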
The dataset contains information about car wash customers. It has 1000 rows, each representing a different customer, and five features that describe various aspects of their car wash habits and preferences. Here are the features in detail:
Frequency_of_Washes:
Type: Integer. Description: This column indicates how often a customer gets their car washed in a month. The values range from 1 to 11 washes per month. Example Values: 4, 2, 8
Spending_per_Visit:
Type: Float. Description: This column represents the amount of money a customer spends on each car wash visit. The values are in dollars and range from $10 to $50. Example Values: 30.5, 15.75, 40.2
Preferred_Service_Type:
Type: Categorical (String). Description: This column indicates the type of car wash service the customer prefers. The possible values are "Basic," "Premium," and "Detailing." Example Values: Premium, Basic, Detailing
Vehicle_Age:
Type: Integer. Description: This column shows the age of the customer's vehicle in years. The values range from 0 to 20 years. Example Values: 3, 10, 1
Customer_Loyalty:
Type: Categorical (String). Description: This column indicates the loyalty level of the customer. The possible values are "Low," "Medium," and "High." Example Values: High, Medium, Low