100+ datasets found
  1. Parameters definition and range of values.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated May 18, 2012
    Cite
    Menu, Frédéric; Rascalou, Guilhem; Gourbière, Sébastien; Pontier, Dominique (2012). Parameters definition and range of values. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001141670
    Explore at:
    Dataset updated
    May 18, 2012
    Authors
    Menu, Frédéric; Rascalou, Guilhem; Gourbière, Sébastien; Pontier, Dominique
    Description

    (1) Vector, human, and non-human host natural death rates were estimated as 1/individual longevity; for example, a longevity of 50 years corresponds to a natural death rate of 1/50 = 0.02 per year. The range of variation of longevity (i.e., the reciprocal of the death rate parameter defined in the model) is reported, as those are the raw data found in the literature (see sections ‘Vector local growth rate’ and ‘Human and non-human hosts natural death rates’ in Text S1). (2) Death rates were calculated as the sum of the natural death rate of human or non-human hosts and the additional mortality imposed by the pathogen on infectious and ‘recovered’ individuals (as calculated in section ‘Human and non-human hosts mortality induced by the pathogen’ in Text S1).

  2. TIGER/Line Shapefile, Current, County, Hamilton County, NE, Address...

    • catalog.data.gov
    • gimi9.com
    Updated Aug 7, 2025
    Cite
    U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). TIGER/Line Shapefile, Current, County, Hamilton County, NE, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-current-county-hamilton-county-ne-address-range-feature
    Explore at:
    Dataset updated
    Aug 7, 2025
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Area covered
    Hamilton County
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or shapefiles can be combined to cover the entire nation. The Address Range Features shapefile contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address range is associated with the edge. This shapefile contains a record for each address range to street name combination. Address ranges associated with more than one street name are also represented by multiple coincident address range feature edge records. Note that this shapefile includes all unsuppressed address ranges, in contrast to the All Lines shapefile (edges.shp), which only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line shapefiles are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.

  3. South Range, MI Population Breakdown by Gender and Age Dataset: Male and...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). South Range, MI Population Breakdown by Gender and Age Dataset: Male and Female Population Distribution Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/e200fba9-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Range, Michigan
    Variables measured
    Male and Female Population Under 5 Years, Male and Female Population over 85 years, Male and Female Population Between 5 and 9 years, Male and Female Population Between 10 and 14 years, Male and Female Population Between 15 and 19 years, Male and Female Population Between 20 and 24 years, Male and Female Population Between 25 and 29 years, Male and Female Population Between 30 and 34 years, Male and Female Population Between 35 and 39 years, Male and Female Population Between 40 and 44 years, and 8 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the three variables, namely (a) Population (Male), (b) Population (Female), and (c) Gender Ratio (Males per 100 Females), we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau across 18 age groups, ranging from under 5 years to 85 years and above. These age groups are described above in the variables section. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of South Range by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for South Range. The dataset can be utilized to understand the population distribution of South Range by gender and age. For example, using this dataset, we can identify the largest age group for both men and women in South Range. Additionally, it can be used to see how the gender ratio changes from birth to the oldest age group, and how the male-to-female ratio varies across each age group, for South Range.

    Key observations

    Largest age group (population): Male # 20-24 years (49) | Female # 20-24 years (50). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Scope of gender:

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data on biological sex, not gender. Respondents answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.

    Variables / Data Columns

    • Age Group: This column displays the age group for the South Range population analysis. Total expected values are 18 and are defined above in the age groups section.
    • Population (Male): The male population in the South Range is shown in the following column.
    • Population (Female): The female population in the South Range is shown in the following column.
    • Gender Ratio: Also known as the sex ratio, this column displays the number of males per 100 females in South Range for each age group.
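
    To work with these columns, here is a minimal pandas sketch, assuming a local CSV export with the column names listed above (the file name is hypothetical):

    ```python
    import pandas as pd

    # Hypothetical local export of this dataset; column names follow the list above.
    df = pd.read_csv("south_range_mi_population_by_gender_age.csv")

    # Largest age group by male and by female population.
    print(df.loc[df["Population (Male)"].idxmax(), "Age Group"])
    print(df.loc[df["Population (Female)"].idxmax(), "Age Group"])

    # Recompute the gender ratio (males per 100 females) as a sanity check.
    ratio = 100 * df["Population (Male)"] / df["Population (Female)"]
    print(ratio.round(1))
    ```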

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for South Range Population by Gender. You can refer to the same here.

  4. Table of parameter descriptions and ranges of values used in the model.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Nov 9, 2022
    Cite
    Dougall, Paul; Bailey, Susan; Greene, James; Darko, Frederick Laud Amoah; Annan, William; Dalton, Mackenzie; Asante-Asamani, Emmanuel; White, Diana (2022). Table of parameter descriptions and ranges of values used in the model. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000282026
    Explore at:
    Dataset updated
    Nov 9, 2022
    Authors
    Dougall, Paul; Bailey, Susan; Greene, James; Darko, Frederick Laud Amoah; Annan, William; Dalton, Mackenzie; Asante-Asamani, Emmanuel; White, Diana
    Description

    All parameters are assumed non-negative. S(0), …, I2(0), and R(0) define the initial population sizes. Dashes are used when values are arbitrarily chosen from some range.

  5. Elk Home Range - Potter-Redwood Valley - 2023-2024 [ds3191]

    • data-cdfw.opendata.arcgis.com
    • data.cnra.ca.gov
    • +4more
    Updated Sep 18, 2024
    + more versions
    Cite
    California Department of Fish and Wildlife (2024). Elk Home Range - Potter-Redwood Valley - 2023-2024 [ds3191] [Dataset]. https://data-cdfw.opendata.arcgis.com/datasets/CDFW::elk-home-range-potter-redwood-valley-2023-2024-ds3191
    Explore at:
    Dataset updated
    Sep 18, 2024
    Dataset authored and provided by
    California Department of Fish and Wildlife (https://wildlife.ca.gov/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    The project lead for the collection of this data was Carrington Hilson. Elk (9 adult females) were captured and equipped with GPS collars (Lotek Iridium) transmitting data from 2023-2024. The Potter-Redwood Valley herd does not migrate between traditional summer and winter seasonal ranges. Therefore, annual home ranges were modeled using year-round data to demarcate high use areas in lieu of modeling the specific winter ranges commonly seen in other ungulate analyses in California. GPS locations were fixed at 6.5 hour intervals in the dataset. To improve the quality of the data set, all points with DOP values greater than 5 and those points visually assessed as a bad fix by the analyst were removed. The methodology used for this migration analysis allowed for the mapping of the herd's home range. Brownian bridge movement models (BBMMs; Sawyer et al. 2009) were constructed with GPS collar data from 8 elk, including 15 annual home range sequences, location, date, time, and average location error as inputs in Migration Mapper. BBMMs were produced at a spatial resolution of 50 m using a sequential fix interval of less than 27 hours and a fixed motion variance of 1000. Home range is visualized as the 50th percentile contour (high use) and the 99th percentile contour of the year-round utilization distribution. Home range designations for this herd may expand with a larger sample.

  6. original : CIFAR 100

    • kaggle.com
    zip
    Updated Dec 28, 2024
    Cite
    Shashwat Pandey (2024). original : CIFAR 100 [Dataset]. https://www.kaggle.com/datasets/shashwat90/original-cifar-100
    Explore at:
    Available download formats: zip (168517945 bytes)
    Dataset updated
    Dec 28, 2024
    Authors
    Shashwat Pandey
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. CIFAR-10 and CIFAR-100 were created by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. (Sadly, the 80 million tiny images dataset has been thrown into the memory hole by its authors. Spotting the doublethink which was used to justify its erasure is left as an exercise for the reader.)

    The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

    The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

    The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

    Baseline results

    You can find some baseline replicable results on this dataset on the project page for cuda-convnet. These results were obtained with a convolutional neural network. Briefly, they are 18% test error without data augmentation and 11% with. Additionally, Jasper Snoek has a new paper in which he used Bayesian hyperparameter optimization to find nice settings of the weight decay and other hyperparameters, which allowed him to obtain a test error rate of 15% (without data augmentation) using the architecture of the net that got 18%.

    Other results

    Rodrigo Benenson has collected results on CIFAR-10/100 and other datasets on his website.

    Dataset layout

    Python / Matlab versions

    I will describe the layout of the Python version of the dataset. The layout of the Matlab version is identical.

    The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary:

    ```python
    def unpickle(file):
        import cPickle
        with open(file, 'rb') as fo:
            dict = cPickle.load(fo)
        return dict
    ```

    And a python3 version:

    ```python
    def unpickle(file):
        import pickle
        with open(file, 'rb') as fo:
            dict = pickle.load(fo, encoding='bytes')
        return dict
    ```

    Loaded in this way, each of the batch files contains a dictionary with the following elements:

    • data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
    • labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.

    The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:

    • label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.

    Binary version

    The binary version contains the files data_batch_1.bin, data_batch_2.bin, ..., data_batch_5.bin, as well as test_batch.bin. Each of these files is formatted as follows:

    <1 x label><3072 x pixel>
    ...
    <1 x label><3072 x pixel>

    In other words, the first byte is the label of the first image, which is a number in the range 0-9. The next 3072 bytes are the values of the pixels of the image. The first 1024 bytes are the red channel values, the next 1024 the green, and the final 1024 the blue. The values are stored in row-major order, so the first 32 bytes are the red channel values of the first row of the image.

    Each file contains 10000 such 3073-byte "rows" of images, although there is nothing delimiting the rows. Therefore each file should be exactly 30730000 bytes long.

    There is another file, called batches.meta.txt. This is an ASCII file that maps numeric labels in the range 0-9 to meaningful class names. It is merely a list of the 10 class names, one per row. The class name on row i corresponds to numeric label i.
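
    As a quick illustration of the binary layout described above, here is a minimal NumPy sketch for parsing one .bin batch file. The function name and file path are ours, not part of the dataset:

    ```python
    import numpy as np

    def read_cifar10_bin(path):
        """Parse a CIFAR-10 binary batch: 10000 rows of 1 label byte + 3072 pixel bytes."""
        raw = np.fromfile(path, dtype=np.uint8)
        rows = raw.reshape(-1, 3073)        # each row: <1 x label><3072 x pixel>
        labels = rows[:, 0]                 # class indices 0-9
        # Pixels are stored as R, G, B planes, each a row-major 32x32 block.
        images = rows[:, 1:].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
        return labels, images

    labels, images = read_cifar10_bin("data_batch_1.bin")
    print(labels.shape, images.shape)       # (10000,) (10000, 32, 32, 3)
    ```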

    The CIFAR-100 dataset

    This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). Her...

  7. ANN development + final testing datasets

    • data.niaid.nih.gov
    • resodate.org
    • +1more
    Updated Jan 24, 2020
    Cite
    Authors (2020). ANN development + final testing datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1445865
    Explore at:
    Dataset updated
    Jan 24, 2020
    Authors
    Authors
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    File name definitions:

    '...v_50_175_250_300...' - dataset for velocity ranges [50, 175] + [250, 300] m/s

    '...v_175_250...' - dataset for velocity range [175, 250] m/s

    'ANNdevelop...' - used to perform 9 parametric sub-analyses where, in each one, many ANNs are developed (trained, validated and tested) and the one yielding the best results is selected

    'ANNtest...' - used to test the best ANN from each aforementioned parametric sub-analysis, aiming to find the best ANN model; this dataset includes the 'ANNdevelop...' counterpart

    Where to find the input (independent) and target (dependent) variable values for each dataset/Excel file?

    input values in 'IN' sheet

    target values in 'TARGET' sheet

    Where to find the results from the best ANN model (for each target/output variable and each velocity range)?

    open the corresponding excel file and the expected (target) vs ANN (output) results are written in 'TARGET vs OUTPUT' sheet
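
    As a small illustration of this file layout, here is a pandas sketch for pulling the 'IN' and 'TARGET' sheets from one of these Excel files; the file name below is a hypothetical example following the naming scheme above:

    ```python
    import pandas as pd

    # Hypothetical file name following the 'ANNdevelop...' / velocity-range naming scheme.
    path = "ANNdevelop_v_175_250.xlsx"

    inputs = pd.read_excel(path, sheet_name="IN")       # independent variables
    targets = pd.read_excel(path, sheet_name="TARGET")  # dependent variables

    print(inputs.shape, targets.shape)
    ```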

    Check reference below (to be added when the paper is published)

    https://www.researchgate.net/publication/328849817_11_Neural_Networks_-_Max_Disp_-_Railway_Beams

  8. Digital data sets that describe aquifer characteristics of the alluvial and...

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 21, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Digital data sets that describe aquifer characteristics of the alluvial and terrace deposits along the Beaver-North Canadian River from the panhandle to Canton Lake in northwestern Oklahoma [Dataset]. https://catalog.data.gov/dataset/digital-data-sets-that-describe-aquifer-characteristics-of-the-alluvial-and-terrace-deposi-bf6a4
    Explore at:
    Dataset updated
    Oct 21, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Canton Lake, Canadian River, Oklahoma, North Canadian River
    Description

    This data set consists of digital hydraulic conductivity values for the alluvial and terrace deposits along the Beaver-North Canadian River from the panhandle to Canton Lake in northwestern Oklahoma. Ground water in 830 square miles of the Quaternary-age alluvial and terrace aquifer is an important source of water for irrigation, industrial, municipal, stock, and domestic supplies. The aquifer consists of poorly sorted, fine to coarse, unconsolidated quartz sand with minor amounts of clay, silt, and basal gravel. The hydraulically connected alluvial and terrace deposits unconformably overlie the Tertiary-age Ogallala Formation and Permian-age formations. Six zones of ranges of hydraulic conductivity values for the alluvial and terrace deposits reported in a ground-water modeling report are used in this data set. The hydraulic conductivity values range from 0 to 160 feet per day, and average 59 feet per day. The features in the data set representing aquifer boundaries along geological contacts were extracted from a published digital surficial geology data set based on a scale of 1:250,000. The geographic limits of the aquifer and zones representing ranges of hydraulic conductivity values were digitized from folded paper maps, at a scale of 1:250,000 from a ground-water modeling report. Ground-water flow models are numerical representations that simplify and aggregate natural systems. Models are not unique; different combinations of aquifer characteristics may produce similar results. Therefore, values of hydraulic conductivity used in the model and presented in this data set are not precise, but are within a reasonable range when compared to independently collected data.

  9. House Price Prediction Dataset

    • kaggle.com
    zip
    Updated Sep 21, 2024
    Cite
    Zafar (2024). House Price Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/zafarali27/house-price-prediction-dataset
    Explore at:
    Available download formats: zip (29372 bytes)
    Dataset updated
    Sep 21, 2024
    Authors
    Zafar
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    House Price Prediction Dataset.

    The dataset contains 2000 rows of house-related data, representing various features that could influence house prices. Below, we discuss key aspects of the dataset, which include its structure, the choice of features, and potential use cases for analysis.

    1. Dataset Features

    The dataset is designed to capture essential attributes for predicting house prices, including:

    • Area: Square footage of the house, which is generally one of the most important predictors of price.
    • Bedrooms & Bathrooms: The number of rooms in a house significantly affects its value. Homes with more rooms tend to be priced higher.
    • Floors: The number of floors in a house could indicate a larger, more luxurious home, potentially raising its price.
    • Year Built: The age of the house can affect its condition and value. Newly built houses are generally more expensive than older ones.
    • Location: Houses in desirable locations such as downtown or urban areas tend to be priced higher than those in suburban or rural areas.
    • Condition: The current condition of the house is critical, as well-maintained houses (in 'Excellent' or 'Good' condition) will attract higher prices compared to houses in 'Fair' or 'Poor' condition.
    • Garage: Availability of a garage can increase the price due to added convenience and space.
    • Price: The target variable, representing the sale price of the house, used to train machine learning models to predict house prices based on the other features.

    2. Feature Distributions

    • Area Distribution: The area of the houses in the dataset ranges from 500 to 5000 square feet, which allows analysis across different types of homes, from smaller apartments to larger luxury houses.
    • Bedrooms and Bathrooms: The number of bedrooms varies from 1 to 5, and bathrooms from 1 to 4. This variance enables analysis of homes with different sizes and layouts.
    • Floors: Houses in the dataset have between 1 and 3 floors. This feature could be useful for identifying the influence of multi-level homes on house prices.
    • Year Built: The dataset contains houses built from 1900 to 2023, giving a wide range of house ages to analyze the effects of new vs. older construction.
    • Location: There is a mix of urban, suburban, downtown, and rural locations. Urban and downtown homes may command higher prices due to proximity to amenities.
    • Condition: Houses are labeled as 'Excellent', 'Good', 'Fair', or 'Poor'. This feature helps model the price differences based on the current state of the house.
    • Price Distribution: Prices range between $50,000 and $1,000,000, offering a broad spectrum of property values. This range makes the dataset appropriate for predicting a wide variety of housing prices, from affordable homes to luxury properties.

    3. Correlation Between Features

    A key area of interest is the relationship between various features and house price:

    • Area and Price: Typically, a strong positive correlation is expected between the size of the house (Area) and its price. Larger homes are likely to be more expensive.
    • Location and Price: Location is another major factor. Houses in urban or downtown areas may show a higher price on average compared to suburban and rural locations.
    • Condition and Price: The condition of the house should show a positive correlation with price. Houses in better condition should be priced higher, as they require less maintenance and repair.
    • Year Built and Price: Newer houses might command a higher price due to better construction standards, modern amenities, and less wear-and-tear, but some older homes in good condition may retain historical value.
    • Garage and Price: A house with a garage may be more expensive than one without, as it provides extra storage or parking space.

    4. Potential Use Cases

    The dataset is well-suited for various machine learning and data analysis applications, including:

    • House Price Prediction: Using regression techniques, this dataset can be used to build a model to predict house prices based on the available features (see the sketch after this list).
    • Feature Importance Analysis: By using techniques such as feature importance ranking, data scientists can determine which features (e.g., location, area, or condition) have the greatest impact on house prices.
    • Clustering: Clustering techniques like k-means could help identify patterns in the data, such as grouping houses into segments based on their characteristics (e.g., luxury homes, affordable homes).
    • Market Segmentation: The dataset can be used to perform segmentation by location, price range, or house type to analyze trends in specific sub-markets, like luxury vs. affordable housing.
    • Time-Based Analysis: By studying how house prices vary with the year built or the age of the house, analysts can derive insights into the trends of older vs. newer homes.
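
    For the first use case, a minimal regression sketch with scikit-learn, assuming a local CSV whose column names match the features described above (the file name and exact column spellings are assumptions):

    ```python
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    df = pd.read_csv("house_price_prediction.csv")   # assumed file name
    X, y = df.drop(columns=["Price"]), df["Price"]

    # One-hot encode the categorical features; numeric columns pass through.
    pre = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"),
          ["Location", "Condition", "Garage"])],
        remainder="passthrough",
    )
    model = Pipeline([("pre", pre), ("reg", LinearRegression())])

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model.fit(X_tr, y_tr)
    print(f"held-out R^2: {model.score(X_te, y_te):.2f}")
    ```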

    5. Limitations and ...

  10. housing

    • kaggle.com
    zip
    Updated Sep 22, 2023
    Cite
    HappyRautela (2023). housing [Dataset]. https://www.kaggle.com/datasets/happyrautela/housing
    Explore at:
    Available download formats: zip (809785 bytes)
    Dataset updated
    Sep 22, 2023
    Authors
    HappyRautela
    Description

    The exercise after this contains questions that are based on the housing dataset.

    1. How many houses have a waterfront? a. 21000 b. 21450 c. 163 d. 173

    2. How many houses have 2 floors? a. 2692 b. 8241 c. 10680 d. 161

    3. How many houses built before 1960 have a waterfront? a. 80 b. 7309 c. 90 d. 92

    4. What is the price of the most expensive house having more than 4 bathrooms? a. 7700000 b. 187000 c. 290000 d. 399000

    5. For instance, if the ‘price’ column consists of outliers, how can you make the data clean and remove the redundancies? a. Calculate the IQR range and drop the values outside the range. b. Calculate the p-value and remove the values less than 0.05. c. Calculate the correlation coefficient of the price column and remove the values less than the correlation coefficient. d. Calculate the Z-score of the price column and remove the values less than the z-score.

    6. What are the various parameters that can be used to determine the dependent variables in the housing data to determine the price of the house? a. Correlation coefficients b. Z-score c. IQR Range d. Range of the Features

    7. If we get the r2 score as 0.38, what inferences can we make about the model and its efficiency? a. The model is 38% accurate, and shows poor efficiency. b. The model is showing 0.38% discrepancies in the outcomes. c. Low difference between observed and fitted values. d. High difference between observed and fitted values.

    8. If the metrics show that the p-value for the grade column is 0.092, what all inferences can we make about the grade column? a. Significant in presence of other variables. b. Highly significant in presence of other variables c. insignificance in presence of other variables d. None of the above

    9. If the Variance Inflation Factor value for a feature is considerably higher than the other features, what can we say about that column/feature? a. High multicollinearity b. Low multicollinearity c. Both A and B d. None of the above
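
    The technique accepted in question 5 (option a) is worth making concrete. A minimal pandas sketch of IQR-based outlier removal; the file and column names are assumptions:

    ```python
    import pandas as pd

    df = pd.read_csv("housing.csv")  # hypothetical local copy of this dataset

    q1, q3 = df["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Keep only rows whose price falls inside the 1.5*IQR fences.
    clean = df[df["price"].between(lo, hi)]
    print(len(df), "->", len(clean), "rows after outlier removal")
    ```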

  11. Landmarks Dataset for sign recognition numbers

    • kaggle.com
    zip
    Updated Nov 4, 2022
    Cite
    Akshat Mittu (2022). Landmarks Dataset for sign recognition numbers [Dataset]. https://www.kaggle.com/datasets/akshatmittu/landmarks-dataset-for-sign-recognition-numbers
    Explore at:
    Available download formats: zip (50385 bytes)
    Dataset updated
    Nov 4, 2022
    Authors
    Akshat Mittu
    Description

    This dataset was created from images of hand signs: the hand landmarks detected in each image were made into the attributes of the dataset. It contains all 21 landmarks, each with its (x, y, z) coordinates, and 5 classes (1, 2, 3, 4, 5).

    You can also add more classes to your dataset by running the following code; make sure to create an empty dataset (or append to the dataset here) and set the file path correctly.

    ```python
    import os

    import cv2
    import mediapipe as mp
    import pandas as pd

    mp_hands = mp.solutions.hands
    hands = mp_hands.Hands(static_image_mode=True,  # independent images, not a video stream
                           max_num_hands=1,
                           min_detection_confidence=0.8)

    rows = []
    for t in range(1, 6):                       # class labels 1..5
        path = 'data/' + str(t) + '/'
        for name in os.listdir(path):
            image = cv2.imread(path + name)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = hands.process(image)
            if not results.multi_hand_landmarks:
                continue
            hand_landmarks = results.multi_hand_landmarks[0]  # max_num_hands=1
            row = {'label': t}
            for i in range(21):                 # the 21 hand landmarks
                lm = hand_landmarks.landmark[i]
                for axis, value in zip(('x', 'y', 'z'), (lm.x, lm.y, lm.z)):
                    row[str(mp_hands.HandLandmark(i).name) + '_' + axis] = value
            rows.append(row)

    # df.append() is deprecated; build the DataFrame once from the collected rows.
    df = pd.DataFrame(rows)
    df.to_csv('landmarks.csv', index=False)     # set the file path as needed
    ```
  12. Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6

    • data.europa.eu
    • ckan.publishing.service.gov.uk
    • +2more
    html, sparql
    Updated Oct 11, 2021
    + more versions
    Cite
    Ministry of Housing, Communities and Local Government (2021). Subjective wellbeing, 'Worthwhile', percentage of responses in range 0-6 [Dataset]. https://data.europa.eu/data/datasets/subjective-wellbeing-worthwhile-percentage-of-responses-in-range-0-6
    Explore at:
    Available download formats: html, sparql
    Dataset updated
    Oct 11, 2021
    Dataset authored and provided by
    Ministry of Housing, Communities and Local Government
    License

    UK Open Government Licence: http://reference.data.gov.uk/id/open-government-licence

    Description

    Percentage of responses in range 0-6 out of 10 (corresponding to 'low wellbeing') for 'Worthwhile' in the First ONS Annual Experimental Subjective Wellbeing survey.

    The Office for National Statistics has included the four subjective well-being questions below on the Annual Population Survey (APS), the largest of their household surveys.

    • Overall, how satisfied are you with your life nowadays?
    • Overall, to what extent do you feel the things you do in your life are worthwhile?
    • Overall, how happy did you feel yesterday?
    • Overall, how anxious did you feel yesterday?

    This dataset presents results from the second of these questions, "Overall, to what extent do you feel the things you do in your life are worthwhile?" Respondents answer these questions on an 11 point scale from 0 to 10 where 0 is ‘not at all’ and 10 is ‘completely’. The well-being questions were asked of adults aged 16 and older.

    Well-being estimates for each unitary authority or county are derived using data from those respondents who live in that place. Responses are weighted to the estimated population of adults (aged 16 and older) as at end of September 2011.

    The data cabinet also makes available the proportion of people in each county and unitary authority that answer with ‘low wellbeing’ values. For the ‘worthwhile’ question answers in the range 0-6 are taken to be low wellbeing.

    This dataset contains the percentage of responses in the range 0-6. It also contains the standard error, the sample size and lower and upper confidence limits at the 95% level.
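
    As a hedged sketch of how the published lower and upper limits relate to the estimate and its standard error (assuming a normal approximation with z = 1.96; the numbers below are made up for illustration):

    ```python
    # 95% confidence limits from an estimate and its standard error,
    # under a normal approximation (z = 1.96).
    def ci95(estimate, se):
        return estimate - 1.96 * se, estimate + 1.96 * se

    low, high = ci95(24.0, 1.2)  # e.g. 24% of responses in range 0-6, SE of 1.2
    print(f"95% CI: [{low:.1f}, {high:.1f}]")
    ```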

    The ONS survey covers the whole of the UK, but this dataset only includes results for counties and unitary authorities in England, for consistency with other statistics available at this website.

    At this stage the estimates are considered ‘experimental statistics’, published at an early stage to involve users in their development and to allow feedback. Feedback can be provided to the ONS via this email address.

    The APS is a continuous household survey administered by the Office for National Statistics. It covers the UK, with the chief aim of providing between-census estimates of key social and labour market variables at a local area level. Apart from employment and unemployment, the topics covered in the survey include housing, ethnicity, religion, health and education. When a household is surveyed all adults (aged 16+) are asked the four subjective well-being questions.

    The 12 month Subjective Well-being APS dataset is a sub-set of the general APS, as the well-being questions are only asked of persons aged 16 and above who gave a personal interview; proxy answers are not accepted. This reduces the size of the achieved sample to approximately 120,000 adult respondents in England.

    The original data is available from the ONS website.

    Detailed information on the APS and the Subjective Wellbeing dataset is available here.

    As well as collecting data on well-being, the Office for National Statistics has published widely on the topic of wellbeing. Papers and further information can be found here.

  13. Simple download service (Atom) of the dataset: Areas in which the water rise...

    • gimi9.com
    Updated Jan 27, 2022
    + more versions
    Cite
    (2022). Simple download service (Atom) of the dataset: Areas in which the water rise is within a given range of values | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_fr-120066022-srv-9e5170c6-6830-4c52-a91d-e869724c4927
    Explore at:
    Dataset updated
    Jan 27, 2022
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Tables of areas in which a hazard of a certain type, under a certain scenario, causes a rise of water whose height is within a fixed range of values. This is a spatial data set produced by the GIS of the high flood risk territory (TRI) of... and mapped for reporting purposes under the European Flood Directive. European Directive 2007/60/EC of 23 October 2007 on the assessment and management of flood risks (OJ L 288, 06-11-2007, p. 27) shapes the flood prevention strategy in Europe. It requires the production of flood risk management plans to reduce the negative consequences of flooding on human health, the environment, cultural heritage and economic activity. The objectives and implementation requirements are set out in the Law of 12 July 2010 on the National Commitment for the Environment (LENE) and the Decree of 2 March 2011. In this context, the primary objective of flood hazard and flood risk mapping for TRIs is to contribute, by homogenising and objectivising knowledge of flood exposure, to the development of flood risk management plans. This dataset is used to produce flood surface maps and flood risk maps that represent flood hazards and issues at an appropriate scale, respectively. Their objective is to provide quantitative evidence to further assess the vulnerability of a territory for the three levels of probability of flooding (high, medium, low).

  14. Cancer Data

    • kaggle.com
    Updated Mar 22, 2023
    + more versions
    Cite
    Erdem Taha (2023). Cancer Data [Dataset]. https://www.kaggle.com/datasets/erdemtaha/cancer-data/code
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 22, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Erdem Taha
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    🦠 Breast Cancer Data Set

    This dataset contains the characteristics of patients diagnosed with cancer. The dataset contains a unique ID for each patient, the type of cancer (diagnosis), the visual characteristics of the cancer and the average values of these characteristics.

    📚 The main features of the dataset are as follows:

    1. id: Represents a unique ID of each patient.
    2. diagnosis: Indicates the type of cancer. This property can take the values "M" (Malignant) or "B" (Benign).
    3. radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave points_mean: Represents the mean values of the cancer's visual characteristics.

    There are also several categorical features where patients in the dataset are labeled with numerical values. You can examine them in the Chart area.

    Other features contain specific ranges of average values of the features of the cancer image:

    • radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, concave points_mean

    Each of these features is mapped to a table containing the number of values in a given range. You can examine the Chart tables.

    Each sample contains the patient's unique ID, the cancer diagnosis and the average values of the cancer's visual characteristics.

    Such a dataset can be used to train or test models and algorithms used to make cancer diagnoses. Understanding and analyzing the dataset can contribute to the improvement of cancer-related visual features and diagnosis.

    ✨ Examples of Projects that can be done with the Data Set

    Logistic Regression: This algorithm can be used effectively for binary classification problems. In this dataset, logistic regression may be an appropriate choice since there are "Malignant" and "Benign" classes. It can be used to predict cancer type with the visual features in the dataset.

    K-Nearest Neighbors (KNN): KNN classifies an example by looking at the k closest examples around it. This algorithm assumes that patients with similar characteristics tend to have similar types of cancer. KNN can be used for cancer diagnosis by taking into account neighborhood relationships in the data set.

    Support Vector Machines (SVM): SVM is effective for classification tasks, especially for two-class problems. Focusing on the clear separation of classes in the dataset, SVM is a powerful algorithm that can be used for cancer diagnosis.
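
    A minimal scikit-learn sketch of the logistic-regression route described above, assuming the usual file name for this Kaggle dataset (data.csv) and the mean-value columns listed earlier:

    ```python
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    df = pd.read_csv("data.csv")  # assumed file name

    features = ["radius_mean", "texture_mean", "perimeter_mean", "area_mean",
                "smoothness_mean", "compactness_mean", "concavity_mean",
                "concave points_mean"]
    X = df[features]
    y = (df["diagnosis"] == "M").astype(int)  # 1 = malignant, 0 = benign

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = make_pipeline(StandardScaler(), LogisticRegression())
    clf.fit(X_tr, y_tr)
    print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
    ```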

    Data Set Related Training Notebooks 😊 ("I Recommend You Review")

    K-NN Project: https://www.kaggle.com/code/erdemtaha/prediction-cancer-data-with-k-nn-95

    Logistic Regression: https://www.kaggle.com/code/erdemtaha/cancer-prediction-96-5-with-logistic-regression

    💖 Acknowledgements and Information

    This is a copy of content that has been adapted for educational purposes and published to reach more people. You can access the original source from the link below; please do not forget to support that dataset.

    🔗 https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data

    This database can also be accessed via the UW CS ftp server: 🔗 ftp.cs.wisc.edu cd math-prog/cpo-dataset/machine-learn/WDBC/

    It can also be found at the UCI Machine Learning Repository: 🔗 https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29

    📩 Personal Information:

    If you have any questions or curiosities about the data or studies, you can contact me via the links below 😊

    LinkedIn: https://www.linkedin.com/in/erdem-taha-sokullu/

    Mail: erdemtahasokullu@gmail.com

    Github: https://github.com/Prometheussx

    Kaggle: https://www.kaggle.com/erdemtaha

    📜 License:

    This data has a CC BY-NC-SA 4.0 license. You can review the license rules at the link below.

    License Link: https://creativecommons.org/licenses/by-nc-sa/4.0/

  15. Marathi and Maharashtrian Ornaments Dataset

    • kaggle.com
    zip
    Updated Jul 29, 2025
    Cite
    Tushar Kute (2025). Marathi and Maharashtrian Ornaments Dataset [Dataset]. https://www.kaggle.com/datasets/tusharkute/marathi-and-maharashtrian-ornamants-dataset/code
    Explore at:
    Available download formats: zip (8971 bytes)
    Dataset updated
    Jul 29, 2025
    Authors
    Tushar Kute
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset comprises 953 synthetically generated entries detailing various traditional Marathi ornaments. It is designed to provide a structured collection of common features associated with these unique pieces of jewelry, often worn in Maharashtra, India.

    Purpose: The primary purpose of this dataset is to serve as a foundational resource for:

    Educational Projects: Students and enthusiasts can use it to learn about data handling, analysis, and visualization.
    
    Machine Learning Exploration: Researchers can explore classification or regression tasks, for instance, predicting the type of ornament based on its physical properties or vice-versa.
    
    Jewelry Domain Studies: Individuals interested in traditional Indian jewelry can gain insights into the typical characteristics of these ornaments.
    
    Data Generation Practice: It can serve as an example for understanding how synthetic datasets can be created for specific domains.
    

    Content & Generation: The dataset was created programmatically by defining plausible ranges and distributions for each feature based on general knowledge of these ornaments. While synthetic, the values aim to reflect realistic characteristics for each ornament type, acknowledging that actual jewelry pieces will have unique variations. For example:

    Weight, Length/Height, Width: Ranges were set to represent typical sizes and weights.
    Number of Components/Units & Stones/Pearls: These features vary significantly based on the ornament's intricate design, from single-unit pieces like 'Nath' to multi-component necklaces like 'Thushi' or 'Mohan Mala'.
    Carat Weight of Stones: Applied only to ornaments that typically feature stones or pearls.
    Gold Purity: Reflects common gold purities used in Indian jewelry (e.g., 20K, 21K, 22K, 23K, 24K). Silver purity (e.g., 80-95%) is assigned for 'Jodvi'.
    Primary Material: Predominantly 'Gold' for most ornaments, with 'Silver' for 'Jodvi'.
    

    This dataset offers a starting point for analyses where real-world data might be scarce or difficult to collect.

    File Information

    File Name: marathi_ornaments_dataset.csv
    Number of Rows: 953
    Number of Columns: 8
    Approximate File Size: ~60 KB (will vary slightly based on exact content and line endings)
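
    A quick pandas sketch for loading the file and summarizing it by class, using the file name above and the column names described in the column descriptors below:

    ```python
    import pandas as pd

    df = pd.read_csv("marathi_ornaments_dataset.csv")

    # Rows per ornament class, and mean weight per class.
    print(df["Ornament Class"].value_counts())
    print(df.groupby("Ornament Class")["Weight (grams)"].mean().round(1))
    ```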
    

    Column Descriptor

    Here's a detailed description for each column in the marathi_ornaments_dataset.csv file:

    Ornament Class
    
      Description: The traditional Marathi name of the jewelry item. This is the categorical target variable representing different types of ornaments.
    
      Data Type: String (Categorical)
    
      Possible Values: Nath, Thushi, Kolhapuri Saaj, Mohan Mala, Laxmi Haar, Tanmani, Chinchpeti, Bakuli Haar, Surya Haar, Bugadi, Kudya, Bajuband, Tode, Patlya, Mangalsutra, Jodvi, Kambarpatta
    
    Weight (grams)
    
      Description: The approximate weight of the ornament in grams.
      Data Type: Float
      Units: grams (g)
      Range: Varies significantly by ornament type (e.g., Nath would be lighter, Laxmi Haar or Kambarpatta would be heavier).
    
    Length/Height (cm)
    
      Description: The approximate length (for necklaces, bracelets) or height (for earrings, nose rings) of the ornament in centimeters.
      Data Type: Float
      Units: centimeters (cm)
      Range: Varies by ornament type.
    
    Width (cm)
    
      Description: The approximate width of the ornament in centimeters.
      Data Type: Float
      Units: centimeters (cm)
      Range: Varies by ornament type and design.
    
    Number of Components/Units
    
      Description: The total count of distinct, often repeated, design elements or units that make up the ornament. For intricate necklaces, this can be high.
      Data Type: Integer
      Range: 1 to ~1000 (especially for fine 'Thushi' beads).
    Number of Stones/Pearls
    
      Description: The count of stones (e.g., diamonds, rubies, emeralds) or pearls embedded in or attached to the ornament.
      Data Type: Integer
      Range: 0 to ~50 (many traditional designs have no stones, some have many).
    
    Carat Weight of Stones
    
      Description: The total approximate carat weight of all stones present in the ornament. This value is 0.0 if Number of Stones/Pearls is 0.
      Data Type: Float
      Units: Carats (ct)
      Range: 0.0 to ~1.0 (or higher for very elaborate pieces).
    
    Gold Purity (Karat)
    
      Description: The purity of the primary gold material used, expressed in Karats. For 'Jodvi', which are traditionally silver, this represents silver purity as a percentage (even though labeled 'Gold Purity (Karat)' for consistency in column headers).
    
      Data Type: Integer
      Units: Karat (K) for gold, Percentage (%) for silver (for Jodvi).
      Possible Values: 20, 21, 22, 23, 24 for Gold. 80 to 95 for Silver (specifically for Jodvi).
    
    Primary Material
    
      Des...
    
  16. Habitat cores used in primary model (≥1500 hectares) and supplemental...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Oct 5, 2025
    + more versions
    Cite
    U.S. Fish and Wildlife Service (2025). Habitat cores used in primary model ( ≥1500 hectares) and supplemental habitat cores (between 300 - 1500 hectares) - A landscape connectivity analysis for the coastal marten (Martes caurina humboldtensis) [Dataset]. https://catalog.data.gov/dataset/habitat-cores-used-in-primary-model-1500-hectares-and-supplemental-habitat-cores-between-3
    Explore at:
    Dataset updated
    Oct 5, 2025
    Dataset provided by
    U.S. Fish and Wildlife Service
    Description

    This dataset contains additional "small" habitat cores that had a minimum size of 1 female marten home range (300ha), but were too small to meet the minimum size threshold of 5 female home ranges (1500ha) used to define cores in the Primary Model. This dataset also contains the habitat cores from the Primary Model (i.e. cores ≥1500ha). The description following this paragraph is adapted from the metadata description for developing cores in the Primary Model. These methods are identical to those used in developing cores in the Primary Model, with one exception: the minimum habitat core size parameter used in the Core Mapper tool was set to 300ha instead of 1500ha.

    It should be noted that a single core in this dataset actually slightly exceeded the 1500ha threshold for its final area calculation but was not present in the Primary Model set of habitat cores. We determined that this was because the "1500ha cutoff" in the tool was actually applied before the core was expanded by 977m to fill in interior holes and then subsequently trimmed back (in the Core Mapper tool, this is controlled by the "Expand cores by this CWD value" and "Trim back expanded cores" parameters).

    We derived the habitat cores using a tool within Gnarly Landscape Utilities called Core Mapper (Shirk and McRae 2015). To develop a Habitat Surface for input into Core Mapper, we started by assigning each 30m pixel on the modeled landscape a habitat value equal to its GNN OGSI (range = 0-100). In areas with serpentine soils that support habitat potentially suitable for coastal marten (see report for details), we assigned a minimum habitat value of 31, which is equivalent to the 33rd percentile of OGSI 80 pixels in the marten’s historical range. Pixels with higher OGSI retained their normal habitat value. Our intention was to allow the modified serpentine pixels to be more easily incorporated into habitat cores if there were higher value OGSI pixels in the vicinity, but not to have them form the entire basis of a core. We also excluded pixels with a habitat value <1.0 from inclusion in habitat cores. We then used a moving window to calculate the average habitat value within a 977m radius around each pixel (derived from the estimated average size of a female marten’s home range of 300 ha). Pixels with an average habitat value ≥36.0 were then incorporated into habitat cores.

    After conducting a sensitivity analysis by running a set of Core Mapper trials using a broad range of habitat values, we chose ≥36.0 as the average habitat value because it is the median OGSI of pixels within the marten’s historical range classified by the GNN as “OGSI 80” (Davis et al. 2015). It generated a set of habitat cores that were neither overly generous (depicting most of the landscape as habitat core) nor overly strict (only mapping cores in a few locations with very high OGSI, such as Redwood State and National Parks) (see Appendix 3 of the referenced report for more details, including example maps from our sensitivity analysis). We then set Core Mapper to expand the habitat cores by 977 cost-weighted meters, a step intended to consolidate smaller cores that were probably relatively close together from a marten’s perspective. This was followed by a “trimming” step that removed pixels from the expansion that did not meet the moving window average, so the net result was rather small changes in the size of the habitat cores, but filling in many individual isolated pixels with a habitat value of 0.
This is an abbreviated and incomplete description of the dataset. Please refer to the spatial metadata for a more thorough description of the methods used to produce this dataset, and a discussion of any assumptions or caveats that should be taken into consideration.
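
    The moving-window step described above can be sketched in Python as a rough stand-in for the Core Mapper focal mean (synthetic raster; the 977 m radius is about 33 pixels at 30 m resolution; all names here are ours):

    ```python
    import numpy as np
    from scipy.ndimage import convolve

    # Synthetic stand-in for the 30 m OGSI habitat surface (values 0-100).
    rng = np.random.default_rng(0)
    habitat = rng.uniform(0, 100, size=(500, 500))

    # Circular moving window with a 977 m radius (~33 pixels at 30 m).
    r = round(977 / 30)
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    disk = (x**2 + y**2 <= r**2).astype(float)
    disk /= disk.sum()                   # normalize so the convolution is a mean

    mean_habitat = convolve(habitat, disk, mode="nearest")

    # Pixels whose neighborhood mean is >= 36.0 become candidate core pixels.
    core = mean_habitat >= 36.0
    print(f"fraction of landscape in candidate cores: {core.mean():.2f}")
    ```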

  17. GRACEnet: GHG Emissions, C Sequestration and

    • kaggle.com
    zip
    Updated Jan 19, 2023
    Cite
    The Devastator (2023). GRACEnet: GHG Emissions, C Sequestration and [Dataset]. https://www.kaggle.com/datasets/thedevastator/gracenet-ghg-emissions-c-sequestration-and-envir
    Explore at:
    Available download formats: zip (1943875 bytes)
    Dataset updated
    Jan 19, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    GRACEnet: GHG Emissions, C Sequestration and Environmental Benefits

    Quantifying Climate Change Mitigation and Sustainable Agricultural Practices

    By US Open Data Portal, data.gov [source]

    About this dataset

    This Kaggle dataset showcases the groundbreaking research undertaken by the GRACEnet program, which is attempting to better understand and minimize greenhouse gas (GHG) emissions from agro-ecosystems in order to create a healthier world for all. Through multi-location field studies that utilize standardized protocols – combined with models, producers, and policy makers – GRACEnet seeks to: typify existing production practices, maximize C sequestration, minimize net GHG emissions, and meet sustainable production goals. This Kaggle dataset allows us to evaluate the impact of different management systems on factors such as carbon dioxide and nitrous oxide emissions, C sequestration levels, and crop/forest yield levels, plus additional environmental effects like air quality. With this data we can start getting an idea of the ways that agricultural policies may be influencing our planet's ever-evolving climate dilemma.


    How to use the dataset

    Step 1: Familiarize yourself with the columns in this dataset. In particular, pay attention to Spreadsheet tab description (brief description of each spreadsheet tab), Element or value display name (name of each element or value being measured), Description (detailed description), Data type (type of data being measured), Unit (unit of measurement for the data), Calculation (calculation used to determine a value or percentage), Format (format required for submitting values), and Low Value and High Value (range for acceptable entries).

    Step 2: Familiarize yourself with any additional information related to calculations. Most calculations made use of accepted best estimates based on standard protocols defined by GRACEnet. Every calculation was described in detail and included post-processing steps such as quality assurance/quality control changes and measurement uncertainty assessment, as available sources permitted. Relevant calculations were discussed collaboratively between all participating partners at every level where they felt it necessary, and all terms were rigorously reviewed before the partners agreed upon any decision. A range was established when several assumptions were needed, or when there was a high possibility that samples might fall outside previously accepted ranges associated with the standard protocol conditions set up at GRACEnet Headquarters laboratories, due to external factors like soil type and climate.

    Step 3: Determine what operations are allowed within each spreadsheet tab (.csv file). On some tabs an operation such as adding an entire row may be permitted, but using formulas is not, since non-standard manipulations often introduce errors into an analysis. Add new rows or columns only where it fits your specific analysis, and avoid operations such as filling blank cells with zeros or deleting rows and columns made redundant by the standard filtering process: such non-standard changes create unverified extra noise that can bias your results later during robustness testing and verification.
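
    As a minimal sketch of Steps 1 and 3, the Python snippet below uses pandas to flag entries that fall outside the Low Value/High Value range defined in the data dictionary. The file names (data_dictionary.csv, measurements.csv) and the measurement column names (element, value) are assumptions for illustration; the dictionary column names follow Step 1 above.

        import pandas as pd

        # Hypothetical file names: the GRACEnet export ships as spreadsheet
        # tabs, assumed here to be saved as individual .csv files.
        dictionary = pd.read_csv("data_dictionary.csv")
        measurements = pd.read_csv("measurements.csv")

        # Look up the acceptable range for each element by its display name
        # (dictionary column names as described in Step 1).
        ranges = dictionary.set_index("Element or value display name")[
            ["Low Value", "High Value"]
        ]

        def out_of_range(element, value):
            """True if `value` falls outside the dictionary's accepted range."""
            if element not in ranges.index:
                return False  # no range defined for this element
            low = ranges.at[element, "Low Value"]
            high = ranges.at[element, "High Value"]
            return value < low or value > high

        # Flag suspect rows rather than altering them; Step 3 warns against
        # non-standard manipulations such as zero-filling blank cells.
        measurements["suspect"] = [
            out_of_range(e, v)
            for e, v in zip(measurements["element"], measurements["value"])
        ]
        print(measurements[measurements["suspect"]])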

    Research Ideas

    • Analyzing and comparing the environmental benefits of different agricultural management practices, such as crop yields and carbon sequestration rates.
    • Developing an app or other mobile platform to help farmers find management practices that maximize carbon sequestration and minimize GHG emissions in their area, based on their specific soil condition and climate data.
    • Building an AI-driven model to predict net greenhouse gas emissions and C sequestration from potential weekly/monthly production plans across different regions of the world, based on optimal allocation of resources such as fertilizers, equipment, and water.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the ...

  18. 2M unique spotify songs with audio features.

    • kaggle.com
    zip
    Updated Sep 1, 2024
    krish sharma (2024). 2M unique spotify songs with audio features. [Dataset]. https://www.kaggle.com/datasets/krishsharma0413/2-million-songs-from-mpd-with-audio-features
    Explore at:
    zip (408512929 bytes)
    Dataset updated
    Sep 1, 2024
    Authors
    krish sharma
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    Where is the data from?

    The dataset is a combination of the Million Playlist Dataset and the Spotify API.

    SQLite structure.

    The SQLite database is in .db format, with one table extracted. The columns in this table are:

    - track_uri (TEXT PRIMARY KEY): Unique identifier used by Spotify for songs.
    - track_name (TEXT): Song name.
    - artist_name (TEXT): Artist name.
    - artist_uri (TEXT): Unique identifier used by Spotify for artists.
    - album_name (TEXT): Album name.
    - album_uri (TEXT): Unique identifier used by Spotify for albums.
    - duration_ms (INTEGER): Duration of the song.
    - danceability (REAL): Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
    - energy (REAL): Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.
    - key (INTEGER): The key the track is in. Integers map to pitches using standard Pitch Class notation, e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
    - loudness (REAL): The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing the relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.
    - mode (INTEGER): Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived. Major is represented by 1 and minor by 0.
    - speechiness (REAL): Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
    - acousticness (REAL): A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
    - instrumentalness (REAL): Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater the likelihood the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.
    - liveness (REAL): Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 provides strong likelihood that the track is live.
    - valence (REAL): A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
    - tempo (REAL): The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
    - type (TEXT): The object type.
    - id (TEXT): The Spotify ID for the track.
    - uri (TEXT): The Spotify URI for the track.
    - track_href (TEXT): A link to the Web API endpoint providing full details of the track.
    - analysis_url (TEXT): A URL to access the full audio analysis of this track. An access token is required to access this data.
    - fduration_ms (INTEGER): The duration of the track in milliseconds.
    - time_signature (INTEGER): An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The value ranges from 3 to 7, indicating time signatures from "3/4" to "7/4".
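
    Since the description does not give the table name, the short Python sketch below assumes a table called tracks; the column names and the pitch-class convention for key follow the list above, and spotify_songs.db is a hypothetical file name.

        import sqlite3

        # Hypothetical database file; "tracks" is an assumed table name.
        conn = sqlite3.connect("spotify_songs.db")

        # Fetch a few highly danceable major-key tracks, skipping rows where
        # no key was detected (key = -1).
        rows = conn.execute(
            """
            SELECT track_name, artist_name, danceability, key
            FROM tracks
            WHERE danceability > 0.8 AND mode = 1 AND key >= 0
            ORDER BY danceability DESC
            LIMIT 10
            """
        ).fetchall()

        # Standard Pitch Class notation, as described for the key column.
        PITCHES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                   "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

        for name, artist, dance, key in rows:
            print(f"{name} by {artist}: danceability {dance:.2f}, key {PITCHES[key]} major")

        conn.close()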

  19. Bridging the Gap in Hypertension Management: Evaluating Blood Pressure Control and Associated Risk Factors in a Resource-Constrained Setting

    • data.mendeley.com
    Updated Jan 15, 2025
    + more versions
    abu sufian (2025). Bridging the Gap in Hypertension Management: Evaluating Blood Pressure Control and Associated Risk Factors in a Resource-Constrained Setting [Dataset]. http://doi.org/10.17632/56jyjndvcr.1
    Explore at:
    Dataset updated
    Jan 15, 2025
    Authors
    abu sufian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Description

    This dataset contains a simulated collection of 10,000 patient records designed to explore hypertension management in resource-constrained settings. It provides comprehensive data for analyzing blood pressure control rates, associated risk factors, and complications. The dataset is ideal for predictive modelling, risk analysis, and treatment optimization, offering insights into demographic, clinical, and treatment-related variables.

    Dataset Structure

    1. Dataset Volume

      • Size: 10,000 records.
      • Features: 19 variables, categorized into Sociodemographic, Clinical, Complications, and Treatment/Control groups.

    2. Variables and Categories

    A. Sociodemographic Variables

    1. Age:
    •  Continuous variable in years.
    •  Range: 18–80 years.
    •  Mean ± SD: 49.37 ± 12.81.
    2. Sex:
    •  Categorical variable.
    •  Values: Male, Female.
    3. Education:
    •  Categorical variable.
    •  Values: No Education, Primary, Secondary, Higher Secondary, Graduate, Post-Graduate, Madrasa.
    4. Occupation:
    •  Categorical variable.
    •  Values: Service, Business, Agriculture, Retired, Unemployed, Housewife.
    5. Monthly Income:
    •  Categorical variable in Bangladeshi Taka.
    •  Values: <5000, 5001–10000, 10001–15000, >15000.
    6. Residence:
    •  Categorical variable.
    •  Values: Urban, Sub-urban, Rural.
    

    B. Clinical Variables

    7. Systolic BP:
    •  Continuous variable in mmHg.
    •  Range: 100–200 mmHg.
    •  Mean ± SD: 140 ± 15 mmHg.
    8. Diastolic BP:
    •  Continuous variable in mmHg.
    •  Range: 60–120 mmHg.
    •  Mean ± SD: 90 ± 10 mmHg.
    9. Elevated Creatinine:
    •  Binary variable (≥ 1.4 mg/dL).
    •  Values: Yes, No.
    10. Diabetes Mellitus:
    •  Binary variable.
    •  Values: Yes, No.
    11. Family History of CVD:
    •  Binary variable.
    •  Values: Yes, No.
    12. Elevated Cholesterol:
    •  Binary variable (≥ 200 mg/dL).
    •  Values: Yes, No.
    13. Smoking:
    •  Binary variable.
    •  Values: Yes, No.
    

    C. Complications

    14. LVH (Left Ventricular Hypertrophy):
    •  Binary variable (ECG diagnosis).
    •  Values: Yes, No.
    15. IHD (Ischemic Heart Disease):
    •  Binary variable.
    •  Values: Yes, No.
    16. CVD (Cerebrovascular Disease):
    •  Binary variable.
    •  Values: Yes, No.
    17. Retinopathy:
    •  Binary variable.
    •  Values: Yes, No.
    

    D. Treatment and Control

    18. Treatment:
    •  Categorical variable indicating therapy type.
    •  Values: Single Drug, Combination Drugs.
    19. Control Status:
    •  Binary variable.
    •  Values: Controlled, Uncontrolled.
    

    Dataset Applications

    1. Predictive Modeling:
    •  Develop models to predict blood pressure control status using demographic and clinical data (see the sketch after this list).
    2. Risk Analysis:
    •  Identify significant factors influencing hypertension control and complications.
    3. Severity Scoring:
    •  Quantify hypertension severity for patient risk stratification.
    4. Complications Prediction:
    •  Forecast complications like IHD, LVH, and CVD for early intervention.
    5. Treatment Guidance:
    •  Analyze therapy efficacy to recommend optimal treatment strategies.
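
    A minimal sketch of the first application, assuming the records are exported to a CSV file (hypertension_records.csv is a hypothetical name) with the column names listed above; scikit-learn's logistic regression stands in for any classifier.

        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split

        # Hypothetical export of the dataset; columns follow the variable list.
        df = pd.read_csv("hypertension_records.csv")

        # Binary target: 1 = Controlled, 0 = Uncontrolled.
        y = (df["Control Status"] == "Controlled").astype(int)

        # One-hot encode categorical predictors; numeric columns pass through.
        X = pd.get_dummies(
            df[["Age", "Systolic BP", "Diastolic BP", "Sex", "Residence",
                "Diabetes Mellitus", "Smoking", "Treatment"]],
            drop_first=True,
        )

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )

        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)
        print(classification_report(y_test, model.predict(X_test)))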
    
  20. Car Wash Performance Statistics

    • kaggle.com
    zip
    Updated Jul 4, 2024
    black_mamba2024 (2024). Car Wash Performance Statistics [Dataset]. https://www.kaggle.com/datasets/blackmamba2024/car-wash-statistics
    Explore at:
    zip (14006 bytes)
    Dataset updated
    Jul 4, 2024
    Authors
    black_mamba2024
    Description

    The dataset contains information about car wash customers. It has 1000 rows, each representing a different customer, and five features that describe various aspects of their car wash habits and preferences. Here are the features in detail:

    Frequency_of_Washes:
    •  Type: Integer
    •  Description: How often a customer gets their car washed in a month. Values range from 1 to 11 washes per month.
    •  Example Values: 4, 2, 8

    Spending_per_Visit:
    •  Type: Float
    •  Description: The amount a customer spends on each car wash visit, in dollars. Values range from $10 to $50.
    •  Example Values: 30.5, 15.75, 40.2

    Preferred_Service_Type:
    •  Type: Categorical (String)
    •  Description: The type of car wash service the customer prefers. Possible values are "Basic," "Premium," and "Detailing."
    •  Example Values: Premium, Basic, Detailing

    Vehicle_Age:
    •  Type: Integer
    •  Description: The age of the customer's vehicle in years. Values range from 0 to 20 years.
    •  Example Values: 3, 10, 1

    Customer_Loyalty:
    •  Type: Categorical (String)
    •  Description: The loyalty level of the customer. Possible values are "Low," "Medium," and "High."
    •  Example Values: High, Medium, Low
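
    As a quick illustration of working with these features, the pandas sketch below estimates each loyalty tier's average monthly spend (washes per month times spend per visit); car_wash_statistics.csv is a hypothetical file name.

        import pandas as pd

        # Hypothetical file name; the five columns are described above.
        df = pd.read_csv("car_wash_statistics.csv")

        # Approximate monthly spend: washes per month times spend per visit.
        df["Monthly_Spend"] = df["Frequency_of_Washes"] * df["Spending_per_Visit"]

        # Compare habits across loyalty tiers.
        summary = df.groupby("Customer_Loyalty").agg(
            customers=("Customer_Loyalty", "size"),
            avg_washes=("Frequency_of_Washes", "mean"),
            avg_monthly_spend=("Monthly_Spend", "mean"),
        )
        print(summary.round(2))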
