6 datasets found
  1. Census of selected service industries, 1972 summary statistic file SA

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite
    U.S. Bureau of the Census; United States (2020). Census of selected service industries, 1972 summary statistic file SA [Dataset]. https://datasearch.gesis.org/dataset/httpsdataverse.unc.eduoai--hdl1902.29C-7
    Dataset updated
    Jan 22, 2020
    Dataset provided by
    Odum Institute Dataverse Network
    Authors
    U.S. Bureau of the Census; United States
    Description

    The subject matter in the five individual files which comprise the total data package is similar. SA1 presents detailed kind-of-business statistics (two-, three-, and four-digit industry levels) on number of establishments and receipts (total and with payroll), number of proprietorships and partnerships, annual and first quarter payroll, and number of paid employees. SA2 contains the same data items as above for selected services total, in addition to the number of establishments and receipts for five major kind-of-business groups. SA3 contains number of establishments and receipts for selected services total and for 130 kind-of-business classifications. SA4 presents receipts and rank by volume of receipts. SA5 statistics are given by city size for number of incorporated cities, total population, number of establishments, receipts, yearly payroll, and the percent of total by population and sales.

    Each of the files has slightly different geography for which summaries are presented. SA1 has summaries for the United States, divisions, States, SCA's and SMSA's, and counties and cities with over 300 service establishments. SA2 presents summary counts for each city of 2,500 inhabitants or more and for remainder of county. SA3 has summaries for the United States, regions, divisions, and States. SA4 presents summaries for the 250 largest counties and cities. SA5 presents United States total.

    Data pertain to the date of the census, 1972. The first major enumeration of Selected Service establishments covered 1933. Censuses were also taken in 1939, 1948, and in 5 year intervals since

  2. Summary statistics for 5 microsatellite loci including number of individuals...

    • datasetcatalog.nlm.nih.gov
    Updated Jan 3, 2012
    Cite
    Bechsgaard, Jesper; Goodacre, Sara; Tuni, Cristina; Bilde, Trine (2012). Summary statistics for 5 microsatellite loci including number of individuals analyzed (N), number of alleles (NA), expected (He) and observed (Ho) heterozygosity, allelic richness, estimates of inbreeding coefficient (Fis), and relatedness (R) among offspring and adult females and percentage of full-sib (FS) and half-sib (HS) relationships between pairs of individuals. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001121817
    Dataset updated
    Jan 3, 2012
    Authors
    Bechsgaard, Jesper; Goodacre, Sara; Tuni, Cristina; Bilde, Trine
    Description

    *denotes a significant deviation from Hardy-Weinberg equilibrium (P<0.05).
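
The heterozygosity and inbreeding quantities named in this entry's title (Ho, He, Fis) can be illustrated with a minimal sketch. It uses the textbook definitions, He = 1 - sum(p_i^2) and Fis = 1 - Ho/He, without any small-sample correction. The function name and the toy genotypes are hypothetical, and the dataset authors may well have used a different estimator or dedicated population-genetics software.

```python
from collections import Counter


def heterozygosity_stats(genotypes):
    """Return (Ho, He, Fis) for a single locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    Textbook estimators only; no small-sample correction is applied.
    """
    n = len(genotypes)
    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Allele frequencies counted over all 2N gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg proportions.
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    # Inbreeding coefficient; undefined when the locus is monomorphic (He == 0).
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return ho, he, fis


# Toy data (made up): four individuals genotyped at one locus with alleles 1 and 2.
toy_genotypes = [(1, 1), (1, 2), (2, 2), (1, 2)]
ho, he, fis = heterozygosity_stats(toy_genotypes)
```

A positive Fis indicates fewer heterozygotes than Hardy-Weinberg proportions predict, and a negative Fis indicates an excess.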

  3. NBA Player Data (1996-2024)

    • kaggle.com
    Updated May 24, 2024
    Cite
    Damir Dizdarevic (2024). NBA Player Data (1996-2024) [Dataset]. https://www.kaggle.com/datasets/damirdizdarevic/nba-dataset-eda-and-ml-compatible
    Dataset updated
    May 24, 2024
    Dataset provided by
    Kaggle
    Authors
    Damir Dizdarevic
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    NBA data from 1996 to 2024, containing players' physical attributes, bio information, (advanced) stats, and positions.

    There are no missing values, but some preprocessing will be needed depending on the task.

    Data was gathered from nba.com and Basketball Reference, starting with the 1996/97 season and running through the latest season, 2023/24.

    There are many options for EDA and ML: analyzing how physical attributes change by position, how the number of 3-point shots has changed over the years, and how the number of foreign players has increased; using machine learning to predict a player's points, rebounds, and assists; predicting a player's position; player clustering; and so on.

    The original height and weight data were in Imperial units, so a scatterplot of heights against weights looked poor: there were only about 20 distinct values for height and about 150 for weight, which is very few for a dataset of 13,000 players. I wrote a script that assigns each player a random height between two adjacent listed heights (for example, between 200.66 cm and 203.2 cm, which are 6-7 and 6-8 in the Imperial system), in such a way that 80% of values fall within a 5% to 35% increase. This keeps the integrity of the data: the average height of the whole dataset increased by less than 1 cm. I did the same for weight: since the difference between two consecutive pound values is around 0.44 kg, I assigned each player a random weight within +/- 0.22 kg of his original weight. The average weight of the whole dataset changed by around 0.09 kg, which is insignificant.

    Unfortunately, the NBA doesn't provide the data in cm and kg. Although this approach is not perfect in terms of accuracy, it is still much better than having only 20 distinct heights in a dataset of 13,000 players.
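
The jittering idea described above can be sketched roughly as follows. This is not the dataset author's script: it draws heights uniformly between the listed value and the next whole inch, rather than reproducing the weighting in which 80% of values fall within a 5% to 35% increase, and every function name here is hypothetical. Only the 2.54 cm inch and the +/- 0.22 kg half-width come from the conversion and the description.

```python
import random

CM_PER_INCH = 2.54


def feet_inches_to_cm(feet, inches):
    """Convert a listed height such as 6-7 to centimetres."""
    return (feet * 12 + inches) * CM_PER_INCH


def jitter_height_cm(feet, inches, rng):
    """Draw a height uniformly between the listed value and the next whole inch.

    Assumption: plain uniform sampling, a simplification of the weighted
    scheme the description mentions.
    """
    low = feet_inches_to_cm(feet, inches)
    return rng.uniform(low, low + CM_PER_INCH)


def jitter_weight_kg(weight_kg, rng, half_width=0.22):
    """Shift a weight by a uniform random amount within +/- half_width kg."""
    return weight_kg + rng.uniform(-half_width, half_width)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
heights = [jitter_height_cm(6, 7, rng) for _ in range(1000)]
weights = [jitter_weight_kg(100.0, rng) for _ in range(1000)]
```

With uniform sampling the average height rises by about half an inch (roughly 1.27 cm), which is more than the under-1 cm shift reported above. That is consistent with the original script favouring smaller increases.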

  4. Summary statistics of five distribution models of county population counts...

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Jun 4, 2023
    + more versions
    Cite
    Meng Xu; Joel E. Cohen (2023). Summary statistics of five distribution models of county population counts within each state (within at least five counties in each state). size is the number of combinations of censuses and counties. [Dataset]. http://doi.org/10.1371/journal.pone.0245062.s026
    Available download formats: xlsx
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Meng Xu; Joel E. Cohen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Other columns are defined in S21 Table. (XLSX)

  5. ASL 20-Words Dataset v1

    • kaggle.com
    Updated Nov 3, 2024
    Cite
    Hossam Magdy Balaha (2024). ASL 20-Words Dataset v1 [Dataset]. http://doi.org/10.34740/kaggle/dsv/9797396
    Dataset updated
    Nov 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hossam Magdy Balaha
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Arabic Sign Language (ASL) 20-Words Dataset v1 was carefully designed to reflect natural conditions, aiming to capture realistic signing environments and circumstances. Recognizing that nearly everyone has access to a smartphone with a camera as of 2020, the dataset was specifically recorded using mobile phones, aligning with how people commonly record videos in daily life. This approach ensures the dataset is grounded in real-world conditions, enhancing its applicability for practical use cases.

    Each video in this dataset was recorded directly on the authors' smartphones, without any form of stabilization—neither hardware nor software. As a result, the videos vary in resolution and were captured across diverse locations, places, and backgrounds. This variability introduces natural noise and conditions, supporting the development of robust deep learning models capable of generalizing across environments.

    In total, the dataset comprises 8,467 videos of 20 sign language words, contributed by 72 volunteers aged between 20 and 24. Each volunteer performed each sign a minimum of five times, resulting in approximately 100 videos per participant. This repetition standardizes the data and ensures each sign is adequately represented across different performers. The dataset’s mean video count per sign is 423.35, with a standard deviation of 18.58, highlighting the balance and consistency achieved across the signs.
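
The counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The variable names are illustrative, and the per-volunteer average assumes that all 72 volunteers contributed to the 8,467 videos, as the description implies.

```python
# Figures taken from the dataset description.
total_videos = 8467
num_signs = 20
num_volunteers = 72
min_repetitions = 5

# Mean number of videos per sign; the description reports 423.35.
mean_per_sign = total_videos / num_signs

# Lower bound per volunteer if every sign is performed the minimum number of times.
min_videos_per_volunteer = num_signs * min_repetitions

# Average actually implied by the totals (assumes all 72 volunteers are counted).
mean_per_volunteer = total_videos / num_volunteers
```

The implied average per volunteer comes out somewhat above the 100-video minimum, which fits the statement that each sign was performed at least five times.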

    For reference, Table 2 (in the research article) provides the count of videos for each sign, while Figure 2 (in the research article) offers a visual summary of the statistics for each word in the dataset. Additionally, sample frames from each word are displayed in Figure 3 (in the research article), giving a glimpse of the visual content captured.

    For in-depth insights into the methodology and the dataset's creation, see the research paper: Balaha, M.M., El-Kady, S., Balaha, H.M., et al. (2023). "A vision-based deep learning approach for independent-users Arabic sign language interpretation". Multimedia Tools and Applications, 82, 6807–6826. https://doi.org/10.1007/s11042-022-13423-9

    Please consider citing the following if you use this dataset:

    @misc{balaha_asl_2024_db,
     title={ASL 20-Words Dataset v1},
     url={https://www.kaggle.com/dsv/9783691},
     DOI={10.34740/KAGGLE/DSV/9783691},
     publisher={Kaggle},
     author={Mostafa Magdy Balaha and Sara El-Kady and Hossam Magdy Balaha and Mohamed Salama and Eslam Emad and Muhammed Hassan and Mahmoud M. Saafan},
     year={2024}
    }
    
    @article{balaha2023vision,
     title={A vision-based deep learning approach for independent-users Arabic sign language interpretation},
     author={Balaha, Mostafa Magdy and El-Kady, Sara and Balaha, Hossam Magdy and Salama, Mohamed and Emad, Eslam and Hassan, Muhammed and Saafan, Mahmoud M},
     journal={Multimedia Tools and Applications},
     volume={82},
     number={5},
     pages={6807--6826},
     year={2023},
     publisher={Springer}
    }
    

    This dataset is available under the CC BY-NC-SA 4.0 license, which allows for sharing and adaptation under conditions of non-commercial use, proper attribution, and distribution under the same license.

    For further inquiries or information: https://hossambalaha.github.io/.

  6. Geophone Sensor Dataset

    • kaggle.com
    zip
    Updated Dec 26, 2024
    Cite
    Furkan Sezgin (2024). Geophone Sensor Dataset [Dataset]. https://www.kaggle.com/datasets/sezginfurkan/geophone-sensor-dataset
    Available download formats: zip (75617 bytes)
    Dataset updated
    Dec 26, 2024
    Authors
    Furkan Sezgin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains vibration data collected from a geoscope sensor to analyze human activities (walking, running, and waiting). The data is segmented into 3-second time windows, with each window containing 120 rows of data per person. The dataset consists of 1800 rows of data from five individuals: Furkan, Enes, Yusuf, Alihan and Emir.

    Each person’s activity is classified into one of the three categories: walking, running, or standing still. The data includes both statistical and frequency-domain features extracted from the raw vibration signals, detailed below:

    Statistical Features:
    - Mean: The average value of the signal over the time window.
    - Median: The middle value of the signal, dividing the data into two equal halves.
    - Standard Deviation: A measure of how much the signal deviates from its mean, indicating the signal's variability.
    - Minimum: The smallest value in the signal during the time window.
    - Maximum: The largest value in the signal during the time window.
    - First Quartile (Q1): The median of the lower half of the data, representing the 25th percentile.
    - Third Quartile (Q3): The median of the upper half of the data, representing the 75th percentile.
    - Skewness: A measure of the asymmetry of the signal distribution, showing whether the data is skewed to the left or right.

    Frequency-Domain Features:
    - Dominant Frequency: The frequency with the highest power, providing insights into the primary periodicity of the signal.
    - Signal Energy: The total energy of the signal, representing the sum of the squared signal values over the time window.
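
As a rough illustration of how the statistical and frequency-domain features listed above could be computed from one window of raw samples, here is a standard-library sketch. The function name, the dictionary keys, and the sample rate are assumptions. The description does not state the sensor's sampling rate or the exact formulas used, so this is one plausible reading rather than the dataset author's pipeline.

```python
import cmath
import math
import statistics


def window_features(signal, sample_rate_hz):
    """Compute the listed features for one window (needs at least two samples)."""
    x = [float(v) for v in signal]
    n = len(x)
    ordered = sorted(x)
    mean = statistics.fmean(x)
    std = statistics.pstdev(x)  # population standard deviation
    centered = [v - mean for v in x]
    # Skewness as the third standardized moment; 0.0 for a constant signal.
    skewness = (sum(c ** 3 for c in centered) / n) / std ** 3 if std > 0 else 0.0
    # Naive DFT over positive-frequency bins, skipping the DC bin (k = 0).
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                    for i, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "std": std,
        "min": ordered[0],
        "max": ordered[-1],
        # Quartiles as medians of the lower and upper halves, as described above.
        "q1": statistics.median(ordered[: n // 2]),
        "q3": statistics.median(ordered[(n + 1) // 2:]),
        "skewness": skewness,
        "dominant_frequency_hz": best_bin * sample_rate_hz / n,
        "signal_energy": sum(v * v for v in x),
    }


# Synthetic 3-second window: a 5 Hz sine sampled at an assumed 100 Hz.
sine = [math.sin(2 * math.pi * 5 * i / 100) for i in range(300)]
features = window_features(sine, sample_rate_hz=100.0)
```

The naive DFT is quadratic in the window length, which is fine for short windows. For longer signals an FFT routine would be the usual choice.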

    Dataset Overview:
    - Total Rows: 1800
    - Number of Individuals: 5 (Furkan, Enes, Yusuf, Alihan, Emir)
    - Activity Types: Walking, Running, Waiting (Standing Still)
    - Time Frame: 3-second time windows (120 rows per individual for each activity)
    - Features: Statistical and frequency-domain features (as described above)

    This dataset is suitable for training models on activity recognition, user identification, and other related tasks. It provides rich, detailed features that can be used for various classification and analysis applications.

