100+ datasets found
  1. Variable definition, descriptive statistics and data source, 1995–2007.

    • figshare.com
    xls
    Updated Jun 1, 2023
    Cite
    Alfred M. Wu (2023). Variable definition, descriptive statistics and data source, 1995–2007. [Dataset]. http://doi.org/10.1371/journal.pone.0225299.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Alfred M. Wu
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Variable definition, descriptive statistics and data source, 1995–2007.

  2. Data_Sheet_1_Raw Data Visualization for Common Factorial Designs Using SPSS:...

    • frontiersin.figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Florian Loffing (2023). Data_Sheet_1_Raw Data Visualization for Common Factorial Designs Using SPSS: A Syntax Collection and Tutorial.ZIP [Dataset]. http://doi.org/10.3389/fpsyg.2022.808469.s001
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Florian Loffing
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization are missing for IBM SPSS Statistics, the most commonly reported statistical software package for data analysis. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available. Together they allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs, and for selected two-factorial mixed designs, and, with some creativity, even beyond (e.g., three-factorial mixed designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match individual needs. A variety of example applications of the syntax are illustrated in a tutorial-like fashion, along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS with a valuable tool to move towards more transparency in data visualization.
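    The summary measures that the syntax overlays on the raw data (mean, median, SD and 95% CI) can be cross-checked outside SPSS. The Python sketch below computes them for one group; the function name, the sample scores and the caller-supplied t critical value (2.262 for n = 10, i.e. 9 degrees of freedom) are illustrative assumptions and are not part of the syntax collection.

```python
import math
import statistics


def summarize(values, t_crit):
    """Return n, mean, median, SD and a t-based 95% CI of the mean for one group.

    t_crit is the two-sided 95% critical value of Student's t with n - 1
    degrees of freedom, looked up by the caller (e.g. 2.262 for n = 10).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    half_width = t_crit * sd / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "ci95": (mean - half_width, mean + half_width),
    }


# Hypothetical raw scores for one group of a between-subject design.
scores = [12, 15, 11, 14, 13, 18, 12, 16, 14, 15]
print(summarize(scores, t_crit=2.262))
```

    Printing these numbers next to the raw values is the textual analogue of what a dot plot with mean and CI error bars shows graphically.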

  3. Government Transportation Financial Statistics (GTFS) Data

    • catalog.data.gov
    • data.bts.gov
    Updated Aug 9, 2024
    Cite
    Bureau of Transportation Statistics (2024). Government Transportation Financial Statistics (GTFS) Data [Dataset]. https://catalog.data.gov/dataset/government-transportation-financial-statistics-gtfs-data
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    Bureau of Transportation Statistics, http://www.rita.dot.gov/bts
    Description

    Government Transportation Financial Statistics is no longer being updated by the Bureau of Transportation Statistics as of June 2024. It is being replaced by our new product, Transportation Public Financial Statistics (TPFS), which provides more granularity by expanding the categories of revenues and expenditures. The new dataset can be found at: https://data.bts.gov/Research-and-Statistics/Transportation-Public-Financial-Statistics-TPFS-/6aiz-ybqx/about_data Further information about the TPFS can be found at: https://www.bts.gov/tpfs The government plays an important role in the U.S. transportation system, as a provider of transportation infrastructure and as an administrator and regulator of the system. The government spends large amounts of funds on building, rehabilitating, maintaining, operating, and administering the infrastructure system. These activities are primarily supported by government revenue generated from several sources, including user fees, taxes from transportation and non-transportation-related activities, borrowing, and grants from federal, state, and local governments. Government Transportation Financial Statistics (GTFS) provides a set of maps, charts, and tables with information on transportation-related revenue and expenditures for all levels of government, including federal, state, and local, and for all modes of transportation. Related tables can be found in National Transportation Statistics, Section 3.D - Government Finance (https://www.bts.gov/topics/national-transportation-statistics). For further information, data definitions, and methodology, see https://www.bts.gov/gtfs

  4. Streamflow statistics calculated from daily mean streamflow data collected...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Streamflow statistics calculated from daily mean streamflow data collected during water years 1901–2015 for selected U.S. Geological Survey streamgages [Dataset]. https://catalog.data.gov/dataset/streamflow-statistics-calculated-from-daily-mean-streamflow-data-collected-during-water-ye
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey, http://www.usgs.gov/
    Description

    In 2016, non-interpretive streamflow statistics were compiled for streamgages located throughout the Nation and stored in the StreamStatsDB database for use with StreamStats and other applications. Two previously published USGS computer programs that were designed to help calculate streamflow statistics were updated to better support StreamStats as part of this effort. These programs are named “GNWISQ” (Get National Water Information System Streamflow (Q) files) and “QSTATS” (Streamflow (Q) Statistics). Statistics for 20,438 streamgages that had 1 or more complete years of record during water years 1901 through 2015 were calculated from daily mean streamflow data; 19,415 of these streamgages were within the conterminous United States. About 89 percent of the 20,438 streamgages had 3 or more years of record, and 65 percent had 10 or more years of record. Drainage areas of the 20,438 streamgages ranged from 0.01 to 1,144,500 square miles. The magnitude of annual average streamflow yields (streamflow per square mile) for these streamgages varied by almost six orders of magnitude, from 0.000029 to 34 cubic feet per second per square mile. About 64 percent of these streamgages did not have any zero-flow days during their available period of record. The 18,122 streamgages with 3 or more years of record were included in the StreamStatsDB compilation so they would be available via the StreamStats interface for user-selected streamgages.
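    The yield figures above are annual average streamflow divided by drainage area. A minimal Python sketch of that calculation, assuming daily mean flows in cubic feet per second keyed by date (one entry per day) and a drainage area in square miles: it groups days by USGS water year (October 1 through September 30, named for the calendar year in which it ends), keeps only complete years, and divides each annual mean by the drainage area. The function names and input layout are hypothetical; this does not reproduce GNWISQ or QSTATS.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def water_year(day: date) -> int:
    """USGS water year: Oct 1 of year N-1 through Sep 30 of year N is water year N."""
    return day.year + 1 if day.month >= 10 else day.year


def annual_yields(daily_flows, drainage_area_sq_mi):
    """Annual average yield (cfs per square mile) for each complete water year.

    daily_flows: iterable of (date, daily mean flow in cfs), one entry per day.
    """
    by_year = defaultdict(list)
    for day, flow in daily_flows:
        by_year[water_year(day)].append(flow)

    yields = {}
    for wy, flows in sorted(by_year.items()):
        days_in_year = (date(wy, 9, 30) - date(wy - 1, 10, 1)).days + 1
        if len(flows) == days_in_year:  # skip incomplete water years
            yields[wy] = mean(flows) / drainage_area_sq_mi
    return yields


# Hypothetical record: a full water year 2015 at a constant 50 cfs,
# plus five days of water year 2016 that should be ignored.
start = date(2014, 10, 1)
record = [(start + timedelta(days=i), 50.0) for i in range(365)]
record += [(date(2015, 10, 1) + timedelta(days=i), 80.0) for i in range(5)]
print(annual_yields(record, drainage_area_sq_mi=10.0))  # {2015: 5.0}
```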

  5. Statistical Data Catalog Cologne

    • ckan.mobidatalab.eu
    Updated Jul 26, 2023
    Cite
    Köln (2023). Statistical Data Catalog Cologne [Dataset]. https://ckan.mobidatalab.eu/dataset/statisticaldatacatalogue-coln
    Available download formats: csv (22 files), json
    Dataset updated
    Jul 26, 2023
    Dataset provided by
    Köln
    License

    Data licence Germany – Attribution – Version 2.0, https://www.govdata.de/dl-de/by-2-0
    License information was derived automatically

    Description

    Data from various sources are updated in the Statistical Information System of the City of Cologne. The annual statistical yearbook publishes these data in tabular, graphic and cartographic form at the level of the city districts and districts, and also explains definitions and calculation bases. Small-scale statistics at the level of the 86 districts can be obtained from the Cologne district information. All levels of the local area structure are explained in this publication.

    This statistical data catalogue supplements the range of small-scale data. Selected structural data can be called up here in compact tabular form at the level of the 570 statistical districts or the 86 districts. The two overviews provide information about which data is available and from which source it originates. The data itself is provided annually.

    Notes:

    • Data sources are indicated in the summary tables. When using the data, the data license Germany - attribution - version 2.0 must be observed.
    • Some values cannot be reported in order to protect statistical confidentiality. For data sets of the Federal Employment Agency, these are values from 1 to < 10; for all other data records, values from 1 to < 5. Such values are marked in the data with an asterisk (*).
    • The differentiation of population figures by gender is currently made according to female and male residents. The case numbers of those who define themselves as non-binary/diverse are so low at a small-scale level that they cannot be reported for reasons of statistical confidentiality.
    • Residents with a migration background are determined by combining various characteristics from the resident registration procedure. The data are to be interpreted as estimates. The statistical yearbook of the city of Cologne provides further details.
    • The information on households comes from the household generation process. This is a statistical procedure in which residents at the same address are assigned to a household, as far as possible, by querying certain criteria. If the procedure does not identify any connections, residents are allocated to single-person households. The statistical yearbook of the city of Cologne provides further details.
    • The data set on pupils at general schools (spatial location by place of residence) is available from 2013 onward.
    • The number of the statistical quarter or district is a spatial location and can be linked to the geodata (see related resource below).
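    The suppression rule in the confidentiality note can be expressed as a small function. This Python sketch is illustrative only: the function name and the boolean flag marking Federal Employment Agency data are assumptions, and the city's actual processing is not described here.

```python
def suppress(value: int, from_employment_agency: bool) -> str:
    """Apply the confidentiality rule described in the notes.

    Counts from 1 up to (but excluding) the threshold are replaced by "*":
    the threshold is 10 for Federal Employment Agency data and 5 otherwise.
    Zero is not suppressed because the rule starts at 1.
    """
    threshold = 10 if from_employment_agency else 5
    return "*" if 1 <= value < threshold else str(value)


# Hypothetical counts for one statistical district.
print([suppress(v, from_employment_agency=False) for v in (0, 3, 5, 12)])
# ['0', '*', '5', '12']
print([suppress(v, from_employment_agency=True) for v in (4, 9, 10)])
# ['*', '*', '10']
```

    When loading the published CSV files, the asterisk should be read as a missing value rather than as zero; in pandas, for example, `read_csv(..., na_values=["*"])` does this.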

  6. Data from: Standard country or area codes for statistical use

    • kaggle.com
    zip
    Updated Mar 8, 2018
    Cite
    Will Hore-Lacy (2018). Standard country or area codes for statistical use [Dataset]. https://kaggle.com/willhl/unsd_world.csv
    Available download formats: zip (4,381 bytes)
    Dataset updated
    Mar 8, 2018
    Authors
    Will Hore-Lacy
    Description

    Dataset

    This dataset was created by Will Hore-Lacy

    Released under Other (specified in description)

    Contents

    It contains the following files:

  7. Data from: Cross-National Time-Series Data Archive

    • dataverse.harvard.edu
    Updated Sep 18, 2024
    Cite
    Databanks International (2024). Cross-National Time-Series Data Archive [Dataset]. http://doi.org/10.7910/DVN/K2SSTK
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Databanks International
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.1/customlicense?persistentId=doi:10.7910/DVN/K2SSTK

    Time period covered
    1815 - 2024
    Area covered
    United States
    Description

    The Cross-National Time-Series Data Archive provides more than 200 years of annual data for nations and empires of the world, including those that no longer exist. It covers demographic, social, political, and economic topics. Select data go back to 1815. Not all indicators are available for all countries or in all years. For data definitions and the list of variables and countries covered, consult the accompanying codebook and user manuals. More information on topics, variables and countries covered is also available on the CNTS website. DATA AVAILABLE FOR YEARS: 1815-2024

  8. Department of Labor, Office of Research (Current Employment Statistics NSA...

    • catalog.data.gov
    • data.ct.gov
    Updated Aug 9, 2024
    Cite
    data.ct.gov (2024). Department of Labor, Office of Research (Current Employment Statistics NSA 1990 - Current) [Dataset]. https://catalog.data.gov/dataset/department-of-labor-office-of-research-current-employment-statistics-nsa-1990-current
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    data.ct.gov
    Description

    Historical Employment Statistics 1990 - current. The Current Employment Statistics (CES) program provides the most current estimates of nonfarm employment, hours, and earnings data by industry (place of work) for the nation as a whole, all states, and most major metropolitan areas. The CES survey is a federal-state cooperative endeavor in which states develop state and sub-state data using concepts, definitions, and technical procedures prescribed by the Bureau of Labor Statistics (BLS). Estimates produced by the CES program include both full- and part-time jobs. Self-employment, as well as agricultural and domestic positions, is excluded. In Connecticut, more than 4,000 employers are surveyed each month to determine the number of jobs in the State. For more information, please visit us at http://www1.ctdol.state.ct.us/lmi/ces/default.asp.

  9. Dataset for: A generalized partially linear mean-covariance regression model...

    • wiley.figshare.com
    text/x-tex
    Updated Jun 1, 2023
    Cite
    Xueying Zheng; Guoyou Qin; Dongsheng Tu (2023). Dataset for: A generalized partially linear mean-covariance regression model for longitudinal proportional data, with applications to the analysis of quality of life data from cancer clinical trials [Dataset]. http://doi.org/10.6084/m9.figshare.4880756.v1
    Available download formats: text/x-tex
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley
    Authors
    Xueying Zheng; Guoyou Qin; Dongsheng Tu
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Motivated by the analysis of quality of life data from a clinical trial on early breast cancer, we propose in this paper a generalized partially linear mean-covariance regression model for longitudinal proportional data, which are bounded in a closed interval. A Cholesky decomposition of the covariance matrix for within-subject responses and generalized estimating equations are used to estimate the unknown parameters and the nonlinear function in the model. Simulation studies are performed to evaluate the performance of the proposed estimation procedures. Our new model is also applied to analyze the data from the cancer clinical trial that motivated this study. In comparison with available models in the literature, the proposed model does not require specific parametric assumptions on the density function of the longitudinal responses or on the probability function of the boundary values, and it can capture dynamic changes over time or in other variables of interest in both the mean and the covariance of the correlated responses.
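    The description does not spell out which Cholesky parameterization is used. In longitudinal covariance modelling a common choice is the modified Cholesky decomposition T Σ Tᵀ = D, where T is unit lower triangular (its below-diagonal entries are the negatives of generalized autoregressive coefficients) and D is diagonal (innovation variances). The pure-Python sketch below computes T and D from a known covariance matrix via an LDLᵀ factorization; that this matches the paper's parameterization is an assumption, and the example matrix is made up.

```python
def ldl(sigma):
    """Factor a symmetric positive-definite matrix as L D L^T with L unit lower triangular."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = sigma[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        L[j][j] = 1.0
        for i in range(j + 1, n):
            L[i][j] = (sigma[i][j] - sum(L[i][k] * L[j][k] * D[k] for k in range(j))) / D[j]
    return L, D


def modified_cholesky(sigma):
    """Return (T, D) with T = L^-1 unit lower triangular, so T sigma T^T = diag(D)."""
    L, D = ldl(sigma)
    n = len(L)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = 1.0
        for j in range(i):
            # Forward substitution for the inverse of a unit lower-triangular matrix.
            T[i][j] = -sum(L[i][k] * T[k][j] for k in range(j, i))
    return T, D


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# Made-up within-subject covariance for three measurement occasions.
sigma = [[4.0, 2.0, 0.6], [2.0, 2.0, 0.5], [0.6, 0.5, 1.0]]
T, D = modified_cholesky(sigma)
print(D)                                       # innovation variances
print(matmul(matmul(T, sigma), transpose(T)))  # approximately diag(D)
```

    Modelling the unconstrained entries of T and the logarithms of D, instead of Σ directly, is what makes this decomposition attractive for covariance regression: any such values yield a positive-definite Σ.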

  10. US Household Income Statistics

    • kaggle.com
    Updated Apr 16, 2018
    Cite
    Golden Oak Research Group (2018). US Household Income Statistics [Dataset]. https://www.kaggle.com/goldenoakresearch/us-household-income-stats-geo-locations/data
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 16, 2018
    Dataset provided by
    Kaggle, http://kaggle.com/
    Authors
    Golden Oak Research Group
    Area covered
    United States
    Description

    New Upload:

    Added over 32,000 more locations. For information on data calculations, please refer to the methodology PDF document. Information on how to calculate the data yourself is also provided, as well as how to buy data for $1.29.

    What you get:

    The database contains 32,000 records on US Household Income Statistics & Geo Locations. The field description of the database is documented in the attached PDF file. To access all 348,893 records on a scale roughly equivalent to a neighborhood (census tract), see the link below and make sure to upvote. Upvote right now, please. Enjoy!

    Household & Geographic Statistics:

    • Mean Household Income (double)
    • Median Household Income (double)
    • Standard Deviation of Household Income (double)
    • Number of Households (double)
    • Square area of land at location (double)
    • Square area of water at location (double)

    Geographic Location:

    • Longitude (double)
    • Latitude (double)
    • State Name (character)
    • State abbreviated (character)
    • State_Code (character)
    • County Name (character)
    • City Name (character)
    • Name of city, town, village or CPD (character)
    • Primary: defines whether the location is a tract or a block group.
    • Zip Code (character)
    • Area Code (character)

    Abstract

    The dataset was originally developed for real estate and business investment research. Income is a vital element when determining both the quality and the socioeconomic features of a given geographic location. The following data was derived from over 36,000 files and covers 348,893 location records.

    License

    Only proper citation is required; please see the documentation for details. Have fun!

    Golden Oak Research Group, LLC. “U.S. Income Database Kaggle”. Publication: 5, August 2017. Accessed, day, month year.

    Sources

    Don't have 2 dollars? Get the full information yourself!

    2011-2015 ACS 5-Year Documentation was provided by the U.S. Census Reports. Retrieved August 2, 2017, from https://www2.census.gov/programs-surveys/acs/summary_file/2015/data/5_year_by_state/

    Found Errors?

    Please tell us so we may provide you the most accurate data possible. You may reach us at: research_development@goldenoakresearch.com

    For any questions, you can reach me at 585-626-2965.

    Please note: this is my personal number, and email is preferred.

    Check our data's accuracy: Census Fact Checker

    Access all 348,893 location records and more:

    Don't settle. Go big and win big. Optimize your potential. Overcome limitations and outperform expectations. Access all household income records on a scale roughly equivalent to a neighborhood; see the link below:

    Website: Golden Oak Research Kaggle Deals all databases $1.29 Limited time only

    A small startup with big dreams, giving the everyday, up-and-coming data scientist professional-grade data at affordable prices. It's what we do.

  11. Results and analysis using the Lean Six-Sigma define, measure, analyze,...

    • researchdata.up.ac.za
    docx
    Updated Mar 12, 2024
    Cite
    Modiehi Mophethe (2024). Results and analysis using the Lean Six-Sigma define, measure, analyze, improve, and control (DMAIC) Framework [Dataset]. http://doi.org/10.25403/UPresearchdata.25370374.v1
    Available download formats: docx
    Dataset updated
    Mar 12, 2024
    Dataset provided by
    University of Pretoria
    Authors
    Modiehi Mophethe
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This section presents a discussion of the research data. The data was received as secondary data; however, it was originally collected using time study techniques. Data validation is a crucial step in the data analysis process to ensure that the data is accurate, complete, and reliable. Descriptive statistics were used to validate the data. The mean, mode, standard deviation, variance and range provide a summary of the data distribution and assist in identifying outliers or unusual patterns. The data presented in the dataset show the measures of central tendency, which include the mean, median and mode. The mean signifies the average value of each of the factors presented in the tables; it is the balance point of the dataset, reflecting its typical value and behaviour. The median is the middle value of the dataset for each of the factors presented: the point that divides the dataset into two parts, with half of the values below it and the other half above it. This is important for skewed distributions. The mode shows the most common value in the dataset and was used to describe the most typical observation. These values are important because they describe the central value around which the data is distributed. The mean, mode and median indicate a skewed distribution, as they are neither similar nor close to one another.

    In the dataset, the results and a discussion of the results are also presented. This section focuses on the customisation of the DMAIC (Define, Measure, Analyse, Improve, Control) framework to address the specific concerns outlined in the problem statement. To gain a comprehensive understanding of the current process, value stream mapping was employed, further enhanced by measuring the factors that contribute to inefficiencies. These factors are then analysed and ranked by their impact using factor analysis. To mitigate the impact of the most influential factor on project inefficiencies, a solution is proposed using the EOQ (Economic Order Quantity) model. The implementation of the 'CiteOps' software facilitates improved scheduling, monitoring, and task delegation in the construction project through digitalisation. Furthermore, project progress and efficiency are monitored remotely and in real time. In summary, the DMAIC framework was tailored to suit the requirements of the specific project, incorporating techniques from inventory management, project management, and statistics to effectively minimise inefficiencies within the construction project.
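    The EOQ model balances ordering cost against holding cost; in its classic textbook form the optimal order quantity is Q* = √(2DS/H), where D is annual demand, S the fixed cost per order and H the holding cost per unit per year. The Python sketch below applies that formula to hypothetical figures; the inputs actually used in the study are not given in this description, and the function names are illustrative.

```python
import math


def eoq(annual_demand, cost_per_order, holding_cost_per_unit):
    """Classic economic order quantity: sqrt(2 * D * S / H)."""
    return math.sqrt(2 * annual_demand * cost_per_order / holding_cost_per_unit)


def total_annual_cost(q, annual_demand, cost_per_order, holding_cost_per_unit):
    """Ordering cost (D / Q * S) plus average holding cost (Q / 2 * H)."""
    return annual_demand / q * cost_per_order + q / 2 * holding_cost_per_unit


# Hypothetical material: 1,200 units per year, 50 per order, 3 per unit per year.
q_star = eoq(1200, 50, 3)
print(q_star)                                   # 200.0
print(total_annual_cost(q_star, 1200, 50, 3))   # 600.0
```

    At Q* the ordering cost and the holding cost are equal (300 each here); ordering in smaller or larger batches raises the total, for example 625 at Q = 150.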

  12. Dataset for: Optimal Transport, Mean Partition, and Uncertainty Assessment...

    • wiley.figshare.com
    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Jia Li; Beomseok Seo; Lin Lin (2023). Dataset for: Optimal Transport, Mean Partition, and Uncertainty Assessment in Cluster Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.8038925
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Wiley
    Authors
    Jia Li; Beomseok Seo; Lin Lin
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    In scientific data analysis, clusters identified computationally often substantiate existing hypotheses or motivate new ones. Yet the combinatorial nature of the clustering result, which is a partition rather than a set of parameters or a function, blurs notions of mean and variance. This intrinsic difficulty hinders the development of methods to improve clustering by aggregation or to assess the uncertainty of clusters generated. We overcome that barrier by aligning clusters via optimal transport. Equipped with this technique, we propose a new algorithm to enhance clustering by any baseline method using bootstrap samples. Cluster alignment enables us to quantify variation in the clustering result at the levels of both overall partitions and individual clusters. Set relationships between clusters such as one-to-one match, split, and merge can be revealed. A covering point set for each cluster, a concept kin to the confidence interval, is proposed. The tools we have developed here will help address the crucial question of whether any cluster is an intrinsic or spurious pattern. Experimental results on both simulated and real datasets are provided.
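    The method described above aligns clusters by solving an optimal transport problem between them. As a much simpler illustration of what "aligning" two clusterings means, the Python sketch below assumes both partitions have the same number of clusters and finds, by brute force over permutations, the one-to-one matching that minimizes total Jaccard distance (used here as an assumed cluster-to-cluster cost). This is a plain assignment-based alignment, not the paper's optimal-transport algorithm, and it cannot represent splits or merges; the function names are hypothetical.

```python
from itertools import permutations


def clusters(labels):
    """Map each cluster label to the set of point indices it contains."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return groups


def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)


def align(reference, other):
    """Best one-to-one matching of `other`'s clusters to `reference`'s clusters.

    Returns (total Jaccard distance, {reference label: other label}).
    Brute force over permutations, so only practical for a handful of clusters.
    """
    ref, oth = clusters(reference), clusters(other)
    if len(ref) != len(oth):
        raise ValueError("this sketch assumes equal numbers of clusters")
    ref_labels = sorted(ref)
    best_cost, best_map = None, None
    for perm in permutations(sorted(oth)):
        cost = sum(jaccard_distance(ref[r], oth[o]) for r, o in zip(ref_labels, perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, dict(zip(ref_labels, perm))
    return best_cost, best_map


# Same partition of six points with permuted labels.
print(align([0, 0, 1, 1, 2, 2], [5, 5, 3, 3, 4, 4]))  # (0.0, {0: 5, 1: 3, 2: 4})
# Point 3 moved from the second cluster to the third.
print(align([0, 0, 1, 1, 2, 2], [5, 5, 3, 4, 4, 4]))
```

    Repeating this alignment for each bootstrap clustering against a reference is the basic step on which per-cluster variation can then be measured.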

  13. Australian Vocational Education and Training (VET) statistics - Dataset -...

    • data.sa.gov.au
    Updated Jul 10, 2014
    Cite
    data.sa.gov.au (2014). Australian Vocational Education and Training (VET) statistics - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/aus-vocational-education-and-training-vet-statistics
    Dataset updated
    Jul 10, 2014
    Dataset provided by
    Government of South Australia, http://sa.gov.au/
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia, South Australia
    Description

    Students and Courses and Apprentices and Trainees: These statistics cover administrative data sets on student enrolments and qualifications attained, with approximately 2 million students enrolling in vocational education and training in Australia each year, 400,000 graduates each year, and around 400,000 people in training as part of an apprenticeship or traineeship. Demographic information on students, as well as the qualification they are training in and where the training took place, is included. Courses are classified by intended occupation on completion and by field of study. Student Outcomes Survey: In addition, a graduate destination survey is run, capturing information on the quality of training, occupations before and after training, salary, and further education. Under the data tab, each collection appears and can be selected individually for information, Excel files and publications. Under data there are three resources: Vocstats datacubes, VET Students by Industry, and VET Graduates outcomes, salaries and jobs. http://www.ncver.edu.au For an overview of the statistics, please see the following publication: https://www.ncver.edu.au/publications/publications/all-publications/statistical-standard-software/avetmiss-data-element-definitions-edition-2.2# Datasets are to be attributed to the National Centre for Vocational Education Research (NCVER). https://www.ncver.edu.au/ Register for VOCSTATS by visiting the website (http://www.ncver.edu.au/wps/portal/vetdataportal/data/menu/vocstats)

  14. EINO01 - Mean and Median Weekly and Annual Earnings

    • datasalsa.com
    • data.europa.eu
    csv, json-stat, px +1
    Updated Feb 7, 2025
    Cite
    Central Statistics Office (2025). EINO01 - Mean and Median Weekly and Annual Earnings [Dataset]. https://datasalsa.com/dataset/?catalogue=data.gov.ie&name=eino01-mean-and-median-weekly-and-annual-earnings
    Available download formats: csv, json-stat, xlsx, px
    Dataset updated
    Feb 7, 2025
    Dataset authored and provided by
    Central Statistics Office
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 30, 2025
    Description

    EINO01 - Mean and Median Weekly and Annual Earnings. Published by Central Statistics Office. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0). Mean and Median Weekly and Annual Earnings...

  15. EINO07 - Mean and Median Weekly Earnings

    • datasalsa.com
    • data.europa.eu
    csv, json-stat, px +1
    Updated Jul 10, 2025
    Cite
    Central Statistics Office (2025). EINO07 - Mean and Median Weekly Earnings [Dataset]. https://datasalsa.com/dataset/?catalogue=data.gov.ie&name=eino07-mean-and-median-weekly-earnings
    Available download formats: json-stat, px, csv, xlsx
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Central Statistics Office
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 10, 2025
    Description

    EINO07 - Mean and Median Weekly Earnings. Published by Central Statistics Office. Available under the license Creative Commons Attribution 4.0 (CC-BY-4.0). Mean and Median Weekly Earnings ...

  16. March Madness Augmented Statistics

    • kaggle.com
    Updated Apr 4, 2021
    Cite
    Colin Siles (2021). March Madness Augmented Statistics [Dataset]. https://www.kaggle.com/colinsiles/march-madness-augmented-statistics
    Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 4, 2021
    Dataset provided by
    Kaggle
    Authors
    Colin Siles
    Description

    Context

    A team's mean season statistics can be used as predictors of its performance in future games. However, these statistics gain additional meaning when placed in the context of the team's opponents' (and opponents' opponents') performance. This dataset provides that context for each team. Furthermore, predicting games from statistics computed after the season ends causes data leakage, which from experience can be significant in this context (a 15-20% loss in accuracy). This dataset therefore provides each of these statistics as they stood before each game of the regular season, preventing any source of data leakage.

    Content

    All data is derived from the March Madness competition data. Each original column was renamed from "W" and "L" to "A" and "B", and each game was then mirrored to represent both orderings of the opponents. Each team's mean stats are computed (both its own stats and the mean statistics "allowed" or "forced" by its opponents). To compute the mean opponents' stats, we analyze the games played by each opponent (excluding games played against the team in question) and compute the mean statistics for those games. We then take the mean of these per-opponent means, weighted by the number of times the team in question played each opponent. The opponents' opponents' stats are computed as a weighted average of the opponents' averages. The result is a set of statistics similar to those used to compute strength of schedule or RPI, except that they go beyond win percentages (see: https://en.wikipedia.org/wiki/Rating_percentage_index).
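    The opponents' mean described above can be sketched as follows. This is a minimal illustration under assumptions: games are hypothetical `(team, opponent, points)` tuples, only one statistic (points) is averaged, and the helper names `mirror`, `mean_excluding` and `opponents_mean` are made up for this example. It is not the dataset author's code, and it does not reproduce the "allowed"/"forced" or opponents' opponents' columns.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical mirrored record: (team, opponent, points scored by team).
Game = Tuple[str, str, float]


def mirror(winner: str, loser: str, w_pts: float, l_pts: float) -> List[Game]:
    """Turn one W/L row into two A/B rows, one from each team's point of view."""
    return [(winner, loser, w_pts), (loser, winner, l_pts)]


def mean_excluding(games: List[Game], team: str, exclude: str) -> Optional[float]:
    """Mean points for `team`, ignoring its games against `exclude`."""
    pts = [p for t, o, p in games if t == team and o != exclude]
    return sum(pts) / len(pts) if pts else None


def opponents_mean(games: List[Game], team: str) -> Optional[float]:
    """Mean of each opponent's mean (computed without its games against `team`),
    weighted by how many times `team` played that opponent."""
    times_played = Counter(o for t, o, _ in games if t == team)
    pairs = [(n, mean_excluding(games, opp, team)) for opp, n in times_played.items()]
    pairs = [(n, m) for n, m in pairs if m is not None]  # skip opponents with no other games
    if not pairs:
        return None
    return sum(n * m for n, m in pairs) / sum(n for n, _ in pairs)
```

    Opponents who played no one except the team in question are skipped here rather than counted as zero; that is a choice made for the sketch, not something the description specifies.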

    The per-game statistics are computed as if none of the data on or after the day in question were available.
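    The leakage-free rule above (use only games strictly before the day in question) can be sketched for a team's own means. This is an illustration under assumptions: records are hypothetical `(day, team, points)` tuples, and `pregame_means` is a made-up name. Running sums give the same result as recomputing each day from scratch for a team's own statistics; the opponents' and opponents' opponents' values are not covered here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mirrored record: (day number, team, points scored by team).
Record = Tuple[int, str, float]


def pregame_means(records: List[Record]) -> Dict[Tuple[int, str], float]:
    """Map (day, team) to the team's mean points using only games played
    strictly before that day. Teams with no earlier games get no entry."""
    by_day: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for day, team, pts in records:
        by_day[day].append((team, pts))

    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    out: Dict[Tuple[int, str], float] = {}
    for day in sorted(by_day):
        # Read the running means first, so games on `day` never see their own results.
        for team, _ in by_day[day]:
            if counts[team]:
                out[(day, team)] = totals[team] / counts[team]
        # Only then fold this day's results into the running sums.
        for team, pts in by_day[day]:
            totals[team] += pts
            counts[team] += 1
    return out
```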

    Next Steps

    Currently, the data isn't computed particularly efficiently. Computing the per-game averages for every day of the season is necessary to obtain fully accurate opponents' opponents' averages, but takes about 90 minutes. It is probably possible to parallelize this, and the per-game averages involve a lot of repeated computation (essentially recomputing the final averages over and over for each day). Speeding this up would make it more convenient to change the dataset.

    I would like to convert these statistics to per-possession values and to add shooting percentages, pace, and the number of games played (to give an idea of the amount of uncertainty in the per-game averages). Some of these can be approximated from the given data (though the results won't be exact), while others will need to be computed from scratch.

  17. Evaluating Statistical Methods Using Plasmode Data Sets in the Age of...

    • plos.figshare.com
    pdf
    Updated Jun 4, 2023
    Gary L. Gadbury; Qinfang Xiang; Lin Yang; Stephen Barnes; Grier P. Page; David B. Allison (2023). Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates [Dataset]. http://doi.org/10.1371/journal.pgen.1000098
    Available download formats: pdf
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS Genetics
    Authors
    Gary L. Gadbury; Qinfang Xiang; Lin Yang; Stephen Barnes; Grier P. Page; David B. Allison
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by consideration of a common task when analyzing high dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates for the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
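    The point of a plasmode is that truth is known, so an estimate can be checked against what actually happened. A minimal sketch, assuming hypothetical p-values and truth labels: it uses two widely used estimators chosen here for illustration (the Benjamini-Hochberg step-up procedure and Storey's estimate of the proportion of true nulls at a single lambda) and computes the realized false discovery proportion from the known labels. It does not reproduce the paper's simulation procedure or its 15 methods.

```python
from typing import Sequence, Set


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> Set[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank  # keep the largest rank that meets its bound (step-up)
    return set(order[:k])


def storey_pi0(pvals: Sequence[float], lam: float = 0.5) -> float:
    """Storey's estimate of the proportion of true null hypotheses, capped at 1."""
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))


def false_discovery_proportion(rejected: Set[int], is_null: Sequence[bool]) -> float:
    """Realized share of rejections that are true nulls; only computable when
    truth is known, as it is in a plasmode."""
    if not rejected:
        return 0.0
    return sum(is_null[i] for i in rejected) / len(rejected)
```

    Repeating this over many plasmode replicates and averaging the realized proportion gives an empirical FDR against which a method's nominal level or estimate can be compared.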

  18. Starfleet Headache Treatment - Example Data for Repeated ANOVA

    • figshare.com
    txt
    Updated Jan 28, 2022
    Jesus Rogel-Salazar (2022). Starfleet Headache Treatment - Example Data for Repeated ANOVA [Dataset]. http://doi.org/10.6084/m9.figshare.19089896.v1
    Available download formats: txt
    Dataset updated
    Jan 28, 2022
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Jesus Rogel-Salazar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fictitious data on the overall health of Starfleet volunteers taking four different drugs. Since each volunteer is measured on each of the four drugs, this dataset can be used to apply a repeated-measures ANOVA to determine whether the mean health scores differ between drugs.
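    A one-way repeated-measures ANOVA removes between-subject variation before testing the drug effect. A minimal sketch, assuming the text file has been read into one row per volunteer with one score per drug (the file's actual layout is not described here, and `rm_anova_oneway` is a hypothetical name). It returns the F statistic and degrees of freedom only; a p-value would need the F distribution (for example from SciPy), and no sphericity correction is applied.

```python
from typing import List, Tuple


def rm_anova_oneway(data: List[List[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.
    data[s][c] is the score of subject s under condition c (e.g. drug).
    Returns (F, df_conditions, df_error)."""
    n = len(data)      # subjects (volunteers)
    k = len(data[0])   # conditions (drugs)
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[c] for row in data) / n for c in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # residual after removing the subject effect

    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error
```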

  19. World Gender Statistics

    • kaggle.com
    zip
    Updated Nov 28, 2016
    World Bank (2016). World Gender Statistics [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-gender-statistics/versions/1
    Available download formats: zip (0 bytes)
    Dataset updated
    Nov 28, 2016
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    Area covered
    World
    Description

    The Gender Statistics database is a comprehensive source for the latest sex-disaggregated data and gender statistics covering demography, education, health, access to economic opportunities, public life and decision-making, and agency.

    The Data

    The data is split into several files, the main one being Data.csv. Data.csv contains all the variables of interest in this dataset, while the other files are lists of references and general nation-by-nation information.

    Data.csv contains the following fields:

    Data.csv

    • Country.Name: the name of the country
    • Country.Code: the country's code
    • Indicator.Name: the name of the variable that this row represents
    • Indicator.Code: a unique id for the variable
    • 1960 - 2016: one column EACH for the value of the variable in each year it was available
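    Because Data.csv stores one column per year, it is often easier to analyze in long form (one row per country, indicator and year). A minimal sketch using only the standard library, assuming the year columns are headed by the bare year (e.g. "1960") and that non-empty cells are numeric; adjust the header check if the file uses a different convention. `wide_to_long` is a hypothetical name.

```python
import csv
from typing import Iterable, Iterator, Tuple


def wide_to_long(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, float]]:
    """Yield (Country.Code, Indicator.Code, year, value) for each non-empty
    year cell of a Data.csv-style table with one column per year."""
    for row in csv.DictReader(lines):
        for col, cell in row.items():
            # Keep only year columns (all-digit headers) that actually hold a value.
            if isinstance(col, str) and col.isdigit() and cell:
                yield row["Country.Code"], row["Indicator.Code"], int(col), float(cell)
```

    An open file object works as input, e.g. `with open("Data.csv", newline="") as f: rows = list(wide_to_long(f))`.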

    The other files

    I couldn't find any metadata for these files, and I'm not qualified to guess at what each variable means. I'll list the variables in each file; if anyone has suggestions (or, even better, actual knowledge or citations) about what they mean, please leave a note in the comments and I'll add the information to the data description.

    Country-Series.csv

    • CountryCode
    • SeriesCode
    • DESCRIPTION

    Country.csv

    • Country.Code
    • Short.Name
    • Table.Name
    • Long.Name
    • 2-alpha.code
    • Currency.Unit
    • Special.Notes
    • Region
    • Income.Group
    • WB-2.code
    • National.accounts.base.year
    • National.accounts.reference.year
    • SNA.price.valuation
    • Lending.category
    • Other.groups
    • System.of.National.Accounts
    • Alternative.conversion.factor
    • PPP.survey.year
    • Balance.of.Payments.Manual.in.use
    • External.debt.Reporting.status
    • System.of.trade
    • Government.Accounting.concept
    • IMF.data.dissemination.standard
    • Latest.population.census
    • Latest.household.survey
    • Source.of.most.recent.Income.and.expenditure.data
    • Vital.registration.complete
    • Latest.agricultural.census
    • Latest.industrial.data
    • Latest.trade.data
    • Latest.water.withdrawal.data

    FootNote.csv

    • CountryCode
    • SeriesCode
    • Year
    • DESCRIPTION

    Series-Time.csv

    • SeriesCode
    • Year
    • DESCRIPTION

    Series.csv

    • Series.Code
    • Topic
    • Indicator.Name
    • Short.definition
    • Long.definition
    • Unit.of.measure
    • Periodicity
    • Base.Period
    • Other.notes
    • Aggregation.method
    • Limitations.and.exceptions
    • Notes.from.original.source
    • General.comments
    • Source
    • Statistical.concept.and.methodology
    • Development.relevance
    • Related.source.links
    • Other.web.links
    • Related.indicators
    • License.Type

    Acknowledgements

    This dataset was downloaded from The World Bank's Open Data project. The summary of the Terms of Use of this data is as follows:

    • You are free to copy, distribute, adapt, display or include the data in other products for commercial and noncommercial purposes at no cost subject to certain limitations summarized below.

    • You must include attribution for the data you use in the manner indicated in the metadata included with the data.

    • You must not claim or imply that The World Bank endorses your use of the data, or use The World Bank’s logo(s) or trademark(s) in conjunction with such use.

    • Other parties may have ownership interests in some of the materials contained on The World Bank Web site. For example, we maintain a list of some specific data within the Datasets that you may not redistribute or reuse without first contacting the original content provider, as well as information regarding how to contact the original content provider. Before incorporating any data in other products, please check the list: Terms of use: Restricted Data.

    -- [ed. note: this last is not applicable to the Gender Statistics database]

    • The World Bank makes no warranties with respect to the data and you agree The World Bank shall not be liable to you in connection with your use of the data.

    • This is only a summary of the Terms of Use for Datasets Listed in The World Bank Data Catalogue. Please read the actual agreement that controls your use of the Datasets, which is available here: Terms of use for datasets. Also see World Bank Terms and Conditions.

  20. Vehicle licensing statistics data tables

    • gov.uk
    • s3.amazonaws.com
    Updated Jun 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department for Transport (2025). Vehicle licensing statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/vehicle-licensing-statistics-data-tables
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    GOV.UK
    Authors
    Department for Transport
    Description

    Data files containing detailed information about vehicles in the UK are also available, including make and model data.

    Some tables have been withdrawn and replaced. The table index for this statistical series has been updated to provide a full map between the old and new numbering systems used in this page.

    Tables VEH0101 and VEH1104 have not yet been revised to include the recent changes to Large Goods Vehicles (LGV) and Heavy Goods Vehicles (HGV) definitions for data earlier than 2023 quarter 4. This will be amended as soon as possible.

    All vehicles

    Licensed vehicles

    Overview

    VEH0101 (https://assets.publishing.service.gov.uk/media/6846e8dc57f3515d9611f119/veh0101.ods): Vehicles at the end of the quarter by licence status and body type: Great Britain and United Kingdom (ODS, 151 KB)

    Detailed breakdowns

    VEH0103 (https://assets.publishing.service.gov.uk/media/6846e8dcd25e6f6afd4c01d5/veh0103.ods): Licensed vehicles at the end of the year by tax class: Great Britain and United Kingdom (ODS, 33 KB)

    VEH0105 (https://assets.publishing.service.gov.uk/media/6846e8dd57f3515d9611f11a/veh0105.ods): Licensed vehicles at the end of the quarter by body type, fuel type, keepership (private and company) and upper and lower tier local authority: Great Britain and United Kingdom (ODS, 16.3 MB)

    VEH0206 (https://assets.publishing.service.gov.uk/media/6846e8dee5a089417c806179/veh0206.ods): Licensed cars at the end of the year by VED band and carbon dioxide (CO2) emissions: Great Britain and United Kingdom (ODS, 42.3 KB)

    VEH0601 (https://assets.publishing.service.gov.uk/media/6846e8df5e92539572806176/veh0601.ods): Licensed buses and coaches at the end of the year by body type detail: Great Britain and United Kingdom (ODS, 24.6 KB)

    VEH1102 (https://assets.publishing.service.gov.uk/media/6846e8e0e5a089417c80617b/veh1102.ods): Licensed vehicles at the end of the year by body type and keepership (private and company): Great Britain and United Kingdom (ODS, 146 KB)

    VEH1103 (https://assets.publishing.service.gov.uk/media/6846e8e0e5a089417c80617c/veh1103.ods): Licensed vehicles at the end of the quarter by body type and fuel type: Great Britain and United Kingdom (ODS, 992 KB)

    VEH1104 (https://assets.publishing.service.gov.uk/media/6846e8e15e92539572806177/veh1104.ods): Licensed vehicles at the end of the
