100+ datasets found
  1. Dataset for: Some Remarks on the R2 for Clustering

    • wiley.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Nicola Loperfido; Thaddeus Tarpey (2023). Dataset for: Some Remarks on the R2 for Clustering [Dataset]. http://doi.org/10.6084/m9.figshare.6124508.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Wiley
    Authors
    Nicola Loperfido; Thaddeus Tarpey
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A common descriptive statistic in cluster analysis is the $R^2$ that measures the overall proportion of variance explained by the cluster means. This note highlights properties of the $R^2$ for clustering. In particular, we show that generally the $R^2$ can be artificially inflated by linearly transforming the data by "stretching" and by projecting. Also, the $R^2$ for clustering will often be a poor measure of clustering quality in high-dimensional settings. We also investigate the $R^2$ for clustering for misspecified models. Several simulation illustrations are provided highlighting weaknesses in the clustering $R^2$, especially in high-dimensional settings. A functional data example is given showing that the $R^2$ for clustering can vary dramatically depending on how the curves are estimated.
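    The stretching effect described above can be illustrated with a minimal sketch. It assumes the usual definition of the clustering $R^2$ as one minus the within-cluster sum of squares divided by the total sum of squares; the function name `clustering_r2`, the toy data, and the fixed cluster labels are hypothetical and are not taken from the dataset or the paper.

```python
import numpy as np

def clustering_r2(X, labels):
    """Proportion of total variance explained by the cluster means.

    Assumed definition: 1 - (within-cluster SS) / (total SS).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    within_ss = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        within_ss += ((members - members.mean(axis=0)) ** 2).sum()
    return 1.0 - within_ss / total_ss

# Hypothetical toy data: two clusters separated along the first coordinate only.
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
labels = np.array([0, 0, 1, 1])

r2_original = clustering_r2(X, labels)

# Stretch the separating coordinate by a factor of 10 (a linear transformation).
# The partition is unchanged, yet the R^2 moves closer to 1.
r2_stretched = clustering_r2(X * np.array([10.0, 1.0]), labels)

print(r2_original, r2_stretched)
```

    In this sketch the cluster assignment is held fixed, so the rise in $R^2$ comes purely from rescaling one coordinate, not from any change in how well the points are grouped.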

  2. Streamflow statistics calculated from daily mean streamflow data collected...

    • data.usgs.gov
    • datasets.ai
    • +1more
    Updated Apr 25, 2023
    Cite
    Gregory Granato (2023). Streamflow statistics calculated from daily mean streamflow data collected during water years 1901–2015 for selected U.S. Geological Survey streamgages [Dataset]. http://doi.org/10.5066/F71V5CFT
    Explore at:
    Dataset updated
    Apr 25, 2023
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Authors
    Gregory Granato
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Oct 1, 1900 - Sep 30, 2015
    Description

    In 2016, non-interpretive streamflow statistics were compiled for streamgages located throughout the Nation and stored in the StreamStatsDB database for use with StreamStats and other applications. Two previously published USGS computer programs that were designed to help calculate streamflow statistics were updated to better support StreamStats as part of this effort. These programs are named “GNWISQ” (Get National Water Information System Streamflow (Q) files) and “QSTATS” (Streamflow (Q) Statistics). Statistics for 20,438 streamgages that had 1 or more complete years of record during water years 1901 through 2015 were calculated from daily mean streamflow data; 19,415 of these streamgages were within the conterminous United States. About 89 percent of the 20,438 streamgages had 3 or more years of record, and 65 percent had 10 or more years of record. Drainage areas of the 20,438 streamgages ranged from 0.01 to 1,144,500 square miles. The magnitude of annual average streamflow y ...

  3. Ad-hoc statistical analysis: 2019/20 Quarter 1

    • gov.uk
    • s3.amazonaws.com
    Updated Aug 23, 2022
    Cite
    Department for Digital, Culture, Media & Sport (2022). Ad-hoc statistical analysis: 2019/20 Quarter 1 [Dataset]. https://www.gov.uk/government/statistical-data-sets/ad-hoc-statistical-analysis-201920-quarter-1
    Explore at:
    Dataset updated
    Aug 23, 2022
    Dataset provided by
    GOV.UK: http://gov.uk/
    Authors
    Department for Digital, Culture, Media & Sport
    Description

    This page lists ad-hoc statistics released during the period April - June 2019. These are additional analyses not included in any of the Department for Digital, Culture, Media and Sport’s standard publications.

    If you would like any further information please contact evidence@culture.gov.uk.

    April 2019 - Engagement with cultural activities and mean wellbeing scores of adults (16+), 2017/18, England, Taking Part survey

    https://assets.publishing.service.gov.uk/media/5ff6f401e90e0763a6055356/Taking_Part_Survey_October_2017_to_September_2018_Provisional_tables_V2.xlsx

    MS Excel Spreadsheet, 239 KB

    April 2019 - DCMS Sector Economic Estimates: Employment of UK residents in DCMS sectors where the workplace is outside the UK, 2017

    https://assets.publishing.service.gov.uk/media/5ff6f4018fa8f53b7881f3df/Overseas_employment_V2.xlsx

    MS Excel Spreadsheet, 36.9 KB

  4. Data for generating statistical maps of soil lithium concentrations in the...

    • datasets.ai
    • data.usgs.gov
    • +1more
    55
    Updated Aug 27, 2024
    Cite
    Department of the Interior (2024). Data for generating statistical maps of soil lithium concentrations in the conterminous United States [Dataset]. https://datasets.ai/datasets/data-for-generating-statistical-maps-of-soil-lithium-concentrations-in-the-conterminous-un
    Explore at:
    Available download formats: 55
    Dataset updated
    Aug 27, 2024
    Dataset authored and provided by
    Department of the Interior
    Area covered
    Contiguous United States, United States
    Description

    The product data are six statistics that were estimated for the chemical concentration of lithium in the soil C horizon of the conterminous United States. The estimates are made at 9998 locations that are uniformly distributed across the conterminous United States. The six statistics are the mean for the isometric log-ratio transform of the concentrations, the equivalent mean for the concentrations, the standard deviation for the isometric log-ratio transform of the concentrations, the probability of exceeding a concentration of 55 milligrams per kilogram, the 0.95 quantile for the isometric log-ratio transform of the concentrations, and the equivalent 0.95 quantile for the concentrations. Each statistic may be used to generate a statistical map that shows an attribute of the distribution of lithium concentration.

  5. GLO climate data stats summary

    • data.gov.au
    • researchdata.edu.au
    • +1more
    zip
    Updated Apr 13, 2022
    Cite
    Bioregional Assessment Program (2022). GLO climate data stats summary [Dataset]. https://data.gov.au/data/dataset/afed85e0-7819-493d-a847-ec00a318e657
    Explore at:
    Available download formats: zip (8810)
    Dataset updated
    Apr 13, 2022
    Dataset authored and provided by
    Bioregional Assessment Program
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    Summary of various climate variables for all 15 subregions, based on Bureau of Meteorology Australian Water Availability Project (BAWAP) climate grids, including:

    1. Time series mean annual BAWAP rainfall from 1900 - 2012.

    2. Long term average BAWAP rainfall and Penman Potential Evapotranspiration (PET) from Jan 1981 - Dec 2012 for each month.

    3. Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P (precipitation); (ii) Penman ETp; (iii) Tavg (average temperature); (iv) Tmax (maximum temperature); (v) Tmin (minimum temperature); (vi) VPD (Vapour Pressure Deficit); (vii) Rn (net radiation); and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    4. Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009).

    As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).
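    The seven per-variable summaries listed in item 3 above can be sketched as follows. This is an illustration only: the description does not say how the standard deviation or the trend were computed, so the sample standard deviation and an ordinary least-squares slope per year are assumptions, and the function name `summarize` and the input values are hypothetical.

```python
import numpy as np

def summarize(years, values):
    """Return the seven summaries (a)-(g) for one variable and one time period."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    avg = values.mean()
    std = values.std(ddof=1)                 # assumption: sample standard deviation
    slope = np.polyfit(years, values, 1)[0]  # assumption: least-squares trend per year
    return {
        "average": avg,
        "maximum": values.max(),
        "minimum": values.min(),
        "average_plus_stddev": avg + std,
        "average_minus_stddev": avg - std,
        "stddev": std,
        "trend": slope,
    }

# Hypothetical annual values for five years (not taken from the dataset).
years = [1981, 1982, 1983, 1984, 1985]
values = [10.0, 12.0, 14.0, 16.0, 18.0]
stats = summarize(years, values)
print(stats)
```

    Applying the same function to each of the 17 time periods and 8 variables would give a table with the same shape as the one the description outlines.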

    There are 4 csv files here:

    BAWAP_P_annual_BA_SYB_GLO.csv

    Desc: Time series mean annual BAWAP rainfall from 1900 - 2012.

    Source data: annual BILO rainfall

    P_PET_monthly_BA_SYB_GLO.csv

    Long term average BAWAP rainfall and Penman PET from 198101 - 201212 (Jan 1981 - Dec 2012) for each month

    Climatology_Trend_BA_SYB_GLO.csv

    Values calculated over the years 1981 - 2012 (inclusive), for 17 time periods (i.e., annual, 4 seasons and 12 months) for the following 8 meteorological variables: (i) BAWAP_P; (ii) Penman ETp; (iii) Tavg; (iv) Tmax; (v) Tmin; (vi) VPD; (vii) Rn; and (viii) Wind speed. For each of the 17 time periods and each of the 8 meteorological variables, the following were calculated: (a) average; (b) maximum; (c) minimum; (d) average plus standard deviation (stddev); (e) average minus stddev; (f) stddev; and (g) trend.

    Risbey_Remote_Rainfall_Drivers_Corr_Coeffs_BA_NSB_GLO.csv

    Correlation coefficients (-1 to 1) between rainfall and 4 remote rainfall drivers between 1957-2006 for the four seasons. The data and methodology are described in Risbey et al. (2009). As described in the Risbey et al. (2009) paper, the rainfall was from 0.05 degree gridded data described in Jeffrey et al. (2001 - known as the SILO datasets); sea surface temperature was from the Hadley Centre Sea Ice and Sea Surface Temperature dataset (HadISST) on a 1 degree grid. BLK=Blocking; DMI=Dipole Mode Index; SAM=Southern Annular Mode; SOI=Southern Oscillation Index; DJF=December, January, February; MAM=March, April, May; JJA=June, July, August; SON=September, October, November. The analysis is a summary of Fig. 15 of Risbey et al. (2009).

    Dataset History

    The dataset was created from various BAWAP source data, including monthly BAWAP rainfall, Tmax, Tmin, VPD, etc., and from other source data, including monthly Penman PET and correlation coefficient data. Data were extracted from national datasets for the GLO subregion.


    Dataset Citation

    Bioregional Assessment Programme (2014) GLO climate data stats summary. Bioregional Assessment Derived Dataset. Viewed 18 July 2018, http://data.bioregionalassessments.gov.au/dataset/afed85e0-7819-493d-a847-ec00a318e657.

    Dataset Ancestors

  6. Commuting Statistics - Dataset - data.smartdublin.ie

    • data.smartdublin.ie
    Updated Jan 18, 2022
    Cite
    (2022). Commuting Statistics - Dataset - data.smartdublin.ie [Dataset]. https://data.smartdublin.ie/dataset/commuting-statistics
    Explore at:
    Dataset updated
    Jan 18, 2022
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    See more at: http://airo.maynoothuniversity.ie/datastore/commuting-t11-sa. This file contains original variables from Theme 11 Commuting from Census 2011 and a series of additional variables produced by AIRO, such as percentage rates and ratios. The file includes data on Means of Travel, Time and Distance to Work, School or College for the 18,488 Small Areas in the Republic of Ireland. For more information on the original data source please see http://www.cso.ie/en/index.html

  7. 2023 Census main means of travel to work by statistical area 3

    • datafinder.stats.govt.nz
    csv, dbf (dbase iii) +4
    Updated Jun 11, 2025
    Cite
    Stats NZ (2025). 2023 Census main means of travel to work by statistical area 3 [Dataset]. https://datafinder.stats.govt.nz/table/122496-2023-census-main-means-of-travel-to-work-by-statistical-area-3/
    Explore at:
    Available download formats: mapinfo mif, csv, dbf (dbase iii), geodatabase, mapinfo tab, geopackage / sqlite
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Statistics New Zealand: http://www.stats.govt.nz/
    Authors
    Stats NZ
    License

    https://datafinder.stats.govt.nz/license/attribution-4-0-international/

    Description

    Dataset shows an individual’s statistical area 3 (SA3) of usual residence and the SA3 of their workplace address, for the employed census usually resident population count aged 15 years and over, by main means of travel to work from the 2018 and 2023 Censuses.

    The main means of travel to work categories are:

    • Work at home
    • Drive a private car, truck, or van
    • Drive a company car, truck, or van
    • Passenger in a car, truck, van, or company bus
    • Public bus
    • Train
    • Bicycle
    • Walk or jog
    • Ferry
    • Other.

    Main means of travel to work is the usual method which an employed person aged 15 years and over used to travel the longest distance to their place of work.

    Workplace address refers to where someone usually works in their main job, that is the job in which they worked the most hours. For people who work at home, this is the same address as their usual residence address. For people who do not work at home, this could be the address of the business they work for or another address, such as a building site.

    Workplace address is coded to the most detailed geography possible from the available information. This dataset only includes travel to work information for individuals whose workplace address is available at SA3 level. The sum of the counts for each region in this dataset may not equal the total employed census usually resident population count aged 15 years and over for that region. Workplace address – 2023 Census: Information by concept has more information.

    This dataset can be used in conjunction with the following spatial files by joining on the SA3 code values:

    Download data table using the instructions in the Koordinates help guide.

    Footnotes

    Geographical boundaries

    Statistical standard for geographic areas 2023 (updated December 2023) has information about geographic boundaries as of 1 January 2023. Address data from 2013 and 2018 Censuses was updated to be consistent with the 2023 areas. Due to the changes in area boundaries and coding methodologies, 2013 and 2018 counts published in 2023 may be slightly different to those published in 2013 or 2018.

    Subnational census usually resident population

    The census usually resident population count of an area (subnational count) is a count of all people who usually live in that area and were present in New Zealand on census night. It excludes visitors from overseas, visitors from elsewhere in New Zealand, and residents temporarily overseas on census night. For example, a person who usually lives in Christchurch city and is visiting Wellington city on census night will be included in the census usually resident population count of Christchurch city. 

    Population counts

    Stats NZ publishes a number of different population counts, each using a different definition and methodology. Population statistics – user guide has more information about different counts. 

    Caution using time series

    Time series data should be interpreted with care due to changes in census methodology and differences in response rates between censuses. The 2023 and 2018 Censuses used a combined census methodology (using census responses and administrative data).

    Workplace address time series

    Workplace address time series data should be interpreted with care at lower geographic levels, such as statistical area 2 (SA2). Methodological improvements in 2023 Census resulted in greater data accuracy, including a greater proportion of people being counted at lower geographic areas compared to the 2018 Census. Workplace address – 2023 Census: Information by concept has more information.

    Working at home

    In the census, working at home captures both remote work, and people whose business is at their home address (e.g. farmers or small business owners operating from their home). The census asks respondents whether they ‘mostly’ work at home or away from home. It does not capture whether someone does both, or how frequently they do one or the other.

    Rows excluded from the dataset

    Rows show SA3 of usual residence by SA3 of workplace address. Rows with a total population count of less than six have been removed to reduce the size of the dataset, given only a small proportion of SA3-SA3 combinations have commuter flows.

    About the 2023 Census dataset

    For information on the 2023 dataset see Using a combined census model for the 2023 Census. We combined data from the census forms with administrative data to create the 2023 Census dataset, which meets Stats NZ's quality criteria for population structure information. We added real data about real people to the dataset where we were confident the people who hadn’t completed a census form (which is known as admin enumeration) will be counted. We also used data from the 2018 and 2013 Censuses, administrative data sources, and statistical imputation methods to fill in some missing characteristics of people and dwellings.

    Data quality

    The quality of data in the 2023 Census is assessed using the quality rating scale and the quality assurance framework to determine whether data is fit for purpose and suitable for release. Data quality assurance in the 2023 Census has more information.

    Quality rating of a variable

    The quality rating of a variable provides an overall evaluation of data quality for that variable, usually at the highest levels of classification. The quality ratings shown are for the 2023 Census unless stated. There is variability in the quality of data at smaller geographies. Data quality may also vary between censuses, for subpopulations, or when cross tabulated with other variables or at lower levels of the classification. Data quality ratings for 2023 Census variables has more information on quality ratings by variable.

    Main means of travel to work quality rating

    Main means of travel to work is rated as moderate quality.

    Main means of travel to work – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Workplace address quality rating

    Workplace address is rated as moderate quality.

    Workplace address – 2023 Census: Information by concept has more information, for example, definitions and data quality.

    Using data for good

    Stats NZ expects that, when working with census data, it is done so with a positive purpose, as outlined in the Māori Data Governance Model (Data Iwi Leaders Group, 2023). This model states that “data should support transformative outcomes and should uplift and strengthen our relationships with each other and with our environments. The avoidance of harm is the minimum expectation for data use. Māori data should also contribute to iwi and hapū tino rangatiratanga”.

    Confidentiality

    The 2023 Census confidentiality rules have been applied to 2013, 2018, and 2023 data. These rules protect the confidentiality of individuals, families, households, dwellings, and undertakings in 2023 Census data. Counts are calculated using fixed random rounding to base 3 (FRR3) and suppression of ‘sensitive’ counts less than six, where tables report multiple geographic variables and/or small populations. Individual figures may not always sum to stated totals. Applying confidentiality rules to 2023 Census data and summary of changes since 2018 and 2013 Censuses has more information about 2023 Census confidentiality rules.

    Percentages

    To calculate percentages, divide the figure for the category of interest by the figure for ‘Total stated’ where this applies.

    Symbol

    -999 Confidential
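
    The percentage rule and the confidentiality symbol above can be combined in a short sketch. The function name `percentage`, the category counts, and the choice to return `None` for suppressed values are illustrative assumptions, not part of the Stats NZ dataset or its tooling.

```python
CONFIDENTIAL = -999  # symbol used in the dataset for suppressed counts

def percentage(count, total_stated):
    """Percentage of 'Total stated', or None when a value is confidential."""
    if count == CONFIDENTIAL or total_stated == CONFIDENTIAL or total_stated == 0:
        return None  # assumption: treat suppressed or empty totals as missing
    return 100.0 * count / total_stated

# Hypothetical counts for one SA3-to-SA3 flow (not real census figures).
counts = {"Public bus": 30, "Train": 12, "Ferry": CONFIDENTIAL}
total_stated = 150

shares = {mode: percentage(c, total_stated) for mode, c in counts.items()}
print(shares)
```

    Because counts are randomly rounded, percentages computed this way may not sum to exactly 100.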

    Inconsistencies in definitions

    Please note that there may be differences in definitions between census classifications and those used for other data collections.

  8. Quantitative Research Methods and Data Analysis Workshop 2020

    • unisa.figshare.com
    pdf
    Updated Jun 12, 2025
    Cite
    Tracy Probert; Maxine Schaefer; Anneke Carien Wilsenach (2025). Quantitative Research Methods and Data Analysis Workshop 2020 [Dataset]. http://doi.org/10.25399/UnisaData.12581483.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    University of South Africa
    Authors
    Tracy Probert; Maxine Schaefer; Anneke Carien Wilsenach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We include the course syllabus used to teach quantitative research design and analysis methods to graduate Linguistics students using a blended teaching and learning approach. The blended course took place over two weeks and builds on a face to face course presented over two days in 2019. Students worked through the topics in preparation for a live interactive video session each Friday to go through the activities. Additional communication took place on Slack for two hours each week. A survey was conducted at the start and end of the course to ascertain participants' perceptions of the usefulness of the course. The links to online elements and the evaluations have been removed from the uploaded course guide.

    Participants who complete this workshop will be able to:

    - outline the steps and decisions involved in quantitative data analysis of linguistic data
    - explain common statistical terminology (sample, mean, standard deviation, correlation, nominal, ordinal and scale data)
    - perform common statistical tests using jamovi (e.g. t-test, correlation, anova, regression)
    - interpret and report common statistical tests
    - describe and choose from the various graphing options used to display data
    - use jamovi to perform common statistical tests and graph results

    Evaluation

    Participants who complete the course will use these skills and knowledge to complete the following activities for evaluation:

    - analyse the data for a project and/or assignment (in part or in whole)
    - plan the results section of an Honours research project (where applicable)

    Feedback and suggestions can be directed to M Schaefer schaemn@unisa.ac.za

  9. Data Files for the Development of Regional Regression Equations for...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Data Files for the Development of Regional Regression Equations for Flow-Duration Statistics and N-day Low-Flow Frequencies for Ungaged Stream Locations in Puerto Rico Using Data Through Water Year 2018 [Dataset]. https://catalog.data.gov/dataset/data-files-for-the-development-of-regional-regression-equations-for-flow-duration-statisti
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Description

    Flow-duration statistics at the 99th, 98th, 95th, 90th, 80th, 70th, 60th, and 50th percent exceedance probabilities and annual n-day low-flow statistics for the 1-, 7-, 14-, and 30-day mean low flows with 2-year (0.5 nonexceedance probability), 5-year (0.2 nonexceedance probability), and 10-year (0.1 nonexceedance probability) recurrence intervals were computed for 28 selected streamflow gaging stations in Puerto Rico. The 28 selected streamflow gaging stations were required to have 10 or more years of daily mean streamflow data through water year 2018. The flow-duration statistics and n-day low-flow frequencies were computed using the U.S. Geological Survey program, SWToolbox. Regional regression equations were developed to estimate flow-duration statistics and n-day low-flow frequencies at ungaged stream locations using selected basin characteristics as explanatory variables. These variables were determined from digital spatial datasets and geographic information systems using the most recent data available, as referenced in the U.S. Geological Survey web application, StreamStats, and published in Kolb and Ryan (2021). An ordinary least-squares procedure in R Studio was used to develop the final regional flow-duration regression equations using drainage area, mean total annual reference evapotranspiration, and minimum basin elevation as the explanatory variables. A generalized least squares procedure in the U.S. Geological Survey program, WREG, was used to account for cross-correlation of sites and develop the final regional n-day low-flow frequency regression equations using drainage area, mean total annual reference evapotranspiration, and minimum basin elevation as the explanatory variables. 
This data release includes two child pages: Puerto Rico Flow-Duration Regression Files and Puerto Rico N-day Low-Flow Regression Files, a BasinCharacteristics.csv file that contains 47 basin and climatic characteristics considered in the analyses, a BasinCharacteristics_corrrelation_charts folder that contains .pdf files showing correlation matrices, a R_regsubsets_output folder that contains .txt and .pdf files showing results of the "regsubsets" analyses, a Trend_statistics_nday_lowflow_timeseries.csv file that contains the SWToolbox Mann-Kendall tau statistics, and a NWIS_rdb_files folder that contains the .rdb files used in the analyses. References Cited: Kolb, K.R., and Ryan, P.J., 2021, Basin Characteristic Rasters for Puerto Rico StreamStats, 2021: U.S. Geological Survey data release, https://doi.org/10.5066/P9HK9SSQ.
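
    To make the exceedance-probability wording above concrete: the flow at the 99th percent exceedance probability is the flow equalled or exceeded 99 percent of the time, which corresponds to the 1st percentile of the daily flows. The sketch below uses NumPy's default linear-interpolation percentile, which is an assumption and may differ from the method SWToolbox applies; the function name `flow_duration` and the daily flows are hypothetical.

```python
import numpy as np

def flow_duration(daily_flows, exceedance_percents):
    """Map each percent exceedance p to the flow exceeded p percent of the time."""
    flows = np.asarray(daily_flows, dtype=float)
    # A flow exceeded p percent of the time is the (100 - p)th percentile.
    return {p: float(np.percentile(flows, 100 - p)) for p in exceedance_percents}

# Hypothetical record of daily mean flows: 1, 2, ..., 101 (arbitrary units).
daily_flows = np.arange(1.0, 102.0)

fd = flow_duration(daily_flows, [99, 90, 50])
print(fd)
```

    High exceedance percentages therefore describe low flows, which is why these statistics are paired with the n-day low-flow frequencies in the data release.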

  10. Dates and Places of Birth and Death in the Biographical Dictionary of...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 23, 2021
    Cite
    Magistry, Pierre (2021). Dates and Places of Birth and Death in the Biographical Dictionary of Republican China [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4106718
    Explore at:
    Dataset updated
    Nov 23, 2021
    Dataset provided by
    Henriot, Christian
    Magistry, Pierre
    Armand, Cécile
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the data on birth and death of the 589 historical figures in the Biographical Dictionary of Republican China. The file includes several tabs. The main data is located in the "data" tab. All the other tabs provide statistical analysis of the different variables included in the Data tab.

  11. Supporting Data for Estimating Selected Low-Flow Frequency Statistics and...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Supporting Data for Estimating Selected Low-Flow Frequency Statistics and Mean Annual Flow for Ungaged Locations on Streams in Alabama (ver. 1.1, November 2020) [Dataset]. https://catalog.data.gov/dataset/supporting-data-for-estimating-selected-low-flow-frequency-statistics-and-mean-annual-flow
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Description

    Streamflow data and statistics are vitally important for proper protection and management of both the water quality and water quantity of Alabama streams. Such data and statistics are available at U.S. Geological Survey streamflow-gaging stations, also referred to as streamgages or stations, but are often needed at ungaged stream locations. To address this need, the U.S. Geological Survey, in cooperation with numerous Alabama state agencies and organizations, developed regional regression equations for estimating selected low-flow frequency statistics and mean annual flow for ungaged locations in Alabama that are not substantially affected by tides, regulation, diversions, or other anthropogenic influences. This data release comprises the geographic information systems (GIS) layers and tabular data used to create the new low-flow and mean annual flow regression equations and implement them for the U.S. Geological Survey StreamStats application (https://streamstats.usgs.gov).

  12. Data from: Extreme Water Levels for Australian Beaches using Empirical...

    • data.csiro.au
    • researchdata.edu.au
    Updated Jun 25, 2021
    Cite
    Julian O'Grady; Kathy McInnes; Mark Hemer; Ron Hoeke; Alec Stephenson; Frank Colberg; Claire Trenham (2021). Extreme Water Levels for Australian Beaches using Empirical Equations for Shoreline Wave Setup [Dataset]. http://doi.org/10.25919/5d1137055c162
    Explore at:
    Dataset updated
    Jun 25, 2021
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Julian O'Grady; Kathy McInnes; Mark Hemer; Ron Hoeke; Alec Stephenson; Frank Colberg; Claire Trenham
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1981 - Jan 1, 2013
    Area covered
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    Description

    This data is associated with the paper O’Grady, J.G., McInnes, K.L., Hemer, M. A., Hoeke, R. K., Stephenson, A., and Colberg, F. (in press), "Extreme Water Levels for Australian Beaches using Empirical Equations for Shoreline Wave Setup", Journal of Geophysical Research: Oceans.

    Understanding how high ocean water levels can reach up the coast is important for designing coastal protection against inundation and erosion. This is particularly important as climate change affects wind and weather conditions and sea-level rise, which in turn modifies the occurrence of the largest storm-driven water levels. While the heights of storm-driven water levels are well understood for protected harbours and estuaries, new research is providing estimates of how high water levels can reach for coastlines exposed to dangerous wave/surf conditions. This study uses mathematical model simulations spanning ~30 years of historical water levels and ocean waves. Statistical analysis is performed to determine how high the largest storm events will likely reach on natural sandy beaches directly exposed to large wave/surf conditions.

    The data comprises Gumbel distribution parameters from regression fitting to the hindcast model data:
    • ST_rGUM_25m_sta.1981-2013.nc: storm-tide SWL heights from the ROMS storm surge hindcast.
    • SU_GT81_rGUM_25m_sta.1981-2013.nc: wave setup calculated with the Guza & Thornton (1981) method.
    • SU_GT81_ST_rGUM_25m_sta.1981-2013.nc: the time-series combined storm-tide and wave setup.

    Notes:
    1) The data datum is relative to the model bathymetry mean sea level (Geoscience Australia’s 2009 250m dataset). Haigh corrected their dataset of storm tide to AHD by comparing modelled 1-year ARI to the tide gauge measurements. “The predicted levels have been artificially adjusted so that the 1-year return period levels exactly match those of the measured estimates at each site. This was done because the predicted water levels are relative to MSL, whereas the measured levels are relative to AHD. Around mainland Australia, AHD was defined using MSL records between 1966 and 1968 at 30 sites and hence differs from present day MSL. Around Tasmania, AHD was defined using two records from 1972.”
    2) To convert to AHD, add the variable ‘toAHD’ from the netCDF file ‘ST_rGUM_25m_sta.1981-2013.nc’ to the location parameter ‘mu’. Alternatively, add it to the predicted return levels.
    3) Wave setup is really only valid for open coastlines exposed to waves, so be careful applying it in estuaries.

    Lineage: Created with R's ismev Gumbel function on selected datasets (ROMS storm surge hindcast, CAWCR wave hindcast, and combined data).
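
    The AHD conversion described in the notes can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Gumbel fit is to annual maxima (so that a return period in years maps to a yearly non-exceedance probability), and the scale-parameter name `sigma` and all numeric values are made up for the example. Only ‘mu’ and ‘toAHD’ are names taken from the description.

```python
import math


def gumbel_return_level(mu, sigma, return_period_years):
    """Return level of a Gumbel distribution with location mu and scale sigma.

    Assumes the fit is to annual maxima, so the T-year level is the quantile
    with non-exceedance probability 1 - 1/T.
    """
    if return_period_years <= 1:
        raise ValueError("return period must be greater than 1 year")
    p_non_exceed = 1.0 - 1.0 / return_period_years
    return mu - sigma * math.log(-math.log(p_non_exceed))


# Hypothetical values for one station (metres); not taken from the dataset.
mu_msl = 1.20   # location parameter 'mu', relative to model mean sea level
sigma = 0.15    # scale parameter (name assumed for this sketch)
to_ahd = 0.05   # per-station offset, the 'toAHD' variable in the description

# Option 1: shift the location parameter, then compute the return level.
level_shift_mu = gumbel_return_level(mu_msl + to_ahd, sigma, 100)

# Option 2: compute the return level relative to MSL, then add the offset.
level_shift_result = gumbel_return_level(mu_msl, sigma, 100) + to_ahd
```

    Because the offset enters the return-level formula additively, the two options give the same value, which is why the notes present them as alternatives.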

  13. Statistics of Non-means-tested Loan Scheme | DATA.GOV.HK

    • data.gov.hk
    Updated Apr 27, 2022
    Cite
    data.gov.hk (2022). Statistics of Non-means-tested Loan Scheme | DATA.GOV.HK [Dataset]. https://data.gov.hk/en-data/dataset/hk-wfsfaa-sfo_01-nls-stats
    Explore at:
    Dataset updated
    Apr 27, 2022
    Dataset provided by
    data.gov.hk
    Description

    Provides the statistics about Non-means-tested Loan Scheme

  14. INCOME Total and Mean and Median Household Income in 1999 NMHD 2000

    • catalog.data.gov
    • gstore.unm.edu
    Updated Dec 2, 2020
    + more versions
    Cite
    U.S. Department of Commerce, U.S. Census Bureau, Geography Division, Geographic Products Management Branch (Point of Contact) (2020). INCOME Total and Mean and Median Household Income in 1999 NMHD 2000 [Dataset]. https://catalog.data.gov/dataset/income-total-and-mean-and-median-household-income-in-1999-nmhd-2000
    Explore at:
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Description

    The 2006 Second Edition TIGER/Line files are an extract of selected geographic and cartographic information from the Census TIGER database. The geographic coverage for a single TIGER/Line file is a county or statistical equivalent entity, with the coverage area based on the latest available governmental unit boundaries. The Census TIGER database represents a seamless national file with no overlaps or gaps between parts. However, each county-based TIGER/Line file is designed to stand alone as an independent data set or the files can be combined to cover the whole Nation. The 2006 Second Edition TIGER/Line files consist of line segments representing physical features and governmental and statistical boundaries. This shapefile represents the current State House Districts for New Mexico as posted on the Census Bureau website for 2006.

  15. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png
    • statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D

  16. Crime Statistics Agency Data Tables - Location Type Classifications

    • devweb.dga.links.com.au
    • data.gov.au
    xlsx
    Updated May 5, 2025
    + more versions
    Cite
    Crime Statistics Agency (2025). Crime Statistics Agency Data Tables - Location Type Classifications [Dataset]. https://devweb.dga.links.com.au/data/dataset/crime-statistics-agency-data-tables-location-type-classifications
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 5, 2025
    Dataset authored and provided by
    Crime Statistics Agency
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    The Crime Statistics Agency (CSA) is responsible for processing, analysing and publishing Victorian crime statistics, independent of Victoria Police. The CSA aims to provide an efficient and transparent information service to assist and inform policy makers, researchers and the Victorian public. The legal basis for the Crime Statistics Agency is the Crime Statistics Act 2014, which provides for the publication and release of crime statistics, research into crime trends, and the employment of a Chief Statistician for that purpose. Under the provisions of the Act, the Chief Statistician is empowered to receive law enforcement data from the Chief Commissioner of Police and is responsible for publishing and releasing statistical information relating to crime in Victoria. The Crime Statistics Agency publishes location type data for all offences where a location type was recorded by Victoria Police. There are three main location types: Residential, Community and Other. These types are further broken down into Subdivisions, which show an intermediate level of information, and further into Groups, which show a finer level of detail.

    Data Classification - https://www.crimestatistics.vic.gov.au/about-the-data/classifications
    Glossary and Data Dictionary - https://www.crimestatistics.vic.gov.au/about-the-data/glossary-and-data-dictionary

  17. Data_Sheet_1_Raw Data Visualization for Common Factorial Designs Using SPSS:...

    • frontiersin.figshare.com
    zip
    Updated Jun 2, 2023
    + more versions
    Cite
    Florian Loffing (2023). Data_Sheet_1_Raw Data Visualization for Common Factorial Designs Using SPSS: A Syntax Collection and Tutorial.ZIP [Dataset]. http://doi.org/10.3389/fpsyg.2022.808469.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Florian Loffing
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Transparency in data visualization is an essential ingredient for scientific communication. The traditional approach of visualizing continuous quantitative data solely in the form of summary statistics (i.e., measures of central tendency and dispersion) has repeatedly been criticized for not revealing the underlying raw data distribution. Remarkably, however, systematic and easy-to-use solutions for raw data visualization using the most commonly reported statistical software package for data analysis, IBM SPSS Statistics, are missing. Here, a comprehensive collection of more than 100 SPSS syntax files and an SPSS dataset template is presented and made freely available that allow the creation of transparent graphs for one-sample designs, for one- and two-factorial between-subject designs, for selected one- and two-factorial within-subject designs as well as for selected two-factorial mixed designs and, with some creativity, even beyond (e.g., three-factorial mixed-designs). Depending on graph type (e.g., pure dot plot, box plot, and line plot), raw data can be displayed along with standard measures of central tendency (arithmetic mean and median) and dispersion (95% CI and SD). The free-to-use syntax can also be modified to match with individual needs. A variety of example applications of syntax are illustrated in a tutorial-like fashion along with fictitious datasets accompanying this contribution. The syntax collection is hoped to provide researchers, students, teachers, and others working with SPSS a valuable tool to move towards more transparency in data visualization.

  18. Administrative data within the meaning of the Law of Ukraine "On State...

    • gimi9.com
    Cite
    Administrative data within the meaning of the Law of Ukraine "On State Statistics", collected (processed) and subject to disclosure in accordance with the requirements of the law by the information manager | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_b53fd025-76f0-4a70-813b-8a1712c27601
    Explore at:
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Ukraine
    Description

    The set contains administrative data created by the Department of Social Policy of the Kamyanka City Council during the performance of administrative duties and tasks assigned to its competence and provided by the Law of Ukraine "On State Statistics", the order of the State Statistics Service of 10.06.2016 No. 90 "On Approval of Forms of State Statistical Surveillance No. 1-PV (monthly) and No. 1-PV (quarterly) "Report on Labor", and which are subject to publication in accordance with the resolution of the Cabinet of Ministers of Ukraine of 21.10.2015

  19. DQS NHANES Select Mean Dietary Intake Estimates

    • data.cdc.gov
    • healthdata.gov
    • +1more
    application/rdfxml +5
    Updated Mar 4, 2024
    Cite
    NCHS/DHANES (2024). DQS NHANES Select Mean Dietary Intake Estimates [Dataset]. https://data.cdc.gov/National-Center-for-Health-Statistics/DQS-NHANES-Select-Mean-Dietary-Intake-Estimates/xcc8-2jrh
    Explore at:
    Available download formats: xml, csv, json, tsv, application/rdfxml, application/rssxml
    Dataset updated
    Mar 4, 2024
    Dataset authored and provided by
    NCHS/DHANES
    License

    https://www.usa.gov/government-works

    Description

    These data represent estimates of mean intake, on a given day, of nutrients from foods and beverages, drawn from the National Health and Nutrition Examination Survey (NHANES). Search, visualize, and download these and other estimates from over 120 health topics with the NCHS Data Query System (DQS), available from: https://www.cdc.gov/nchs/dataquery/index.htm.

  20. Core Based Statistical Areas

    • catalog.data.gov
    • data-usdot.opendata.arcgis.com
    • +1more
    Updated Aug 21, 2024
    + more versions
    Cite
    United States Census Bureau (USCB) (Point of Contact) (2024). Core Based Statistical Areas [Dataset]. https://catalog.data.gov/dataset/core-based-statistical-areas1
    Explore at:
    Dataset updated
    Aug 21, 2024
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Description

    The Core Based Statistical Areas boundaries were defined by OMB based on the 2010 Census, and the dataset was updated on August 09, 2019 from the United States Census Bureau (USCB) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. Metropolitan and Micropolitan Statistical Areas are together termed Core Based Statistical Areas (CBSAs). They are defined by the Office of Management and Budget (OMB) and consist of the county or counties or equivalent entities associated with at least one urban core (urbanized area or urban cluster) of at least 10,000 population, plus adjacent counties having a high degree of social and economic integration with the core as measured through commuting ties with the counties containing the core. Categories of CBSAs are: Metropolitan Statistical Areas, based on urbanized areas of 50,000 or more population; and Micropolitan Statistical Areas, based on urban clusters of at least 10,000 population but less than 50,000 population. The CBSA boundaries are those defined by OMB based on the 2010 Census, published in 2013, and updated in 2018.
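
    The population thresholds in the description can be expressed as a small rule. This is only a sketch of the two category cut-offs quoted above; the function name `classify_cbsa` is hypothetical, and the rule ignores the commuting-based criteria that decide which adjacent counties are included.

```python
def classify_cbsa(core_population):
    """Classify an urban core by population, per the thresholds in the description.

    Returns None when the core is below the 10,000 minimum for any CBSA.
    """
    if core_population >= 50_000:
        return "Metropolitan Statistical Area"
    if core_population >= 10_000:
        return "Micropolitan Statistical Area"
    return None


# Illustrative populations only; not taken from the dataset.
examples = {pop: classify_cbsa(pop) for pop in (8_500, 10_000, 49_999, 50_000)}
```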

