100+ datasets found
  1. Data analytics tools in use by organizations in the United States 2015-2017

    • statista.com
    Updated Dec 1, 2015
    Cite
    Statista (2015). Data analytics tools in use by organizations in the United States 2015-2017 [Dataset]. https://www.statista.com/statistics/500119/united-states-survey-use-data-analytics-tools/
    Dataset updated
    Dec 1, 2015
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2015
    Area covered
    United States
    Description

    The statistic shows the analytics tools currently in use by business organizations in the United States, as well as the analytics tools respondents believe they will be using in two years, according to a 2015 survey conducted by the Harvard Business Review Analytics Service. As of 2015, ** percent of respondents believed they were going to use predictive analytics for data analysis in two years' time.

  2. Replication Data for: The Statistical Analysis of Misreporting on Sensitive...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Eady, Gregory (2023). Replication Data for: The Statistical Analysis of Misreporting on Sensitive Survey Questions [Dataset]. http://doi.org/10.7910/DVN/PZKBUX
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Eady, Gregory
    Description

    Replication data for the article Eady, Gregory (2016) "The Statistical Analysis of Misreporting on Sensitive Survey Questions"

  3. Topographic channel survey data for selected new culvert installation sites...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Topographic channel survey data for selected new culvert installation sites in the East Gulf Coastal Plain of Alabama [Dataset]. https://catalog.data.gov/dataset/topographic-channel-survey-data-for-selected-new-culvert-installation-sites-in-the-east-gu
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Gulf Coastal Plain, Alabama
    Description

    As part of a cooperative study by the Alabama Department of Transportation (ALDOT) and the U.S. Geological Survey, topographic surveys of stream channel cross-sections, upstream and downstream of selected new culvert installations in the East Gulf Coastal Plain of Alabama, were conducted both before and after culvert construction. This dataset contains raw topographic data used to determine channel measurements for statistical analysis in the associated Scientific Investigations Report, Effects of culvert construction on streams in the East Gulf Coastal Plain of Alabama, 2010-19 (Pugh and Gill, 2021). To measure the effects that culvert construction may have on stream channel beds, banks, and slopes, topographic surveys of 22 stream channel cross-sections, 11 upstream and 11 downstream, of the proposed culvert, were conducted at each study site before culvert construction. The cross-sections were evenly distributed along a stream reach length of approximately 20 times the channel width. These same cross-sections were resurveyed approximately 2 years after culvert construction was completed. Pre- and post-construction channel geometry data are presented in separate Comma-Separated Values (CSV) files. In addition, a plan view of before and after construction data points is included in a Portable Document Format (PDF) file.

  4. Energy Consumption in Transport Survey 2014, Main Results - West Bank and...

    • pcbs.gov.ps
    Updated Dec 12, 2021
    Cite
    Palestinian Central Bureau of Statistics (2021). Energy Consumption in Transport Survey 2014, Main Results - West Bank and Gaza [Dataset]. https://www.pcbs.gov.ps/PCBS-Metadata-en-v5.2/index.php/catalog/699
    Dataset updated
    Dec 12, 2021
    Dataset authored and provided by
    Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
    Time period covered
    2015
    Area covered
    Palestine, West Bank
    Description

    Abstract

    Most countries collect official statistics on energy use due to its vital role in the infrastructure, economy and living standards.

    In Palestine, additional attention is warranted for energy statistics due to a scarcity of natural resources, the high cost of energy and high population density. These factors demand comprehensive and high quality statistics.

    In this context, PCBS decided to conduct a special Energy Consumption in Transport Survey to provide high-quality data on energy consumption by type, expenditure on vehicle maintenance and insurance, and vehicle motor capacity and year of production.

    The survey aimed to provide data on energy consumption by the transport sector, and on energy consumption by vehicle type, motor capacity and year of production.

    Geographic coverage

    Palestine

    Analysis unit

    Vehicles

    Universe

    All the operating vehicles in Palestine in 2014.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Target Population: All the operating vehicles in Palestine in 2014.

    2.1 Sample Frame: A list of the operating vehicles in Palestine in 2014, broken down by governorate and vehicle type. The list was obtained from the Ministry of Transport.

    2.2.1 Sample Size: The sample size is 6,974 vehicles.

    2.2.2 Sampling Design: The sample is a stratified random sample; in some of the small strata, quota sampling was used to cover them.

    The vehicle sample was reached by: 1- visiting all the dynamometers (the centers for testing vehicles); 2- selecting a random sample of vehicles by vehicle type, model, fuel type and engine capacity.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire design drew on the experience of similar countries in energy statistics, so as to cover the most important indicators for energy statistics in the transport sector, taking into account Palestine's particular situation.

    Cleaning operations

    The data processing stage consisted of the following operations:

    Editing and coding prior to data entry: All questionnaires were edited and coded in the office using the same instructions adopted for editing in the field.

    Data entry: The survey questionnaire was uploaded onto office computers. At this stage, data were entered using a data entry template developed in an Access database. The data entry program was prepared to satisfy a number of requirements:

    • To prevent the duplication of questionnaires during data entry.
    • To apply checks on the integrity and consistency of entered data.
    • To handle errors in a user-friendly manner.
    • To allow captured data to be transferred to another format for analysis in statistical software such as SPSS.

    Audit after data entry: The entered data file was pulled periodically and reviewed. Abnormal values were examined and consistency between the different questions in the questionnaire was checked. Where errors were found, the questionnaire was withdrawn and the data were verified and corrected. This continued until the final data file was obtained, from which the final results were extracted.

    Extraction of results: The final results of the report were extracted using SPSS and then presented as tables in Excel format.

    Response rate

    80.7%

    Sampling error estimates

    Data of this survey may be affected by sampling errors due to use of a sample and not a complete enumeration. Therefore, certain differences are anticipated in comparison with the real values obtained through censuses. The variance was calculated for the most important indicators: the variance table is attached with the final report. There is no problem in the dissemination of results at national and regional level (North, Middle, South of West Bank, Gaza Strip).

    Data appraisal

    The survey sample consisted of 6,974 vehicles, of which 5,631 completed the questionnaire: 3,652 in the West Bank and 1,979 in the Gaza Strip.
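
    As a quick consistency check, the reported response rate can be recomputed from the counts quoted above. This is only an arithmetic sketch; the variable names are illustrative and not part of the PCBS documentation.

```python
# Counts quoted in the survey description above.
sample_size = 6974        # vehicles in the sample
west_bank = 3652          # completed questionnaires, West Bank
gaza_strip = 1979         # completed questionnaires, Gaza Strip

# Completed questionnaires are the sum of the two regional counts.
completed = west_bank + gaza_strip

# Response rate = completed questionnaires / sampled vehicles.
response_rate = completed / sample_size

print(completed)                 # 5631
print(f"{response_rate:.1%}")    # 80.7%
```

    The regional counts add up to the stated total, and the ratio agrees with the 80.7% response rate reported for this survey.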

  5. Community survey and development survey data (2018-2022)

    • data.tpdc.ac.cn
    • tpdc.ac.cn
    zip
    Updated Jun 16, 2025
    Cite
    Xukun SU (2025). Community survey and development survey data (2018-2022) [Dataset]. http://doi.org/10.11888/HumanNat.tpdc.302834
    Available download formats: zip
    Dataset updated
    Jun 16, 2025
    Dataset provided by
    TPDC
    Authors
    Xukun SU
    Description

    This dataset covers various aspects of communities in nature reserves on the Qinghai-Tibet Plateau. The data come mainly from field research: questionnaire surveys, interviews, and the collection of community statistics in multiple protected areas on the Qinghai-Tibet Plateau. The questionnaire survey covers different age and occupational groups to ensure sample representativeness; the interviewees include community managers, resident representatives, and ecological protection workers, among others. For data processing, the raw data were first cleaned to remove invalid and erroneous records, and statistical analysis software was then used for classification and correlation analysis. In terms of application, the data were used to analyze in depth the current situation and problems of community development in protected areas, providing a basis for formulating ecological compensation policies and planning sustainable industries. Some protected areas have adjusted their development strategies according to the results, improving both economic and ecological benefits.

  6. Ad hoc statistical analysis 2021/22: Quarter 2

    • gov.uk
    • s3.amazonaws.com
    Updated Sep 10, 2021
    + more versions
    Cite
    Department for Digital, Culture, Media & Sport (2021). Ad hoc statistical analysis 2021/22: Quarter 2 [Dataset]. https://www.gov.uk/government/statistical-data-sets/ad-hoc-statistical-analysis-202122-quarter-2
    Dataset updated
    Sep 10, 2021
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Digital, Culture, Media & Sport
    Description

    This page lists ad hoc statistics released during the period July-September 2021. These are additional analyses not included in any of the Department for Digital, Culture, Media and Sport’s standard publications.

    If you would like any further information please contact evidence@dcms.gov.uk

    September 2021 - Ad Hoc UK Business Data Survey Release

    This analysis provides estimates of data use amongst UK organisations, using the UK Business Data Survey (UKBDS). It accompanies analysis within the consultation for UK data reform. This is an abridged set of specific findings from the UKBDS, a telephone-based quantitative and qualitative study of UK businesses, which seeks to understand the role and importance of personal and non-personal data in UK businesses, domestic and international transfers of data, and the awareness of, and attitudes toward, data protection legislation and policy.

  7. Technologies used in big data analysis 2015

    • statista.com
    Updated Jul 29, 2015
    Cite
    Statista (2015). Technologies used in big data analysis 2015 [Dataset]. https://www.statista.com/statistics/491267/big-data-technologies-used/
    Dataset updated
    Jul 29, 2015
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Dec 2014 - Feb 2015
    Area covered
    Europe, North America, Worldwide
    Description

    This graph presents the results of a survey, conducted by BARC in 2014/15, into the current and planned use of technology for the analysis of big data. At the beginning of 2015, ** percent of respondents indicated that their company was already using a big data analytical appliance for big data.

  8. Summary statistics for raw data.

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    + more versions
    Cite
    J. Christopher Westland (2023). Summary statistics for raw data. [Dataset]. http://doi.org/10.1371/journal.pone.0271949.t005
    Available download formats: xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOS ONE
    Authors
    J. Christopher Westland
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary statistics for raw data.

  9. Expenditure and Consumption Survey, 2004 - West Bank and Gaza

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    • +1more
    Updated Mar 29, 2019
    + more versions
    Cite
    Palestinian Central Bureau of Statistics (2019). Expenditure and Consumption Survey, 2004 - West Bank and Gaza [Dataset]. https://catalog.ihsn.org/index.php/catalog/3085
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
    Time period covered
    2004 - 2005
    Area covered
    Palestine, West Bank
    Description

    Abstract

    The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household Surveys help as well in determining the incidence of poverty, and providing weighted data which reflects the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.

    The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.

    Geographic coverage

    The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.

    Analysis unit

    1- Household/families. 2- Individuals.

    Universe

    The survey covered all Palestinian households that are usually resident in the Palestinian Territory.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Sample and Frame:

    The sampling frame consists of all enumeration areas which were enumerated in 1997; the enumeration area consists of buildings and housing units and is composed of an average of 120 households. The enumeration areas were used as Primary Sampling Units (PSUs) in the first stage of the sampling selection. The enumeration areas of the master sample were updated in 2003.

    Sample Design:

    The sample is a stratified, clustered, systematic random sample selected in two stages. First stage: selection of a systematic random sample of 299 enumeration areas. Second stage: selection of a systematic random sample of 12-18 households from each enumeration area selected in the first stage. One person aged 18 years or over was selected from each household in the second stage.

    Sample strata:

    The population was divided by: 1- Governorate 2- Type of Locality (urban, rural, refugee camps)

    Sample Size:

    The calculated sample size is 3,781 households.

    Target cluster size:

    The target cluster size or "sample-take" is the average number of households to be selected per PSU. In this survey, the sample take is around 12 households.

    Detailed information/formulas on the sampling design are available in the user manual.
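
    The two-stage selection described above can be illustrated with a minimal Python sketch. It is not the PCBS procedure: the frame of 1,000 enumeration areas with 120 households each, the helper name systematic_sample, and the fixed take of 12 households per area are assumptions made for illustration, and stratification by governorate and locality type is left out.

```python
import random

def systematic_sample(units, n, rng):
    """Select n units at a fixed interval from a random start (hypothetical helper)."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    step = len(units) / n                  # sampling interval, may be fractional
    start = rng.random() * step            # random start inside the first interval
    picks = (int(start + i * step) for i in range(n))
    # Guard against a floating-point index landing one past the end.
    return [units[min(p, len(units) - 1)] for p in picks]

rng = random.Random(2004)                  # fixed seed so the sketch is repeatable

# Hypothetical frame: 1,000 enumeration areas of 120 households each.
frame = {ea: [f"EA{ea}-HH{h}" for h in range(120)] for ea in range(1000)}

# First stage: systematic sample of 299 enumeration areas (the PSUs).
psus = systematic_sample(sorted(frame), 299, rng)

# Second stage: systematic sample of 12 households within each selected PSU.
households = {ea: systematic_sample(frame[ea], 12, rng) for ea in psus}

print(len(psus), sum(len(hh) for hh in households.values()))
```

    With a fixed take of 12 households the sketch yields 299 x 12 = 3,588 households, somewhat below the calculated sample size of 3,781, which is consistent with the survey taking between 12 and 18 households per area.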

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The PECS questionnaire consists of two main sections:

    First section: Certain provisions of the form are filled in at the beginning of the month, and the remainder at the end of the month. The questionnaire includes the following provisions:

    Cover sheet: Contains details and particulars of the family, date of visit, particulars of the field/office work team, and number/sex of the family members.

    Statement of the family members: Contains social, economic and demographic particulars of the selected family.

    Statement of the long-lasting commodities and income generation activities: Includes a number of basic and indispensable items (e.g., livestock or agricultural land).

    Housing Characteristics: Includes information and data pertaining to the housing conditions, including type of shelter, number of rooms, ownership, rent, water, electricity supply, connection to the sewer system, source of cooking and heating fuel, and remoteness/proximity of the house to education and health facilities.

    Monthly and Annual Income: Data pertaining to the income of the family is collected from different sources at the end of the registration / recording period.

    Second section: The second section of the questionnaire includes a list of 54 consumption and expenditure groups, itemized and serially numbered according to their importance to the family. Each of these groups contains important commodities. The total number of commodity and service items across all groups stood at 667. Groups 1-21 include food, drink, and cigarettes. Group 22 includes homemade commodities. Groups 23-45 include all items except for food, drink and cigarettes. Groups 50-54 include all of the long-lasting commodities. Data on each of these groups was collected over different intervals of time so as to reflect expenditure over a period of one full year.

    Cleaning operations

    Raw Data

    Both data entry and tabulation were performed using the ACCESS and SPSS software programs. The data entry process was organized in 6 files, corresponding to the main parts of the questionnaire. A data entry template was designed to reflect an exact image of the questionnaire, and included various electronic checks: logical check, range checks, consistency checks and cross-validation. Complete manual inspection was made of results after data entry was performed, and questionnaires containing field-related errors were sent back to the field for corrections.

    Harmonized Data

    • The Statistical Package for Social Science (SPSS) is used to clean and harmonize the datasets.
    • The harmonization process starts with cleaning all raw data files received from the Statistical Office.
    • Cleaned data files are then all merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    • A post-harmonization cleaning process is run on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and converted to STATA format.

    Response rate

    The survey sample consists of about 3,781 households interviewed over a twelve-month period between January 2004 and January 2005. There were 3,098 households that completed the interview, of which 2,060 were in the West Bank and 1,038 were in the Gaza Strip. The response rate was 82% in the Palestinian Territory.

    Sampling error estimates

    The calculations of standard errors for the main survey estimations enable the user to identify the accuracy of estimations and the survey reliability. Total errors of the survey can be divided into two kinds: statistical errors, and non-statistical errors. Non-statistical errors are related to the procedures of statistical work at different stages, such as the failure to explain questions in the questionnaire, unwillingness or inability to provide correct responses, bad statistical coverage, etc. These errors depend on the nature of the work, training, supervision, and conducting all various related activities. The work team spared no effort at different stages to minimize non-statistical errors; however, it is difficult to estimate numerically such errors due to absence of technical computation methods based on theoretical principles to tackle them. On the other hand, statistical errors can be measured. Frequently they are measured by the standard error, which is the positive square root of the variance. The variance of this survey has been computed by using the “programming package” CENVAR.

  10. Survey of Income and Program Participation (SIPP)

    • dataverse.harvard.edu
    Updated May 30, 2013
    Cite
    Anthony Damico (2013). Survey of Income and Program Participation (SIPP) [Dataset]. http://doi.org/10.7910/DVN/I0FFJV
    Dataset updated
    May 30, 2013
    Dataset provided by
    Harvard Dataverse
    Authors
    Anthony Damico
    License

    CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    analyze the survey of income and program participation (sipp) with r

    if the census bureau's budget was gutted and only one complex sample survey survived, pray it's the survey of income and program participation (sipp). it's giant. it's rich with variables. it's monthly. it follows households over three, four, now five year panels. the congressional budget office uses it for their health insurance simulation. analysts read that sipp has person-month files, get scurred, and retreat to inferior options. the american community survey may be the mount everest of survey data, but sipp is most certainly the amazon. questions swing wild and free through the jungle canopy i mean core data dictionary. legend has it that there are still species of topical module variables that scientists like you have yet to analyze. ponce de león would've loved it here. ponce. what a name. what a guy.

    the sipp 2008 panel data started from a sample of 105,663 individuals in 42,030 households. once the sample gets drawn, the census bureau surveys one-fourth of the respondents every four months, over four or five years (panel durations vary). you absolutely must read and understand pdf pages 3, 4, and 5 of this document before starting any analysis (start at the header 'waves and rotation groups'). if you don't comprehend what's going on, try their survey design tutorial.

    since sipp collects information from respondents regarding every month over the duration of the panel, you'll need to be hyper-aware of whether you want your results to be point-in-time, annualized, or specific to some other period. the analysis scripts below provide examples of each. at every four-month interview point, every respondent answers every core question for the previous four months. after that, wave-specific addenda (called topical modules) get asked, but generally only regarding a single prior month. to repeat: core wave files contain four records per person, topical modules contain one. if you stacked every core wave, you would have one record per person per month for the duration of the panel. mmmassive. ~100,000 respondents x 12 months x ~4 years. have an analysis plan before you start writing code so you extract exactly what you need, nothing more. better yet, modify something of mine. cool?

    this new github repository contains eight, you read me, eight scripts:

    1996 panel - download and create database.R
    2001 panel - download and create database.R
    2004 panel - download and create database.R
    2008 panel - download and create database.R

    • since some variables are character strings in one file and integers in another, initiate an r function to harmonize variable class inconsistencies in the sas importation scripts
    • properly handle the parentheses seen in a few of the sas importation scripts, because the SAScii package currently does not
    • create an rsqlite database, initiate a variant of the read.SAScii function that imports ascii data directly into a sql database (.db)
    • download each microdata file - weights, topical modules, everything - then read 'em into sql

    2008 panel - full year analysis examples.R

    • define which waves and specific variables to pull into ram, based on the year chosen
    • loop through each of twelve months, constructing a single-year temporary table inside the database
    • read that twelve-month file into working memory, then save it for faster loading later if you like
    • read the main and replicate weights columns into working memory too, merge everything
    • construct a few annualized and demographic columns using all twelve months' worth of information
    • construct a replicate-weighted complex sample design with a fay's adjustment factor of one-half, again save it for faster loading later, only if you're so inclined
    • reproduce census-published statistics, not precisely (due to topcoding described here on pdf page 19)

    2008 panel - point-in-time analysis examples.R

    • define which wave(s) and specific variables to pull into ram, based on the calendar month chosen
    • read that interview point (srefmon)- or calendar month (rhcalmn)-based file into working memory
    • read the topical module and replicate weights files into working memory too, merge it like you mean it
    • construct a few new, exciting variables using both core and topical module questions
    • construct a replicate-weighted complex sample design with a fay's adjustment factor of one-half
    • reproduce census-published statistics, not exactly cuz the authors of this brief used the generalized variance formula (gvf) to calculate the margin of error - see pdf page 4 for more detail - the friendly statisticians at census recommend using the replicate weights whenever possible. oh hayy, now it is.

    2008 panel - median value of household assets.R

    • define which wave(s) and specific variables to pull into ram, based on the topical module chosen
    • read the topical module and replicate weights files into working memory too, merge once again
    • construct a replicate-weighted complex sample design with a...
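
    The scripts described above are written in R and build a replicate-weighted design with a Fay's adjustment factor of one-half. As a rough illustration of what that factor does, the following Python sketch applies the standard Fay's-method variance formula to a set of replicate estimates. It is not taken from the author's scripts; the function name and the numbers are made up for illustration.

```python
import math

def fay_brr_variance(full_estimate, replicate_estimates, fay_rho=0.5):
    """Fay's-method variance: sum of squared deviations / (R * (1 - rho)^2)."""
    r = len(replicate_estimates)
    squared_dev = sum((est - full_estimate) ** 2 for est in replicate_estimates)
    return squared_dev / (r * (1.0 - fay_rho) ** 2)

# Made-up numbers: one full-sample estimate and four replicate estimates.
full = 10.0
replicates = [9.0, 11.0, 10.5, 9.5]

variance = fay_brr_variance(full, replicates)   # 2.5 / (4 * 0.25) = 2.5
standard_error = math.sqrt(variance)

print(variance, round(standard_error, 4))
```

    With a factor of one-half the divisor becomes R x 0.25, so the variance is four times the mean squared deviation of the replicate estimates from the full-sample estimate.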

  11. A dataset from a survey investigating disciplinary differences in data...

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin, csv, pdf, txt
    Updated Jul 12, 2024
    Cite
    Anton Boudreau Ninkov; Chantal Ripp; Kathleen Gregory; Isabella Peters; Stefanie Haustein (2024). A dataset from a survey investigating disciplinary differences in data citation [Dataset]. http://doi.org/10.5281/zenodo.7853477
    Available download formats: txt, pdf, bin, csv
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anton Boudreau Ninkov; Chantal Ripp; Kathleen Gregory; Isabella Peters; Stefanie Haustein
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GENERAL INFORMATION

    Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation

    Date of data collection: January to March 2022

    Collection instrument: SurveyMonkey

    Funding: Alfred P. Sloan Foundation


    SHARING/ACCESS INFORMATION

    Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license

    Links to publications that cite or use the data:

    Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437

    Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266


    DATA & FILE OVERVIEW

    File List

    • Filename: MDCDatacitationReuse2021Codebookv2.pdf
      Codebook
    • Filename: MDCDataCitationReuse2021surveydatav2.csv
      Dataset format in csv
    • Filename: MDCDataCitationReuse2021surveydatav2.sav
      Dataset format in SPSS
    • Filename: MDCDataCitationReuseSurvey2021QNR.pdf
      Questionnaire

    Additional related data collected that was not included in the current data package: Open ended questions asked to respondents


    METHODOLOGICAL INFORMATION

    Description of methods used for collection/generation of data:

    The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.

    We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final dataset contains 2,492 complete responses, an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).

    Methods for processing the data:

    Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.

    Instrument- or software-specific information needed to interpret the data:

    The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.


    DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata

    Number of variables: 95

    Number of cases/rows: 2,492

    Missing data codes: 999 Not asked

    Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
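
When reading the coded CSV, the 999 "Not asked" code listed above probably needs to be treated as missing rather than as a real value. A minimal sketch using only the Python standard library follows. The column names in the sample are hypothetical (the real ones are in the codebook), and the sketch assumes 999 never occurs as a legitimate answer.

```python
import csv
import io

MISSING_CODE = "999"  # "Not asked", per the missing data codes above


def load_rows(handle):
    """Read coded survey rows, converting the 999 code to None."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({k: (None if v == MISSING_CODE else v) for k, v in row.items()})
    return rows


# Hypothetical two-row excerpt; real variable names are in the codebook.
sample = io.StringIO("resp_id,q1,q2\n1,3,999\n2,999,5\n")
rows = load_rows(sample)
print(rows)
```

To use it on the real file, one would pass an open handle for MDCDataCitationReuse2021surveydatav2.csv instead of the in-memory sample.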

  12. Austin_Survey_for_MDCOR_Analyses

    • data.mendeley.com
    Updated Nov 14, 2022
    Cite
    Manuel Gonzalez Canche (2022). Austin_Survey_for_MDCOR_Analyses [Dataset]. http://doi.org/10.17632/nb7yvhjvzk.1
    Explore at:
    Dataset updated
    Nov 14, 2022
    Authors
    Manuel Gonzalez Canche
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The city of Austin administered a community survey in 2015, 2016, 2017, 2018 and 2019 (https://data.austintexas.gov/City-Government/Community-Survey/s2py-ceb7) to “assess satisfaction with the delivery of the major City Services and to help determine priorities for the community as part of the City’s ongoing planning process.” To access this dataset directly from the city of Austin’s website, follow this link: https://cutt.ly/VNqq5Kd. We downloaded the dataset analyzed in this study from the former link. Because the city of Austin intends to continue administering this survey, the data hosted on the city’s website may come to differ from the data we used in this analysis. Accordingly, to ensure that our findings can be replicated, we recommend that researchers download and analyze the dataset we employed in our analyses, which can be accessed at https://github.com/democratizing-data-science/MDCOR/blob/main/Community_Survey.csv.

    Replication Features or Variables

    The community survey data has 10,684 rows and 251 columns. Of these columns, our analyses rely on the following three indicators, taken verbatim from the survey: “ID”, “Q25 - If there was one thing you could share with the Mayor regarding the City of Austin (any comment, suggestion, etc.), what would it be?”, and “Do you own or rent your home?”
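
Selecting the three indicators from the 251 columns could look roughly like the sketch below. It assumes the CSV header matches the quoted column names exactly, which has not been checked against Community_Survey.csv. The in-memory sample and its extra column are hypothetical stand-ins.

```python
import csv
import io

# Column names as quoted in the description above (assumed to match the header).
COLUMNS = [
    "ID",
    "Q25 - If there was one thing you could share with the Mayor regarding "
    "the City of Austin (any comment, suggestion, etc.), what would it be?",
    "Do you own or rent your home?",
]


def select_columns(handle, columns=COLUMNS):
    """Keep only the requested columns from each row of a CSV file."""
    reader = csv.DictReader(handle)
    return [{name: row[name] for name in columns} for row in reader]


# Hypothetical stand-in for the real file, with one extra column.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS + ["Some other question"])
writer.writerow(["1", "More parks", "Own", "x"])
buffer.seek(0)

subset = select_columns(buffer)
print(subset)
```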

  13. Big Data Analysis

    • ine.es
    csv, html, json +4
    Updated Jul 24, 2025
    Cite
    INE - Instituto Nacional de Estadística (2025). Big Data Analysis [Dataset]. https://www.ine.es/jaxi/Tabla.htm?tpx=53911&L=1
    Explore at:
    Available download formats: txt, csv, xlsx, json, text/pc-axis, xls, html
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    National Statistics Institute (http://www.ine.es/)
    Authors
    INE - Instituto Nacional de Estadística
    License

    https://www.ine.es/aviso_legal

    Variables measured
    Main variables, Size of the enterprise, Activity grouping (except CNAE 56, 64-66 and 95.1)
    Description

    Survey on the Use of Information and Communication Technologies and Electronic Commerce in Companies: Big Data Analysis. National.

  14. Initial data analysis checklist for data screening in longitudinal studies.

    • plos.figshare.com
    xls
    Updated May 29, 2024
    Cite
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner (2024). Initial data analysis checklist for data screening in longitudinal studies. [Dataset]. http://doi.org/10.1371/journal.pone.0295726.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 29, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Lara Lusa; Cécile Proust-Lima; Carsten O. Schmidt; Katherine J. Lee; Saskia le Cessie; Mark Baillie; Frank Lawrence; Marianne Huebner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Initial data analysis checklist for data screening in longitudinal studies.

  15. Multiple Indicator Cluster Survey 2006 - Viet Nam

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +2more
    Updated Oct 26, 2023
    + more versions
    Cite
    Social and Environmental Statistics Department (2023). Multiple Indicator Cluster Survey 2006 - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/31
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    Social and Environmental Statistics Department
    Time period covered
    2006
    Area covered
    Vietnam
    Description

    Abstract

    The Multiple Indicator Cluster Survey (MICS) is a household survey programme developed by UNICEF to assist countries in filling data gaps for monitoring human development in general and the situation of children and women in particular. MICS is capable of producing statistically sound, internationally comparable estimates of social indicators. The Viet Nam Multiple Indicator Cluster Survey provides valuable information on the situation of children and women in Viet Nam, and was based, in large part, on the needs to monitor progress towards goals and targets emanating from recent international agreements: the Millennium Declaration, adopted by all 191 United Nations Member States in September 2000, and the Plan of Action of A World Fit For Children, adopted by 189 Member States at the United Nations Special Session on Children in May 2002. Both of these commitments build upon promises made by the international community at the 1990 World Summit for Children.

    Survey Objectives: The 2006 Viet Nam Multiple Indicator Cluster Survey has as its primary objectives:
    - To provide up-to-date information for assessing the situation of children and women in Viet Nam;
    - To furnish data needed for monitoring progress toward goals established by the Millennium Development Goals, the goals of A World Fit For Children (WFFC), and other internationally agreed upon goals, as a basis for future action;
    - To provide valuable information for the 3rd and 4th National Report of Viet Nam's implementation of the Convention on the Rights of the Child in the period 2002-2007, as well as for monitoring the National Plan of Action for Children 2001-2010;
    - To contribute to the improvement of data and monitoring systems in Viet Nam and to strengthen technical expertise in the design, implementation, and analysis of such systems.

    Survey Content: Following the MICS global questionnaire templates, the questionnaires were designed in a modular fashion customized to the needs of Viet Nam. The questionnaires consist of a household questionnaire, a questionnaire for women aged 15-49 and a questionnaire for children under the age of five (to be administered to the mother or caretaker).

    Survey Implementation: The Viet Nam Multiple Indicator Cluster Survey (MICS) was carried out by the General Statistics Office of Viet Nam (GSO) in collaboration with the Viet Nam Committee for Population, Family and Children (VCPFC). Financial and technical support was provided by the United Nations Children's Fund (UNICEF). Technical assistance and training for the survey were provided through a series of regional workshops organised by UNICEF covering questionnaire content, sampling and survey implementation; data processing; data quality and data analysis; report writing and dissemination.

    Geographic coverage

    The survey is nationally representative and covers the whole of Viet Nam.

    Analysis unit

    Households (defined as a group of persons who usually live and eat together)

    Household members (defined as members of the household who usually live in the household, which may include people who did not sleep in the household the previous night, but does not include visitors who slept in the household the previous night but do not usually live in the household)

    Women aged 15-49

    Children aged 0-4

    Universe

    The survey covered all de jure household members (usual residents), all women aged 15-49 years resident in the household, and all children aged 0-4 years (under age 5) resident in the household.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample for the Viet Nam Multiple Indicator Cluster Survey (MICS) was designed to provide reliable estimates on a large number of indicators on the situation of children and women at the national level, for urban and rural areas, and for 8 regions: Red River Delta, North West, North East, North Central Coast, South Central Coast, Central Highlands, South East, and Mekong River Delta. Regions were identified as the main sampling domains and the sample was selected in two stages. At the first stage 250 census enumeration areas (EA) were selected, of which all 240 EAs of MICS2 with systematic method were reselected and 10 new EAs were added. The addition of 10 more EAs (together with the increase in the sample size) was to increase the reliability level for regional estimates. Consequently, within each region, 30-33 EAs were selected for MICS3. After a household listing was carried out within the selected enumeration areas, a systematic sample of 1/3 of households in each EA was drawn. The survey managed to visit all of 250 selected EAs during the fieldwork period. The sample was stratified by region and is not self-weighting. For reporting national level results, sample weights are used. A more detailed description of the sample design can be found in the technical documents and in Appendix A of the final report.
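
The second-stage step described above (a systematic sample of 1/3 of listed households in each EA) can be illustrated with a small sketch. This is a generic systematic-sampling routine, not the survey's actual procedure. The household listing, the random start rule, and the function name are all assumptions.

```python
import random


def systematic_sample(units, interval, rng=random):
    """Select every `interval`-th unit after a random start in [0, interval)."""
    start = rng.randrange(interval)
    return units[start::interval]


# Hypothetical listing of 30 households in one enumeration area.
households = [f"HH{i:03d}" for i in range(1, 31)]

# interval=3 gives a 1/3 sampling fraction; the seeded generator is for repeatability.
selected = systematic_sample(households, interval=3, rng=random.Random(0))
print(len(selected), selected[:3])
```

Because each EA is sampled at the same fraction but EAs differ in size and regions are sampled at different rates, the resulting sample is not self-weighting, which is why the text calls for sample weights.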

    Sampling deviation

    No major deviations from the original sample design were made. All sample enumeration areas were accessed and successfully interviewed with good response rates.

    Mode of data collection

    Face-to-face

    Research instrument

    The questionnaires are based on the MICS3 model questionnaire. From the MICS3 model English version, the questionnaires were translated into Vietnamese and were pretested in one province (Bac Giang) during July 2006. Based on the results of this pre-test, modifications were made to the wording and translation of the questionnaires.

    Cleaning operations

    Data editing took place at a number of stages throughout the processing (see Other processing), including: a) Office editing and coding b) During data entry c) Structure checking and completeness d) Secondary editing e) Structural checking of SPSS data files

    Detailed documentation of the editing of data can be found in the data processing guidelines in the MICS manual http://www.childinfo.org/mics/mics3/manual.php.

    Response rate

    8356 households were selected for the sample. All of these were found to be occupied, and 8355 were successfully interviewed, for a response rate of 100%. Within these households, 10063 eligible women aged 15-49 were identified for interview, of whom 9473 were successfully interviewed (response rate 94.1%). In addition, 2707 children aged 0-4 were identified, and the mother or caretaker was successfully interviewed for 2680 of them (response rate 99%).
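
The response rates quoted above follow directly from the counts. A short sketch of the arithmetic (the function and variable names are illustrative):

```python
def response_rate(interviewed, eligible):
    """Response rate as a percentage of eligible units."""
    return 100 * interviewed / eligible


# Counts taken from the paragraph above.
rate_households = response_rate(8355, 8356)
rate_women = response_rate(9473, 10063)
rate_children = response_rate(2680, 2707)

print(f"{rate_households:.1f} {rate_women:.1f} {rate_children:.1f}")
```

The household rate is fractionally below 100% (one household was not interviewed) and rounds to 100%.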

    Sampling error estimates

    Estimates from a sample survey are affected by two types of errors: 1) non-sampling errors and 2) sampling errors. Non-sampling errors are the results of mistakes made in the implementation of data collection and data processing. Numerous efforts were made during implementation of the MICS - 3 to minimize this type of error; however, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors can be evaluated statistically. The sample of respondents to the MICS - 3 is only one of many possible samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability in the results of the survey between all possible samples, and, although the degree of variability is not known exactly, it can be estimated from the survey results. The sampling errors are measured in terms of the standard error for a particular statistic (mean or percentage), which is the square root of the variance. Confidence intervals are calculated for each statistic within which the true value for the population can be assumed to fall. Plus or minus two standard errors of the statistic is used for key statistics presented in MICS, equivalent to a 95 percent confidence interval.

    If the sample of respondents had been a simple random sample, it would have been possible to use straightforward formulae for calculating sampling errors. However, the MICS - 3 sample is the result of a two-stage stratified design, and consequently needs to use more complex formulae. The SPSS complex samples module has been used to calculate sampling errors for the MICS - 3. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. This method is documented in the SPSS file CSDescriptives.pdf found under the Help, Algorithms options in SPSS.

    Sampling errors have been calculated for a select set of statistics (all of which are proportions due to the limitations of the Taylor linearization method) for the national sample, urban and rural areas, and for each of the five regions. For each statistic, the tables give the estimate, its standard error, the coefficient of variation (or relative error -- the ratio between the standard error and the estimate), the design effect, and the square root design effect (DEFT -- the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used), as well as the 95 percent confidence intervals (+/-2 standard errors).
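
The quantities defined above can be related in a few lines. The sketch below is illustrative only: the input values are made up, the design-based standard error is taken as given rather than computed by Taylor linearization, and the simple-random-sample standard error uses the textbook formula sqrt(p(1-p)/n), which is an assumption about how the comparison is made.

```python
import math


def sampling_error_summary(p, se, n):
    """Summarize the precision of an estimated proportion.

    p  -- estimated proportion
    se -- standard error under the actual (complex) design
    n  -- number of cases the estimate is based on
    """
    se_srs = math.sqrt(p * (1 - p) / n)  # SE if the sample were simple random
    deft = se / se_srs                   # square root design effect
    return {
        "cv": se / p,                    # coefficient of variation (relative error)
        "deft": deft,
        "deff": deft ** 2,               # design effect
        "ci_low": p - 2 * se,            # +/- 2 standard errors
        "ci_high": p + 2 * se,
    }


# Hypothetical inputs, not taken from the survey tables.
summary = sampling_error_summary(p=0.40, se=0.02, n=1000)
print(summary)
```

A DEFT above 1 indicates that the clustered, stratified design yields a larger standard error than a simple random sample of the same size would.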

    Data appraisal

    A series of data quality tables and graphs are available to review the quality of the data and include the following:

    - Age distribution of the household population
    - Age distribution of eligible women and interviewed women
    - Age distribution of eligible children and children for whom the mother or caretaker was interviewed
    - Age distribution of children under age 5 by 3 month groups
    - Age and period ratios at

  16. Data and analysis of the avatar surveys

    • ieee-dataport.org
    Updated Jul 9, 2024
    Cite
    Ines Miguel Alonso (2024). Data and analysis of the avatar surveys [Dataset]. https://ieee-dataport.org/documents/data-and-analysis-avatar-surveys
    Explore at:
    Dataset updated
    Jul 9, 2024
    Authors
    Ines Miguel Alonso
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data and analysis of the surveys used to study users' opinions about the presence of an avatar during a learning experience in Mixed Reality. Demographic data and the responses to the open questions are also included. This data was used in the paper "Evaluating the Effectiveness of Avatar-Based Collaboration in XR for Pump Station Training Scenarios" for the GeCon 2024 Conference.

  17. Quarterly Stocks Survey (QSS) and Quarterly Acquisitions and Disposals of...

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Jun 30, 2025
    Cite
    Office for National Statistics (2025). Quarterly Stocks Survey (QSS) and Quarterly Acquisitions and Disposals of Capital Assets Survey (QCAS) textual data analysis [Dataset]. https://www.ons.gov.uk/economy/grossdomesticproductgdp/datasets/quarterlystockssurveyqssandcapitalassetssurveyqcastextualdataanalysis
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 30, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Based on qualitative responses from businesses to our Quarterly Acquisitions and Disposals of Capital Assets Survey (QCAS) and Quarterly Stocks Survey (QSS).

  18. Health and Retirement Study (HRS)

    • search.dataone.org
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r

    the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original.

    figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle.

    but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you.

    the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.

    this new github repository contains five scripts:

    1992 - 2010 download HRS microdata.R
    • loop through every year and every file, download, then unzip everything in one big party

    import longitudinal RAND contributed files.R
    • create a SQLite database (.db) on the local disk
    • load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

    longitudinal RAND - analysis examples.R
    • connect to the sql database created by the 'import longitudinal RAND contributed files' program
    • create two database-backed complex sample survey objects, using a taylor-series linearization design
    • perform a mountain of analysis examples with wave weights from two different points in the panel

    import example HRS file.R
    • load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
    • parse through the IF block at the bottom of the sas importation script, blank out a number of variables
    • save the file as an R data file (.rda) for fast loading later

    replicate 2002 regression.R
    • connect to the sql database created by the 'import longitudinal RAND contributed files' program
    • create a database-backed complex sample survey object, using a taylor-series linearization design
    • exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document

    click here to view these five scripts

    for more detail about the health and retirement study (hrs), visit:
    • michigan's hrs homepage
    • rand's hrs homepage
    • the hrs wikipedia page
    • a running list of publications using hrs

    notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself.

    confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
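
The chunked-load idea mentioned for the import script (filling a SQLite database piece by piece so the whole file never sits in ram) can be sketched as follows. The scripts described above are written in R. This is a Python analogue for illustration, not the author's code, and the table name, column names, and sample rows are hypothetical.

```python
import csv
import io
import itertools
import sqlite3


def load_in_chunks(conn, table, handle, chunk_size=1000):
    """Insert CSV rows into a SQLite table one chunk at a time."""
    reader = csv.reader(handle)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    total = 0
    while True:
        # Pull at most chunk_size rows so memory use stays bounded.
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', chunk)
        conn.commit()
        total += len(chunk)
    return total


# In-memory database and a tiny made-up file stand in for the real data.
conn = sqlite3.connect(":memory:")
sample = io.StringIO("id,wave,weight\n1,1,0.5\n2,1,1.5\n3,1,0\n")
inserted = load_in_chunks(conn, "rand_demo", sample, chunk_size=2)
count = conn.execute('SELECT COUNT(*) FROM "rand_demo"').fetchone()[0]
print(inserted, count)
```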

  19. Analyzed Data for The Impact of COVID-19 on Technical Services Units Survey...

    • dataverse.harvard.edu
    • search.dataone.org
    • +1more
    Updated Mar 21, 2022
    + more versions
    Cite
    Elizabeth Szkirpan (2022). Analyzed Data for The Impact of COVID-19 on Technical Services Units Survey Results [Dataset]. http://doi.org/10.7910/DVN/DGBUV7
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 21, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Elizabeth Szkirpan
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These datasets contain cleaned results from the October 2021-January 2022 survey titled "The Impact of COVID-19 on Technical Services Units". The data was gathered with a Qualtrics survey, which was anonymized to prevent Qualtrics from gathering identifiable information from respondents. These specific iterations of the data reflect cleaning and standardization so that the data can be analyzed using Python. Ultimately, the three files reflect the removal of survey begin/end times, other data auto-recorded by Qualtrics, blank rows, blank responses after question four (the first section of the survey), and non-United States responses. Note that state names for "What state is your library located in?" (Q36) were also standardized beginning in Impact_of_COVID_on_Tech_Services_Clean_3.csv to aid in data analysis. In this step, state abbreviations were spelled out and spelling errors were corrected.
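
The state-name standardization step described above (spelling out abbreviations and correcting misspellings) might look something like the following sketch. The lookup tables are partial and hypothetical, and this is not the dataset author's cleaning code.

```python
# Partial, illustrative lookup tables; a real run would cover all states.
ABBREVIATIONS = {"LA": "Louisiana", "NY": "New York", "TX": "Texas"}
MISSPELLINGS = {"lousiana": "Louisiana"}  # hypothetical misspelling
CANONICAL = {name.lower(): name for name in ABBREVIATIONS.values()}


def standardize_state(raw):
    """Return a canonical state name for a free-text Q36 response."""
    text = raw.strip()
    if text.upper() in ABBREVIATIONS:
        return ABBREVIATIONS[text.upper()]
    key = text.lower()
    if key in MISSPELLINGS:
        return MISSPELLINGS[key]
    # Fix capitalization of known names; leave anything unrecognized unchanged.
    return CANONICAL.get(key, text)


cleaned = [standardize_state(s) for s in [" tx ", "Lousiana", "new york", "Ontario"]]
print(cleaned)
```

Unrecognized values are passed through unchanged so they can be reviewed by hand, for example when filtering out non-United States responses.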

  20. Arizona Youth Survey 2020

    • citydata.mesaaz.gov
    • data.mesaaz.gov
    application/rdfxml +5
    Updated Feb 15, 2024
    + more versions
    Cite
    Mesa Public Schools (2024). Arizona Youth Survey 2020 [Dataset]. https://citydata.mesaaz.gov/Education-Workforce/Arizona-Youth-Survey-2020/t4cm-5uf5
    Explore at:
    Available download formats: tsv, csv, application/rdfxml, json, xml, application/rssxml
    Dataset updated
    Feb 15, 2024
    Dataset provided by
    Mesa Unified District (http://www.mpsaz.org/)
    Authors
    Mesa Public Schools
    Area covered
    Arizona
    Description

    Mesa Unified District Report aggregated for all schools - 2020 Arizona Youth Survey Results. Survey administered statewide every 2 years. See the Attachments in the About this Dataset section for an aggregated junior high school report as well as results by individual high school and junior high school. NOTE: link starts a pdf download of the report. To learn more about the Arizona Youth Survey visit https://www.azcjc.gov/Programs/Data-Integration-Analytics-Optimization/Statistical-Analysis-Center/Arizona-Youth-Survey
