88 datasets found
  1. QADO: An RDF Representation of Question Answering Datasets and their...

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Andreas Both; Oliver Schmidtke; Aleksandr Perevalov (2023). QADO: An RDF Representation of Question Answering Datasets and their Analyses for Improving Reproducibility [Dataset]. http://doi.org/10.6084/m9.figshare.21750029.v3
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andreas Both; Oliver Schmidtke; Aleksandr Perevalov
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Measuring the quality of Question Answering (QA) systems is a crucial task for validating the results of novel approaches. However, there are already indicators of a reproducibility crisis, as many published systems have used outdated datasets or subsets of QA benchmarks, making it hard to compare results. We identified the following core problems: there is no standard data format; instead, the different, partly inconsistent datasets use proprietary data representations. Additionally, the characteristics of datasets are typically reflected neither by the dataset maintainers nor by the system publishers. To overcome these problems, we established an ontology, the Question Answering Dataset Ontology (QADO), for representing the QA datasets in RDF. The following datasets were mapped into the ontology: the QALD series, LC-QuAD series, RuBQ series, ComplexWebQuestions, and Mintaka. Hence, the integrated data in QADO covers widely used datasets and multilinguality. Additionally, we performed intensive analyses of the datasets to identify their characteristics, making it easier for researchers to identify specific research questions and to select well-defined subsets. The provided resource will enable the research community to improve the quality of their research and support the reproducibility of experiments.

    Here, the mapping results of the QADO process, the SPARQL queries for data analytics, and the archived analytics results file are provided.

    Up-to-date statistics can be created automatically by the script provided at the corresponding QADO GitHub RDFizer repository.

  2. ELDIAdata: Statistical Data

    • data.niaid.nih.gov
    Updated Sep 20, 2022
    Cite
    ELDIA Statistical Team (2022). ELDIAdata: Statistical Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6721461
    Dataset updated
    Sep 20, 2022
    Dataset authored and provided by
    ELDIA Statistical Team
    Description

    This dataset contains the statistical data retrieved from a large-scale questionnaire survey conducted in eight countries among speakers of thirteen Finno-Ugric minority languages, representing three branches of the language family: Finnic (Finnish, Meänkieli, Kven, Karelian, Veps, Estonian, Võro, Seto), Sámi (North Sámi) and Ugric (Hungarian). The databank consists of two major parts: the minority language target group database and the control group database. The ELDIAdata minority language database contains 3,388 individual records; the language-specific data sets include 340 variables each. The entire control group database contains in total 1,460 records from seven countries; each country-specific data set covers 280 variables. The survey data sets include only the results of the closed questions. For the full list of all statistical data files and the coding of variables and their values, please refer to the descriptions under “ELDIAdata: Metadata”.

  3. DOF Assembly Written Questions Performance Statistics

    • data.europa.eu
    csv
    Cite
    OpenDataNI (2022). DOF Assembly Written Questions Performance Statistics [Dataset]. https://data.europa.eu/data/datasets/department-of-finance-performance-statistics-on-assembly-written-questions-sept-to-dec-2020?locale=en
    Explore at:
    Available download formats: csv
    Dataset authored and provided by
    OpenDataNI
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This dataset contains the Department of Finance Performance Statistics on Assembly Written Questions.

  4. Replication Data for: Do Question Topic and Placement Shape Survey Breakoff...

    • data.aussda.at
    Updated Mar 21, 2024
    Cite
    Carole Wilson; Luke Plutowski; Elizabeth J. Zechmeister (2024). Replication Data for: Do Question Topic and Placement Shape Survey Breakoff Rates? (OA edition) [Dataset]. http://doi.org/10.11587/MMOPTD
    Explore at:
    Available download formats: tsv (324905), application/x-stata-syntax (1962), pdf (1420301), pdf (50147)
    Dataset updated
    Mar 21, 2024
    Dataset provided by
    AUSSDA
    Authors
    Carole Wilson; Luke Plutowski; Elizabeth J. Zechmeister
    License

    https://data.aussda.at/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11587/MMOPTD

    Area covered
    Haiti
    Dataset funded by
    United States Agency for International Development
    Description

    Full edition for public use. These data come from a telephone survey of Haitian adults conducted April-June 2020. The study considers whether placing questions about a salient topic (COVID-19) decreases breakoff rates. The overall survey is concerned with democratic attitudes, but this dataset includes only those variables relevant to the paper in Survey Methods: Insights from the Field.

  5. DoJ Performance Statistics Assembly Written Questions - Dataset - Open Data...

    • admin.opendatani.gov.uk
    Updated Mar 16, 2021
    Cite
    (2021). DoJ Performance Statistics Assembly Written Questions - Dataset - Open Data NI [Dataset]. https://admin.opendatani.gov.uk/dataset/doj-performance-statistics-assembly-written-questions
    Dataset updated
    Mar 16, 2021
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This dataset contains the Department of Justice Performance Statistics on Assembly Written Questions.

  6. Living Standards Measurement Survey 2003 (Wave 3 Panel) - Bosnia-Herzegovina...

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1 more
    Updated Jan 30, 2020
    + more versions
    Cite
    State Agency for Statistics (BHAS) (2020). Living Standards Measurement Survey 2003 (Wave 3 Panel) - Bosnia-Herzegovina [Dataset]. https://microdata.worldbank.org/index.php/catalog/67
    Dataset updated
    Jan 30, 2020
    Dataset provided by
    Republika Srpska Institute of Statistics (RSIS)
    State Agency for Statistics (BHAS)
    Federation of BiH Institute of Statistics (FIS)
    Time period covered
    2003
    Area covered
    Bosnia and Herzegovina
    Description

    Abstract

    In 2001, the World Bank, in co-operation with the Republika Srpska Institute of Statistics (RSIS), the Federal Institute of Statistics (FOS) and the Agency for Statistics of BiH (BHAS), carried out a Living Standards Measurement Survey (LSMS). The LSMS, in addition to collecting the information necessary to obtain as comprehensive a measure as possible of the basic dimensions of household living standards, has three basic objectives, as follows:

    1. To provide the public sector, government, the business community, scientific institutions, international donor organizations and social organizations with information on different indicators of the population's living conditions, as well as on available resources for satisfying basic needs.

    2. To provide information for the evaluation of the results of different forms of government policy and programs developed with the aim to improve the population's living standard. The survey will enable the analysis of the relations between and among different aspects of living standards (housing, consumption, education, health, labor) at a given time, as well as within a household.

    3. To provide key contributions for development of government's Poverty Reduction Strategy Paper, based on analyzed data.

    The Department for International Development, UK (DFID) contributed funding to the LSMS and provided funding for a further two years of data collection for a panel survey, known as the Household Survey Panel Series (HSPS). Birks Sinclair & Associates Ltd. were responsible for the management of the HSPS with technical advice and support provided by the Institute for Social and Economic Research (ISER), University of Essex, UK. The panel survey provides longitudinal data through re-interviewing approximately half the LSMS respondents for two years following the LSMS, in the autumn of 2002 and 2003. The LSMS constitutes Wave 1 of the panel survey so there are three years of panel data available for analysis. For the purposes of this documentation we are using the following convention to describe the different rounds of the panel survey:

    - Wave 1: LSMS conducted in 2001, forms the baseline survey for the panel
    - Wave 2: Second interview of 50% of LSMS respondents in Autumn/Winter 2002
    - Wave 3: Third interview with sub-sample respondents in Autumn/Winter 2003

    The panel data allows the analysis of key transitions and events over this period, such as labour market or geographical mobility, and the observation of the consequent outcomes for the well-being of individuals and households in the survey. The panel data provides information on income and labour market dynamics within FBiH and RS. A key policy area is developing strategies for the reduction of poverty within FBiH and RS. The panel will provide information on the extent to which continuous poverty is experienced by different types of households and individuals over the three-year period. Most importantly, the co-variates associated with moves into and out of poverty and the relative risks of poverty for different people can be assessed. As such, the panel aims to provide data that will inform the policy debates within FBiH and RS at a time of social reform and rapid change.

    Geographic coverage

    National coverage. Domains: Urban/rural/mixed; Federation; Republic

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The Wave 3 sample consisted of 2878 households that had been interviewed at Wave 2, plus a further 73 households that were interviewed at Wave 1 but were non-contact at Wave 2. A total of 2951 households (1301 in the RS and 1650 in FBiH) were issued for Wave 3. As at Wave 2, the sample could not be replaced with any other households.

    Panel design

    Eligibility for inclusion

    The household and household membership definitions are the same standard definitions as at Wave 2. The sample membership status and eligibility for interview are as follows:

    i) All members of households interviewed at Wave 2 have been designated as original sample members (OSMs). OSMs include children within households even if they are too young for interview.

    ii) Any new members joining a household containing at least one OSM are eligible for inclusion and are designated as new sample members (NSMs).

    iii) At each wave, all OSMs and NSMs are eligible for inclusion, apart from those who move out-of-scope (see discussion below).

    iv) All household members aged 15 or over are eligible for interview, including OSMs and NSMs.

    Following rules

    The panel design means that sample members who move from their previous wave address must be traced and followed to their new address for interview. In some cases the whole household will move together but in others an individual member may move away from their previous wave household and form a new split-off household of their own. All sample members, OSMs and NSMs, are followed at each wave and an interview attempted. This method has the benefit of maintaining the maximum number of respondents within the panel and being relatively straightforward to implement in the field.

    Definition of 'out-of-scope'

    It is important to maintain movers within the sample to maintain sample sizes and reduce attrition and also for substantive research on patterns of geographical mobility and migration. The rules for determining when a respondent is 'out-of-scope' are as follows:

    i. Movers out of the country altogether i.e. outside FBiH and RS. This category of mover is clear. Sample members moving to another country outside FBiH and RS will be out-of-scope for that year of the survey and not eligible for interview.

    ii. Movers between entities. Respondents moving between entities are followed for interview. The personal details of the respondent are passed between the statistical institutes and a new interviewer assigned in that entity.

    iii. Movers into institutions. Although institutional addresses were not included in the original LSMS sample, Wave 3 individuals who have subsequently moved into some institutions are followed. The definitions for which institutions are included are found in the Supervisor Instructions.

    iv. Movers into the district of Brcko. These are followed for interview. When coding the entity, Brcko is treated as the entity from which the household that moved into Brcko originated.
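
    The eligibility and out-of-scope rules above can be read as a small decision procedure. The following is only an illustrative sketch, not part of the survey documentation: the names `Move`, `SampleMember`, `in_scope` and `eligible_for_interview` are hypothetical, and the `institution_covered` flag stands in for the institution definitions that the text says are given in the Supervisor Instructions.

```python
from dataclasses import dataclass
from enum import Enum


class Move(Enum):
    """Hypothetical categories of movement since the previous wave."""
    NONE = "none"                          # still at the previous wave address
    WITHIN_ENTITY = "within_entity"        # moved, but stayed in the same entity
    BETWEEN_ENTITIES = "between_entities"  # moved between FBiH and RS
    INTO_BRCKO = "into_brcko"              # moved into the district of Brcko
    INTO_INSTITUTION = "into_institution"  # moved into an institution
    ABROAD = "abroad"                      # moved outside FBiH and RS


@dataclass
class SampleMember:
    age: int
    status: str                        # "OSM" or "NSM"
    move: Move = Move.NONE
    # Assumption: True if the institution type is one the Supervisor
    # Instructions list as followed; the source does not enumerate them.
    institution_covered: bool = False


def in_scope(member: SampleMember) -> bool:
    """Apply the out-of-scope rules i to iv described above."""
    if member.move is Move.ABROAD:
        return False                   # rule i: out-of-scope for that year
    if member.move is Move.INTO_INSTITUTION:
        return member.institution_covered  # rule iii: only some institutions
    return True                        # rules ii and iv: followed for interview


def eligible_for_interview(member: SampleMember) -> bool:
    """OSMs and NSMs in scope and aged 15 or over are eligible for interview."""
    return in_scope(member) and member.age >= 15
```

    Under these assumptions, a child OSM stays in scope as a sample member but is not interviewed, while an adult who moved abroad is neither in scope nor eligible for that wave.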

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Questionnaire design

    Approximately 90% of the questionnaire (Annex B) is based on the Wave 2 questionnaire, carrying forward core measures that are needed to measure change over time. The questionnaire was widely circulated and changes were made as a result of comments received.

    Pretesting

    In order to undertake a longitudinal test the Wave 2 pretest sample was used. The Control Forms and Advance letters were generated from an Access database containing details of ten households in Sarajevo and fourteen in Banja Luka. The pretest was undertaken from March 24-April 4 and resulted in 24 households (51 individuals) successfully interviewed. One mover household was successfully traced and interviewed.
    In order to test the questionnaire under the hardest circumstances a briefing was not held. A list of the main questionnaire changes was given to experienced interviewers.

    Issues arising from the pretest

    Interviewers were asked to complete a Debriefing and Rating form. The debriefing form captured opinions on the following three issues:

    1. General reaction to being re-interviewed. In some cases there was a wariness of being asked to participate again, some individuals asking “Why Me?” Interviewers did a good job of persuading people to take part, only one household refused and another asked to be removed from the sample next year. Having the same interviewer return to the same households was considered an advantage. Most respondents asked what was the benefit to them of taking part in the survey. This aspect was reemphasised in the Advance Letter, Respondent Report and training of the Wave 3 interviewers.

    2. Length of the questionnaire. The average time of interview was 30 minutes. No problems were mentioned in relation to the timing, though interviewers noted that some respondents, particularly the elderly, tended to wander off the point and that control was needed to bring them back to the questions in the questionnaire. One interviewer noted that the economic situation of many respondents seems to have got worse from the previous year and it was necessary to listen to respondents' “stories” during the interview.

    3. Confidentiality. No problems were mentioned in relation to confidentiality, though interviewers mentioned it might be worth mentioning the new Statistics Law in the Advance letter.

    The Rating Form asked for details of specific questions that were unclear. These are described below with a description of the changes made.

    • Module 3. Q29-31 have been added to capture funds received for education, scholarships etc.

    • Module 4. Pretest respondents complained that the 6 questions on "Has your health limited you..." and the 16 on "in the last 7 days have you felt depressed" etc. were too many. These were reduced by half (Q38-Q48). The LSMS data was examined and those questions where variability between the answers was widest were chosen.

    • Module 5. The new employment questions (Q42-Q44) worked well and have been kept in the main questionnaire.

    • Module 7. There were no problems reported with adding the credit questions (Q28-Q36).

    • Module 9. SIG recommended that some of Questions 1-12 were relevant only to those aged over 18 so additional skips have been added. Some respondents complained the questionnaire was boring. To try and overcome

  7. Session wise Statistical Information relating to Questions in Rajya Sabha |...

    • gimi9.com
    Updated May 9, 2025
    Cite
    (2025). Session wise Statistical Information relating to Questions in Rajya Sabha | gimi9.com [Dataset]. https://gimi9.com/dataset/in_session-wise-statistical-information-relating-questions-rajya-sabha/
    Dataset updated
    May 9, 2025
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Get statistical information relating to notices of questions received, processed and replied to by ministries/departments in Rajya Sabha. It contains various kinds of information compiled from statistics relating to Questions dealt with during the Session.

  8. Impact of a quiz in video data files

    • figshare.com
    bin
    Updated May 29, 2018
    Cite
    Paul Rice (2018). Impact of a quiz in video data files [Dataset]. http://doi.org/10.6084/m9.figshare.6383837.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 29, 2018
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Paul Rice
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Two SPSS datasets evaluating the impact of a quiz in an educational video. Students were exposed to three variations of the video, and their subsequent MCQ scores were captured.

  9. Beyond searching to teaching interpretation: A road map for librarians to...

    • borealisdata.ca
    Updated Jun 27, 2025
    Cite
    Giovanna Badia (2025). Beyond searching to teaching interpretation: A road map for librarians to teach statistical literacy [Dataset]. http://doi.org/10.5683/SP3/4UL1U0
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    Borealis
    Authors
    Giovanna Badia
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Canada
    Description

    Descriptive and inferential statistics are taught to students in many disciplines. More classroom time is often spent on the theory behind different statistical methods that investigate relationships between variables rather than on how to interpret the results obtained to answer the research question that started the process. While statistical software (such as R, Stata, and SPSS) has made it easier to undertake regression with any dataset, the output produced remains challenging to understand and explain to intended audiences. To address this issue, the author created a 90-minute workshop that teaches students how to read tables of descriptive statistics and linear regression results produced by statistical software. The workshop has been taught each semester at the author’s institution since its creation in the Fall 2022 term, attracting a predominantly graduate student audience. Feedback has been positive thus far, with student requests for additional workshops on reading the results of different statistical models, such as logistic and count regression. Through an explanation of the process and the resources used, this presentation will provide a practical overview of how librarians can teach others how to read descriptive statistics and regression results using a research question and their own experiences working with data to guide them. It will include steps to prepare for designing a statistical literacy workshop. The aim of this presentation is to provide ideas that will help librarians move towards teaching a statistical literacy workshop at their own institutions or help them expand their teaching activities in this area.

  10. Data Science Interview Questions

    • kaggle.com
    Updated Aug 14, 2022
    Cite
    Iron486 (2022). Data Science Interview Questions [Dataset]. https://www.kaggle.com/datasets/die9origephit/data-science-interview-questions/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 14, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Iron486
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    This is a collection of questions useful for people who want to test their data science knowledge for interviews or for refreshing some specific topics.
    Most of the questions are related to data science, data analysis, machine learning, deep learning, probability, statistics and programming. The majority of them include answers too.

    Acknowledgements

    The questions were fetched from various sources. After being collected, some typos were corrected, and the style and the format of the questions were modified, making the pdfs more readable. Here are the sources:

    • https://www.nicksingh.com/posts/40-probability-statistics-data-science-interview-questions-asked-by-fang-wall-street
    • https://github.com/kojino/120-Data-Science-Interview-Questions
    • https://intellipaat.com/blog/interview-question/data-science-interview-questions/
    • https://www.projectpro.io/article/100-deep-learning-interview-questions-and-answers-for-2021/419

  11. Participation Survey 2023–24 annual publication

    • gov.uk
    Updated Feb 13, 2025
    + more versions
    Cite
    Department for Culture, Media and Sport (2025). Participation Survey 2023–24 annual publication [Dataset]. https://www.gov.uk/government/statistics/participation-survey-2023-24-annual-publication
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Culture, Media and Sport
    Description

    The Participation Survey started in October 2021 and is the key evidence source on engagement for DCMS. It is a continuous push-to-web household survey of adults aged 16 and over in England.

    The Participation Survey provides nationally representative estimates of physical and digital engagement with the arts, heritage, museums & galleries, and libraries, as well as engagement with tourism, major events, live sports and digital.

    In 2023/24, DCMS partnered with Arts Council England (ACE) to boost the Participation Survey to be able to produce meaningful estimates at Local Authority level. This has enabled us to have the most granular data we have ever had, which means there were some new questions and changes to existing questions, response options and definitions in the 23/24 survey. The questionnaire for 2023/24 has been developed collaboratively to adapt to the needs and interests of both DCMS and ACE.

    • Released: 24 July 2024.
    • Period covered: May 2023 to March 2024.
    • Geographic coverage: National, regional and local authority level data for England.
    • Next release date: September 2024.

    The Participation Survey is only asked of adults in England. Currently there is no harmonised survey or set of questions within the administrations of the UK. Data on participation in cultural sectors for the devolved administrations is available in the Scottish Household Survey (https://www.gov.scot/collections/scottish-household-survey/), the National Survey for Wales (https://gov.wales/national-survey-wales) and the Northern Ireland Continuous Household Survey (https://www.communities-ni.gov.uk/topics/statistics-and-research/culture-and-heritage-statistics).

    The pre-release access document above contains a list of ministers and officials who have received privileged early access to this release of Participation Survey data. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours. Details on the pre-release access arrangements for this dataset are available in the accompanying material.

    Our statistical practice is regulated by the OSR. OSR sets the standards of trustworthiness, quality and value in the Code of Practice for Statistics (https://code.statisticsauthority.gov.uk/the-code/) that all producers of official statistics should adhere to.

    You are welcome to contact us directly with any comments about how we meet these standards by emailing evidence@dcms.gov.uk. Alternatively, you can contact OSR by emailing regulation@statistics.gov.uk or via the OSR website.

    Patterns were identified in Census 2021 data that suggest that some respondents may not have interpreted the gender identity question as intended, notably those with lower levels of English language proficiency. Analysis of Scotland’s census (https://www.scotlandscensus.gov.uk/2022-results/scotland-s-census-2022-sexual-orientation-and-trans-status-or-history/), where the gender identity question was different, has added weight to this observation. Similar respondent error may have occurred during the data collection for these statistics so comparisons between subnational and other smaller group breakdowns should be considered with caution. More information can be found in the ONS sexual orientation and gender identity quality information report (https://www.ons.gov.uk/peoplepopulationandcommunity/culturalidentity/sexuality/methodologies/sexualorientationandgenderidentityqualityinformationforcensus2021), and in the National Statistical blog about the strengths and limitations of gender identity statistics (https://blog.ons.gov.uk/2024/09/12/better-understanding-the-strengths-and-limitations-of-gender-identity-statistics/).

    The responsible statisticians for this release are Donilia Asgill and Ella Bentin. For enquiries on this release, contact participationsurvey@dcms.gov.uk.

  12. Statistical analysis of past examination questions for skilled personnel and...

    • data.gov.tw
    csv
    Updated Jun 1, 2025
    Cite
    Ministry of Examination (2025). Statistical analysis of past examination questions for skilled personnel and the handling of doubts. [Dataset]. https://data.gov.tw/en/datasets/162816
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Ministry of Examination
    License

    https://data.gov.tw/license

    Description

    Status of the handling of doubts raised about past examination questions.

  13. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    ● download the fixed-width file containing household, family, and person records
    ● import by separating this file into three tables, then merge 'em together at the person-level
    ● download the fixed-width file containing the person-level replicate weights
    ● merge the rectangular person-level file with the replicate weights, then store it in a sql database
    ● create a new variable - one - in the data table

    2012 asec - analysis examples.R
    ● connect to the sql database created by the 'download all microdata' program
    ● create the complex sample survey object, using the replicate weights
    ● perform a boatload of analysis examples

    replicate census estimates - 2011.R
    ● connect to the sql database created by the 'download all microdata' program
    ● create the complex sample survey object, using the replicate weights
    ● match the sas output shown in the png file below

    2011 asec replicate weight sas output.png
    statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    ● the census bureau's current population survey page
    ● the bureau of labor statistics' current population survey page
    ● the current population survey's wikipedia article

    notes:

    interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

    as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
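The "rectangular file" idea described above (household- and family-level information attached to each person record) and the "subtract a year" note can be illustrated with a small sketch. This is not the repository's R code: it is a Python illustration, and every record, column name (`household_id`, `family_id`, `state`, `family_income`), and helper function below is hypothetical, chosen only to show the shape of the merge.

```python
# Hypothetical miniature records; the real ASEC files carry far more columns
# and are read from fixed-width files rather than typed in by hand.
households = {"h1": {"state": "NY"}, "h2": {"state": "TX"}}
families = {
    ("h1", 1): {"family_income": 52000},
    ("h2", 1): {"family_income": 31000},
}
persons = [
    {"household_id": "h1", "family_id": 1, "person_id": 1, "age": 34},
    {"household_id": "h1", "family_id": 1, "person_id": 2, "age": 36},
    {"household_id": "h2", "family_id": 1, "person_id": 1, "age": 61},
]


def build_rectangular(households, families, persons):
    """Attach household- and family-level fields to every person record."""
    rows = []
    for person in persons:
        row = dict(person)  # copy so the input records stay untouched
        row.update(households[person["household_id"]])
        row.update(families[(person["household_id"], person["family_id"])])
        row["one"] = 1  # constant column, mirroring the 'one' variable mentioned above
        rows.append(row)
    return rows


def reference_year(file_year):
    """Interviews happen in March about the previous year, so subtract one."""
    return file_year - 1


rectangular = build_rectangular(households, families, persons)
```

The result has one row per person, so person counts are unchanged by the merge while each row gains the household and family columns. Summing the `one` column then gives a simple unweighted record count.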

  14. Ten quick tips for getting the most scientific value out of numerical data

    • plos.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Lars Ole Schwen; Sabrina Rueschenbaum (2023). Ten quick tips for getting the most scientific value out of numerical data [Dataset]. http://doi.org/10.1371/journal.pcbi.1006141
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lars Ole Schwen; Sabrina Rueschenbaum
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Most studies in the life sciences and other disciplines involve generating and analyzing numerical data of some type as the foundation for scientific findings. Working with numerical data involves multiple challenges. These include reproducible data acquisition, appropriate data storage, computationally correct data analysis, appropriate reporting and presentation of the results, and suitable data interpretation. Finding and correcting mistakes when analyzing and interpreting data can be frustrating and time-consuming. Presenting or publishing incorrect results is embarrassing but not uncommon. Particular sources of errors are inappropriate use of statistical methods and incorrect interpretation of data by software. To detect mistakes as early as possible, one should frequently check intermediate and final results for plausibility. Clearly documenting how quantities and results were obtained facilitates correcting mistakes. Properly understanding data is indispensable for reaching well-founded conclusions from experimental results. Units are needed to make sense of numbers, and uncertainty should be estimated to know how meaningful results are. Descriptive statistics and significance testing are useful tools for interpreting numerical results if applied correctly. However, blindly trusting in computed numbers can also be misleading, so it is worth thinking about how data should be summarized quantitatively to properly answer the question at hand. Finally, a suitable form of presentation is needed so that the data can properly support the interpretation and findings. By additionally sharing the relevant data, others can access, understand, and ultimately make use of the results. These quick tips are intended to provide guidelines for correctly interpreting, efficiently analyzing, and presenting numerical data in a useful way.

  15. CourseKata Dataset Items (QuestionTypes)

    • kaggle.com
    Updated Apr 21, 2024
    Cite
    Gagan Karnati (2024). CourseKata Dataset Items (QuestionTypes) [Dataset]. https://www.kaggle.com/datasets/gagankarnati/coursekata-dataset-items-questiontypes
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gagan Karnati
    Description

    CourseKata is a platform that creates and publishes a series of e-books for introductory statistics and data science classes that utilize demonstrated learning strategies to help students learn statistics and data science. The developers of CourseKata, Jim Stigler (UCLA) and Ji Son (Cal State Los Angeles) and their team, are cognitive psychologists interested in improving statistics learning by examining students' interactions with online interactive textbooks. Traditionally, much of the research in how students learn is done in a 1-hour lab or through small-scale interviews with students. CourseKata offers the opportunity to peek into the actions, responses, and choices of thousands of students as they are engaged in learning the interrelated concepts and skills of statistics and coding in R over many weeks or months in real classes.

    1. items.csv (1335 X 19) Each row contains information about a particular question (although it does not provide the prompt). The item to which a question belongs is included. All items/questions are represented. Use this file to go deeper into particular questions that students encounter in the course.

    Questions are grouped into items (item_id). An item can be one of three item_type's: code, learnosity or learnosity-activity (the distinction between learnosity and learnosity-activity is not important). Code items are a single question and ask for R code as a response. (Responses can be seen in responses.csv.) Learnosity-activities and learnosity items are collections of one or more questions that can be of a variety of lrn_type's:
    ● association
    ● choicematrix
    ● clozeassociation
    ● formulaV2
    ● imageclozeassociation
    ● mcq
    ● plaintext
    ● shorttext
    ● sortlist

    Examples of these question types are provided at the end of this document.

    The level of detail made available to you in the responses file depends on the lrn_type. For example, for multiple choice questions (mcq), you can find the options in the responses file in the columns labeled lrn_option_0 through lrn_option_11, and you can see the chosen option in the results variable.
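As a rough illustration of the column layout just described, the sketch below gathers the answer options for one multiple choice row. It is written in Python, assumes each row of the responses file has been loaded as a dictionary keyed by column name, and uses a made-up example row. Only the column names lrn_type and lrn_option_0 through lrn_option_11 come from the description above; the helper name and the option texts are hypothetical, and the format of the results variable is deliberately not modeled.

```python
# Column names lrn_option_0 .. lrn_option_11, as named in the description.
OPTION_COLUMNS = [f"lrn_option_{i}" for i in range(12)]


def mcq_options(row):
    """Return the non-empty answer options stored in one responses row."""
    return [row[col] for col in OPTION_COLUMNS if row.get(col) not in (None, "")]


# Hypothetical row: a multiple choice question with two options filled in.
example_row = {
    "lrn_type": "mcq",
    "lrn_option_0": "mean",
    "lrn_option_1": "median",
    "lrn_option_2": "",
}

options = mcq_options(example_row)
```

Unused option columns are treated here as empty strings or missing keys; the actual file may encode them differently, so that check is an assumption to adjust after inspecting the data.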

    Assessment Types

    In general, assessments, such as the items and questions included in CourseKata, can be used for two purposes. Formative assessments are meant to provide feedback to the student (and instructor), or to serve as a learning aid to help prompt students improve memory and deepen their understanding. Summative assessments are meant to provide a summary of a student's understanding, often for use in assigning a grade. For example, most midterms and final exams that you've taken are summative assessments.

    The vast majority of items in CourseKata should be treated as formative assessments. The exceptions are the end-of-chapter Review questions, which can be thought of as summative. The mean number of correct answers for end-of-chapter review questions is provided within the checkpoints file. You might see that some pages have the word "Quiz" or "Exam" or "Midterm" in them. Results from these items and responses to them are not provided to us in this data set.

  16. Replication Data for: The Statistical Analysis of Misreporting on Sensitive...

    • dataverse.harvard.edu
    application/warc, txt +2
    Updated Dec 20, 2016
    Cite
    Harvard Dataverse (2016). Replication Data for: The Statistical Analysis of Misreporting on Sensitive Survey Questions [Dataset]. http://doi.org/10.7910/DVN/PZKBUX
    Explore at:
    Available download formats: application/warc(2872453), type/x-r-syntax(57167), type/x-r-syntax(41168), type/x-r-syntax(3903), application/warc(2913220), zip(267756), type/x-r-syntax(13433), type/x-r-syntax(13432), type/x-r-syntax(3899), zip(63524414), txt(3862), type/x-r-syntax(8182)
    Dataset updated
    Dec 20, 2016
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Replication data for the article Eady, Gregory (2016) "The Statistical Analysis of Misreporting on Sensitive Survey Questions"

  17. ROUNDING, FOCAL POINT ANSWERS AND NONRESPONSE TO SUBJECTIVE PROBABILITY...

    • journaldata.zbw.eu
    • jda-test.zbw.eu
    pdf, txt
    Updated Dec 7, 2022
    Cite
    Kristin J. Kleinjans; Arthur van Soest (2022). ROUNDING, FOCAL POINT ANSWERS AND NONRESPONSE TO SUBJECTIVE PROBABILITY QUESTIONS (replication data) [Dataset]. http://doi.org/10.15456/jae.2022321.0714426723
    Explore at:
    Available download formats: txt(2913), pdf(31909)
    Dataset updated
    Dec 7, 2022
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Kristin J. Kleinjans; Arthur van Soest
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We develop a panel data model explaining answers to subjective probabilities about binary events and estimate it using data from the Health and Retirement Study on six such probabilities. The model explicitly accounts for several forms of reporting behavior: rounding, focal point 50% answers and item nonresponse. We find observed and unobserved heterogeneity in the tendencies to report rounded values or a focal answer, explaining persistency in 50% answers over time. Focal 50% answers matter for some of the probabilities. Incorporating reporting behavior does not have a large effect on the estimated distribution of the genuine subjective probabilities.

  18. Questions assessing respondents' perceptions and behaviour relating to...

    • figshare.com
    xls
    Updated Jun 8, 2023
    Cite
    Kristina Blennow; Johannes Persson; Margarida Tomé; Marc Hanewinkel (2023). Questions assessing respondents' perceptions and behaviour relating to climate change, and socio-demographic variables; possible responses to the questions; and percentage responses of respondents (or other summary statistics, where noted) who answered yes and no to the question Have you adapted your forest management in response to climate change? (n = 828). [Dataset]. http://doi.org/10.1371/journal.pone.0050182.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Kristina Blennow; Johannes Persson; Margarida Tomé; Marc Hanewinkel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    n =  Numbers of responses. Test statistics for Wilcoxon rank sum test (W), Student's t-test (t), and χ2-test (χ2). Mean, median and ranges calculated from raw data before imputation.

  19. Readability scores for Chatgpt-4o, Gemini, and Perplexity responses to the...

    • plos.figshare.com
    xls
    Updated Jun 18, 2025
    Cite
    Mete Kara; Erkan Ozduran; Müge Mercan Kara; İlhan Celil Özbek; Volkan Hancı (2025). Readability scores for Chatgpt-4o, Gemini, and Perplexity responses to the most frequently asked Ankylosing spondylitis -related questions, and a statistical comparison of the text content to a 6th-grade reading level [Median, 95% Confidence Interval (CI) (Lower limit of confidence interval- Upper limit of confidence interval)]. [Dataset]. http://doi.org/10.1371/journal.pone.0326351.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Mete Kara; Erkan Ozduran; Müge Mercan Kara; İlhan Celil Özbek; Volkan Hancı
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Readability scores for Chatgpt-4o, Gemini, and Perplexity responses to the most frequently asked Ankylosing spondylitis -related questions, and a statistical comparison of the text content to a 6th-grade reading level [Median, 95% Confidence Interval (CI) (Lower limit of confidence interval- Upper limit of confidence interval)].

  20. Analysis of the experience, interests, and expectations of first-year...

    • figshare.com
    • portalcientificovalencia.univeuropea.com
    txt
    Updated Feb 7, 2024
    Cite
    Víctor Yeste (2024). Analysis of the experience, interests, and expectations of first-year students of the UEV STEAM degrees [Dataset]. http://doi.org/10.6084/m9.figshare.25161746.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Víctor Yeste
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Víctor Yeste. Universitat Politècnica de València. Universidad Europea de Valencia.

    The main objective is to analyze, using descriptive statistics, the experience, interests, and expectations of the programming languages by first-year students of the STEAM degrees of the European University of Valencia.

    Google Forms was chosen to evaluate students' views on programming languages and computational thinking through a question. It is a free tool that has been used in many studies, such as Haddad and Kalaani (2014), to capture the opinion of students beyond course assessment surveys, as it is straightforward, systematic, and easy to implement. It can be used through a web-based application to create online questionnaires with a friendly interface. All answers are collected using a Google Spreadsheet document stored on Google Drive. In addition, it enables the results of the questionnaire to be visualized through a statistical summary of each question and its answers.

    The questionnaire consisted of 19 questions, although some were subject to a specific answer to a previous question. To carry out the form, the first day of class of the subject of Fundamentals of Programming or Scientific Computing I has been chosen (depending on the degree, has a different name, even the same), specifically in the classes of 19 and 20 September 2023. It is a subject that is given in the first semester of the first year of all STEAM degrees of the European University of Valencia, which include Data Science, Physics, Engineering in Industrial Organization, and a Double Engineering Degree in Engineering in Industrial Organization and Business Administration and Management. In this subject, computational thinking is developed thanks to the study of theory and a significant practical component of programming in C++, one of today's most influential and essential programming languages (Cyganek, 2022).

    The questionnaire was proposed to first-year 2023-2024 students, encouraging them to participate in the first class they had on the subject and through a direct link to the questionnaire on the virtual campus, based on Canvas.

    This dataset has contributed to the elaboration of the book chapter: Yeste, Víctor (2024). ¿Los alumnos de STEAM saben programar al comenzar la universidad? Análisis de su experiencia, intereses y expectativas. In Perspectivas Contemporáneas en Educación: Innovación, Investigación y Transformación, Dykinson S.L.

Cite
Andreas Both; Oliver Schmidtke; Aleksandr Perevalov (2023). QADO: An RDF Representation of Question Answering Datasets and their Analyses for Improving Reproducibility [Dataset]. http://doi.org/10.6084/m9.figshare.21750029.v3

QADO: An RDF Representation of Question Answering Datasets and their Analyses for Improving Reproducibility

Explore at:
Available download formats: zip
Dataset updated
May 31, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Andreas Both; Oliver Schmidtke; Aleksandr Perevalov
License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

Measuring the quality of Question Answering (QA) systems is a crucial task to validate the results of novel approaches. However, there are already indicators of a reproducibility crisis as many published systems have used outdated datasets or use subsets of QA benchmarks, making it hard to compare results. We identified the following core problems: there is no standard data format, instead, proprietary data representations are used by the different partly inconsistent datasets; additionally, the characteristics of datasets are typically not reflected by the dataset maintainers nor by the system publishers. To overcome these problems, we established an ontology---Question Answering Dataset Ontology (QADO)---for representing the QA datasets in RDF. The following datasets were mapped into the ontology: the QALD series, LC-QuAD series, RuBQ series, ComplexWebQuestions, and Mintaka. Hence, the integrated data in QADO covers widely used datasets and multilinguality. Additionally, we did intensive analyses of the datasets to identify their characteristics to make it easier for researchers to identify specific research questions and to select well-defined subsets. The provided resource will enable the research community to improve the quality of their research and support the reproducibility of experiments.

Here, the mapping results of the QADO process, the SPARQL queries for data analytics, and the archived analytics results file are provided.

Up-to-date statistics can be created automatically by the script provided at the corresponding QADO GitHub RDFizer repository.
