100+ datasets found
  1. Secondary Data from Insights from Publishing Open Data in Industry-Academia...

    • zenodo.org
    bin, json +2
    Updated Sep 16, 2024
    Cite
    Per Erik Strandberg; Philipp Peterseil; Julian Karoliny; Johanna Kallio; Johannes Peltola (2024). Secondary Data from Insights from Publishing Open Data in Industry-Academia Collaboration [Dataset]. http://doi.org/10.5281/zenodo.13767153
    Explore at:
    Available download formats: json, text/x-python, bin, txt
    Dataset updated
    Sep 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Per Erik Strandberg; Philipp Peterseil; Julian Karoliny; Johanna Kallio; Johannes Peltola
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Secondary Data from Insights from Publishing Open Data in Industry-Academia Collaboration

    Authors

    Per Erik Strandberg [1], Philipp Peterseil [2], Julian Karoliny [3], Johanna Kallio [4], and Johannes Peltola [4].

    [1] Westermo Network Technologies AB (Sweden).
    [2] Johannes Kepler University Linz (Austria)
    [3] Silicon Austria Labs GmbH (Austria).
    [4] VTT Technical Research Centre of Finland Ltd. (Finland).

    Description

    This data accompanies a paper submitted to Elsevier's Data in Brief in 2024, titled Insights from Publishing Open Data in Industry-Academia Collaboration.

    Tentative Abstract: Effective data management and sharing are critical success factors in industry-academia collaboration. This paper explores the motivations and lessons learned from publishing open data sets in such collaborations. Through a survey of participants in a European research project that published 13 data sets, and an analysis of metadata from almost 281 thousand datasets in Zenodo, we collected qualitative and quantitative results on motivations, achievements, research questions, licences and file types. Through inductive reasoning and statistical analysis we found that planning the data collection is essential, and that only few datasets (2.4%) had accompanying scripts for improved reuse. We also found that authors are not well aware of the importance of licences or which licence to choose. Finally, we found that data with a synthetic origin, collected with simulations and potentially mixed with real measurements, can be very meaningful, as predicted by Gartner and illustrated by many datasets collected in our research project.

    Secondary data from Survey

    The file survey.txt contains secondary data from a survey of participants that published open data sets in the 3-year European research project InSecTT.

    Secondary data from Zenodo

    The file secondary_data_zenodo.json contains secondary data from an analysis of data sets published in Zenodo. It is accompanied by a py file and an ipynb file that serve as examples.

    License

    This data is licensed under the Creative Commons Attribution 4.0 International license. You are free to use the data if you attribute the authors. Read the license text for details.

  2. Household Income and Expenditure Survey - 2005 - Sri Lanka

    • nada.statistics.gov.lk
    • catalog.ihsn.org
    Updated Jan 5, 2023
    + more versions
    Cite
    Department of Census and Statistics (2023). Household Income and Expenditure Survey - 2005 - Sri Lanka [Dataset]. https://nada.statistics.gov.lk/index.php/catalog/34
    Dataset updated
    Jan 5, 2023
    Dataset authored and provided by
    Department of Census and Statistics
    Time period covered
    2005
    Area covered
    Sri Lanka
    Description

    Abstract

    This survey provides information on household income and expenditure in order to measure the levels of and changes in the living conditions of the people and to observe consumption patterns.

    Key objectives of the survey:
    - To identify the income patterns in Urban, Rural and Estate Sectors and provinces.
    - To identify the income patterns by income levels.
    - Average consumption of food items and non-food items.
    - Expenditure patterns by sector and by income level.

    Geographic coverage

    National coverage.

    Analysis unit

    Household, Individuals

    Universe

    For this survey, a sample of buildings and the occupants therein was drawn from the whole island.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A two-stage stratified random sample design was used in the survey. Urban, Rural and Estate sectors of the Districts were the domains for stratification. The sample frame was the list of buildings that was prepared for the Census of Population and Housing 2001.

    Selection of Primary Sampling Units (PSUs): Primary sampling units are the census blocks prepared for the Census of Population and Housing - 2001. The sample frame, which is a collection of all census blocks in the domain, was used for the selection of primary sampling units. A sample of 500 primary sampling units was selected from the sampling frame for the survey.

    Selection of Secondary Sampling Units (SSUs): Secondary sampling units are the housing units in the selected 500 primary sampling units (census blocks). From each primary sampling unit, 10 housing units (SSUs) were selected for the survey. The total sample of 5000 housing units was distributed among the Districts in Sri Lanka.
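
    The two selection stages can be illustrated with a minimal sketch. It assumes simple random selection at both stages and a single stratum; the stratification by sector and the actual selection probabilities used in the survey are not reproduced. The frame below is hypothetical (2,000 blocks of 40 housing units each), and the function name is illustrative only.

```python
import random


def two_stage_sample(blocks, n_psu, n_ssu, seed=0):
    """Pick n_psu census blocks, then n_ssu housing units in each block.

    `blocks` maps a block id to the list of housing-unit ids it contains.
    Simple random selection is assumed at both stages (an assumption,
    not the documented survey procedure).
    """
    rng = random.Random(seed)
    chosen_blocks = rng.sample(sorted(blocks), n_psu)
    return {b: rng.sample(blocks[b], n_ssu) for b in chosen_blocks}


# Hypothetical sampling frame: 2,000 census blocks, 40 housing units each.
frame = {b: [f"{b}-{u}" for u in range(40)] for b in range(2000)}

sample = two_stage_sample(frame, n_psu=500, n_ssu=10)
total_units = sum(len(units) for units in sample.values())
print(total_units)  # 500 blocks x 10 units = 5000
```

    With 500 blocks and 10 units per block, the sketch reproduces the total of 5000 housing units quoted above.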

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    Questionnaires

    The survey schedule was designed to collect data by household, and separate schedules were used for each household identified according to the definition of the household within the housing units selected for the survey. The survey schedule consists of three main sections:
    
           1. Demographic section 
           2. Expenditure
           3. Income
    

    The demographic characteristics and usual activities of the inmates belonging to the household were reported in the Demographic section of the schedule (close relatives temporarily living away are also listed in this section). The Expenditure section has two sub sections to report food and non-food consumption data separately. Expenditure incurred by boarders and servants on their own decisions is recorded in a sub section under the main Expenditure section. The Income section has seven sub sections categorized according to the main sources of income.

    Sampling error estimates

    The exact difference, or sampling error, varies depending on the particular sample selected, and the variability is measured by the standard error of the estimate. There is about a 95% chance, or level of confidence, that an estimate based on a sample will differ by no more than 1.96 standard errors from the true population value because of sampling error. Analyses relating to the HIES are generally conducted at the 95% level of confidence.

    confidence interval = estimate ± 1.96 × (standard error)
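
    As a small worked sketch of the interval formula, the function below applies the 1.96 multiplier to an estimate and its standard error. The input values (an estimate of 20,000 with a standard error of 500) are hypothetical and are not taken from the survey.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) bounds: estimate -/+ z * standard_error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical figures for illustration only.
low, high = confidence_interval(20000.0, 500.0)
print(round(low, 2), round(high, 2))  # roughly 19020 and 20980
```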
    

    Data appraisal

    http://www.statistics.gov.lk/HIES/HIES%202007/introduction%20%20HIES.pdf

    Section 1.2 of the final report, available at the link above, describes the adjustments for non-response.

  3. DHS EdData Survey 2010 - Nigeria

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    National Population Commission (2019). DHS EdData Survey 2010 - Nigeria [Dataset]. https://catalog.ihsn.org/index.php/catalog/3344
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    National Population Commission
    Time period covered
    2009 - 2010
    Area covered
    Nigeria
    Description

    Abstract

    The 2010 NEDS is similar to the 2004 Nigeria DHS EdData Survey (NDES) in that it was designed to provide information on education for children age 4–16, focusing on factors influencing household decisions about children’s schooling. The survey gathers information on adult educational attainment, children’s characteristics and rates of school attendance, absenteeism among primary school pupils and secondary school students, household expenditures on schooling and other contributions to schooling, and parents’/guardians’ perceptions of schooling, among other topics. The 2010 NEDS was linked to the 2008 Nigeria Demographic and Health Survey (NDHS) in order to collect additional education data on a subset of the households (those with children age 2–14) surveyed in the 2008 Nigeria DHS survey. The 2008 NDHS, for which data collection was carried out from June to October 2008, was the fourth DHS conducted in Nigeria (previous surveys were implemented in 1990, 1999, and 2003).

    The goal of the 2010 NEDS was to follow up with a subset of approximately 30,000 households from the 2008 NDHS survey. However, the 2008 NDHS sample shows that of the 34,070 households interviewed, only 20,823 had eligible children age 2–14. To make statistically significant observations at the State level, 1,700 children per State and the Federal Capital Territory (FCT) were needed. It was estimated that an additional 7,300 households would be required to meet the total number of eligible children needed. To bring the sample size up to the required target, additional households were screened and added to the overall sample. However, these households did not have the NDHS questionnaire administered. Thus, the two surveys were statistically linked to create some data used to produce the results presented in this report, but for some households, data were imputed or not included.

    Geographic coverage

    National

    Analysis unit

    Households, Individuals

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The eligible households for the 2010 NEDS are the same as those households in the 2008 NDHS sample for which interviews were completed and in which there is at least one child age 2-14, inclusive. In the 2008 NDHS, 34,070 households were successfully interviewed, and the goal here was to perform a follow-up NEDS on a subset of approximately 30,000 households. However, records from the 2008 NDHS sample showed that only 20,823 had children age 4-16. Therefore, to bring the sample size up to the required number of children, additional households were screened from the NDHS clusters.

    The first step was to use the NDHS data to determine eligibility based on the presence of a child age 2-14. Second, based on a series of precision and power calculations, RTI determined that the final sample size should yield approximately 790 households per State to allow statistical significance for reporting at the State level, resulting in a total completed sample size of 790 × 37 = 29,230. This calculation was driven by desired estimates of precision, analytic goals, and available resources. To achieve the target number of households with completed interviews, we increased the final number of desired interviews to accommodate expected attrition factors such as unlocatable addresses, eligibility issues, and non-response or refusal. Third, to reach the target sample size, we selected additional samples from households that had been listed by NDHS but had not been sampled and visited for interviews. The final number of households with completed interviews was 26,934, slightly lower than the original target but sufficient to yield interview data for 71,567 children, well above the targeted number of 1,700 children per State.
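
    The sample-size arithmetic in this paragraph can be checked with a short sketch. All figures come from the text above; the national child target of 1,700 × 37 is a derived total used here only for comparison, since the per-State distribution of the 71,567 children is not given.

```python
STATES_PLUS_FCT = 37               # 36 States plus the Federal Capital Territory
HOUSEHOLDS_PER_STATE = 790         # target completed households per State
CHILDREN_PER_STATE_TARGET = 1700   # target children per State

target_households = HOUSEHOLDS_PER_STATE * STATES_PLUS_FCT
target_children = CHILDREN_PER_STATE_TARGET * STATES_PLUS_FCT  # derived national total

completed_households = 26934
children_with_data = 71567

shortfall = target_households - completed_households
share_of_target = 100.0 * completed_households / target_households

print(target_households)            # 29230
print(target_children)              # 62900
print(shortfall)                    # 2296
print(round(share_of_target, 1))    # about 92.1 percent of the household target
print(children_with_data > target_children)  # True
```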

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The four questionnaires used in the 2004 Nigeria DHS EdData Survey (NDES)— 1. Household Questionnaire 2. Parent/Guardian Questionnaire 3. Eligible Child Questionnaire 4. Independent Child Questionnaire—formed the basis for the 2010 NEDS questionnaires. These are all available in Appendix D of the survey report available under External Resources.

    More than 90 percent of the questionnaires remained the same; for cases where there was a clear justification or a need for a change in item formulation or a specific requirement for additional items, these were updated accordingly. A one day workshop was convened with the NEDS Implementation Team and the NDES Advisory Committee to review the instruments and identify any needed revisions, additions, or deletions. Efforts were made to collect data to ease integration of the 2010 NEDS data into the FMOE’s national education management information system. Instrument issues that were identified as being problematic in the 2004 NDES as well as items identified as potentially confusing or difficult were proposed for revision. Issues that USAID, DFID, FMOE, and other stakeholders identified as being essential but not included in the 2004 NDES questionnaires were proposed for incorporation into the 2010 NEDS instruments, with USAID serving as the final arbiter regarding questionnaire revisions and content.

    General revisions accepted into the questionnaires included the following:
    - A separation of all questions related to secondary education into junior secondary and senior secondary to reflect the UBE policy
    - Administration of school-based questions for children identified as attending pre-school
    - Inclusion of questions on disabilities of children and parents
    - Additional questions on Islamic schooling
    - Revision to the literacy question administration to assess English literacy for children attending school
    - Some additional questions on delivery of UBE under the financial questions section

    Upon completion of revisions to the English-language questionnaires, the instruments were translated and adapted by local translators into three languages (Hausa, Igbo, and Yoruba) and then back-translated into English to ensure accuracy of the translation. After the questionnaires were finalized, training materials used in the 2004 NDES and developed by Macro International, which included training guides, data collection manuals, and field observation materials, were reviewed. The materials were updated to reflect changes in the questionnaires. In addition, the procedures as described in the manuals and guides were carefully reviewed. Adjustments were made, where needed, based on experience with large-scale surveys and lessons learned from the 2004 NDES and the 2008 NDHS, to ensure the highest quality data capture.

    Cleaning operations

    Data processing for the 2010 NEDS occurred concurrently with data collection. Completed questionnaires were retrieved by the field coordinators/trainers and delivered to NPC in standard envelopes, labeled with the sample identification, team, and State name. The shipment also contained a written summary of any issues detected during the data collection process. The questionnaire administrators logged the receipt of the questionnaires, acknowledged the list of issues, and acted upon them if required. The editors performed an initial check on the questionnaires, performed any coding of open-ended questions (with possible assistance from the data entry operators), and left them available to be assigned to the data entry operators. The data entry operators entered the data into the system, with the support of the editors for erroneous or unclear data.

    Experienced data entry personnel were recruited from those who had performed data entry activities for NPC on previous studies. The data entry teams comprised a data entry coordinator, supervisor, and operators. Data entry coordinators oversaw the entire data entry process from programming and training to final data cleaning, made assignments, tracked progress, and ensured the quality and timeliness of the data entry process. Data entry supervisors were on hand at all times to ensure that proper procedures were followed and to help editors resolve any uncovered inconsistencies. The supervisors controlled incoming questionnaires, assigned batches of questionnaires to the data entry operators, and managed their progress. Approximately 30 clerks were recruited and trained as data entry operators to enter all completed questionnaires and to perform the secondary entry for data verification. Editors worked with the data entry operators to review information flagged as “erroneous” or “dubious” in the data entry process and provided follow-up and resolution for those anomalies.

    The data entry program developed for the 2004 NDES was revised to reflect the revisions in the 2010 NEDS questionnaire. The electronic data entry and reporting system performed internal consistency checks and flagged inconsistencies.

    Response rate

    A very high overall response rate of 97.9 percent was achieved, with interviews completed in 26,934 households out of a total of 27,512 occupied households from the original sample of 28,624 households. The response rates did not vary significantly between urban and rural areas (98.5 percent versus 97.6 percent, respectively). The response rates for parents/guardians and children were even higher, and the rate for independent children, 97.4 percent, was slightly lower than the overall sample rate. In all these cases, the urban/rural differences were negligible.
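
    The overall response rate follows directly from the household counts quoted above. A minimal sketch, taking the rate as completed interviews divided by occupied households (the occupancy share is a derived figure added for illustration):

```python
def percentage(part, whole):
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


sampled_households = 28624
occupied_households = 27512
completed_interviews = 26934

response_rate = percentage(completed_interviews, occupied_households)
occupancy_share = percentage(occupied_households, sampled_households)

print(round(response_rate, 1))    # 97.9
print(round(occupancy_share, 1))  # 96.1
```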

    Sampling error estimates

    Estimates derived from a sample survey are affected by two types of errors: (1) non-sampling errors and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as

  4. Supplementary Material for: Knowledge and Attitudes about Privacy and...

    • karger.figshare.com
    docx
    Updated May 30, 2023
    Cite
    Ziegler E.; Mladucky J.; Baty B.; Anderson R.; Botkin J. (2023). Supplementary Material for: Knowledge and Attitudes about Privacy and Secondary Data Use among African-Americans Using Direct-to-Consumer Genetic Testing [Dataset]. http://doi.org/10.6084/m9.figshare.21212912.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Karger Publishers
    Authors
    Ziegler E.; Mladucky J.; Baty B.; Anderson R.; Botkin J.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: The rapidly expanding direct-to-consumer genetic testing (DTC GT) market is one area where narratives of underrepresented populations have not been explored extensively. This study describes African-American consumers’ personal experiences with and perceptions about DTC GT and explores similarities and differences between African-Americans and an earlier cohort of mostly European American consumers. Methods: Twenty semi-structured, qualitative interviews were held with individuals who self-identified as Black/African-American and completed DTC GT between February 2017 and February 2020. Interviews were transcribed and consensus-coded, using inductive content analysis. Results: Participants generally had positive regard for DTC GT. When considering secondary uses of their results or samples, most participants were aware this was a possibility but had little concrete knowledge about company practices. When prompted about potential uses, participants were generally comfortable with research uses but had mixed outlooks on other nonresearch uses such as law enforcement, cloning, and product development. Most participants expressed that consent should be required for any secondary use, with the option to opt out. The most common suggestion for companies was to improve transparency. Compared to European American participants, African-American participants expressed more trust in DTC GT companies compared to healthcare providers, more concerns about law enforcement uses of data, and a stronger expression of community considerations. Discussion/Conclusion: This study found that African-American consumers of DTC GT had a positive outlook about genetic testing and were open to research and some nonresearch uses, provided that they were able to give informed consent. Participants in this study had little knowledge of company practices regarding secondary uses. 
Compared to an earlier cohort of European American participants, African-American participants expressed more concerns about medical and law enforcement communities’ use of data and more reference to community engagement.

  5. Air change rate and SARS-CoV-2 exposure in hospitals and residences: A...

    • tandf.figshare.com
    • figshare.com
    xlsx
    Updated Feb 23, 2024
    Cite
    Yuetong Zhang; Sripriya Nannu Shankar; William B. Vass; John A. Lednicky; Z. Hugh Fan; Duzgun Agdas; Robert Makuch; Chang-Yu Wu (2024). Air change rate and SARS-CoV-2 exposure in hospitals and residences: A meta-analysis [Dataset]. http://doi.org/10.6084/m9.figshare.25234859.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 23, 2024
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Yuetong Zhang; Sripriya Nannu Shankar; William B. Vass; John A. Lednicky; Z. Hugh Fan; Duzgun Agdas; Robert Makuch; Chang-Yu Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As COVID-19 swept across the globe, increased ventilation and implementation of air cleaning were emphasized by the US CDC and WHO as important strategies to reduce the risk of inhalation exposure to the virus. To assess whether higher ventilation and air cleaning rates lead to lower exposure risk to SARS-CoV-2, 1274 manuscripts published between April 2020 and September 2022 were screened using the key words “airborne SARS-CoV-2” or “SARS-CoV-2 aerosol.” Ninety-three studies that involved air sampling at locations with known sources (hospitals and residences) were selected, and the associated data were compiled. Two metrics were used to assess exposure risk: SARS-CoV-2 concentration and SARS-CoV-2 detection rate in air samples. Locations were categorized by type (hospital or residence) and proximity to the location housing the isolated/quarantined patient (primary or secondary). The results showed that hospital wards had lower airborne virus concentrations than residential isolation rooms. A negative correlation was found between airborne virus concentrations in primary-occupancy areas and air changes per hour (ACH). In hospital settings, sample positivity rates were significantly reduced in secondary-occupancy areas compared to primary-occupancy areas, but they were similar across sampling locations in residential settings. ACH and sample positivity rates were negatively correlated, though the effect was diminished when ACH values exceeded 8. While limitations associated with diverse sampling protocols exist, data considered by this meta-analysis support the notion that higher ACH may reduce exposure risks to the virus in ambient air. Copyright © 2024 American Association for Aerosol Research

  6. Assessing Punitive and Cooperative Strategies of Corporate Crime Control for...

    • icpsr.umich.edu
    • catalog.data.gov
    ascii, delimited, sas +2
    Updated Jan 27, 2011
    + more versions
    Cite
    Simpson, Sally S.; Garner, Joel; Gibbs, Carole (2011). Assessing Punitive and Cooperative Strategies of Corporate Crime Control for Select Companies Operating in 1995 Through 2000 [United States] [Dataset]. http://doi.org/10.3886/ICPSR22180.v1
    Explore at:
    Available download formats: delimited, ascii, spss, sas, stata
    Dataset updated
    Jan 27, 2011
    Dataset provided by
    Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
    Authors
    Simpson, Sally S.; Garner, Joel; Gibbs, Carole
    License

    https://www.icpsr.umich.edu/web/ICPSR/studies/22180/terms

    Time period covered
    1995 - 2000
    Area covered
    United States
    Description

    The purpose of the study was to evaluate the extent to which deterrence or cooperative strategies motivated firms and their facilities to comply with environmental regulations. The project collected administrative data (secondary data) for a sample of publicly owned, United States companies in the pulp and paper, steel, and oil refining industries from 1995 to 2000 to track each firm's economic, environmental, and enforcement compliance history. Company Economic and Size Data (Part 1) from 1993 to 2000 were gathered from the Standard and Poor's Industrial Compustat, Mergent Online, and Securities and Exchange Commission, resulting in 512 company/year observations. Next, the research team used the Directory of Corporate Affiliations, the Environmental Protection Agency's (EPA) Toxic Release Inventory (TRI), and the EPA's Permit Compliance System (PCS) to identify all facilities owned by the sample of firms between 1995 and 2000. Researchers then gathered Facility Ownership Data (Part 2), resulting in 15,408 facility/year observations. The research team gathered various types of PCS data from the EPA for facilities in the sample. Permit Compliance System Facility Data (Part 3) were gathered on the 214 unique major National Pollutant Discharge Elimination System (NPDES) permits issued to facilities in the sample. Although permits were given to facilities, facilities could have one or more discharge points (e.g., pipes) that released polluted water directly into surface waters. Thus, Permit Compliance System Discharge Points (Pipe Layout) Data (Part 4) were also collected on 1,995 pipes. The EPA determined compliance using two methods: inspections and evaluations/assessments. Permit Compliance System Inspections Data (Part 5) were collected on a total of 1,943 inspections. Permit Compliance System Compliance Schedule Data (Part 6) were collected on a total of 3,336 compliance schedule events. 
Permit Compliance System Compliance Schedule Violation Data (Part 7) were obtained for a total of 246 compliance schedule violations. Permit Compliance System Single Event Violations Data (Part 8) were collected on 75 single event violations. Permit Compliance System Measurement/Effluent and Reporting Violations Data (Part 9) were collected for 396,479 violations. Permit Compliance System Enforcement Actions Data (Part 10) were collected on 1,730 enforcement actions. Occupational Safety and Health Administration Data (Part 11) were collected on a total of 2,243 inspections. The OSHA data were collected by company name and include multiple facilities owned by each company and were not limited to facilities in the Permit Compliance System. Additional information about firm noncompliance was drawn from EPA Docket and CrimDoc systems. Administrative and Judicial Docket Case Data (Part 12) were collected on 40 administrative and civil cases. Administrative and Judicial Docket Case Settlement Data (Part 13) were collected on 36 administrative and civil cases. Criminal Case Data (Part 14) were collected on three criminal cases. For secondary data analysis purposes, the research team created the Yearly Final Report Data (Part 15) and the Quarterly Final Report Data (Part 16). The yearly data contain a total of 378 company/year observations; the quarterly data contain a total of 1,486 company/quarter observations. The research team also conducted a vignette survey of the same set of companies that are in the secondary data to measure compliance and managerial decision-making. Concerning the Vignette Data (Part 17), a factorial survey was developed and administered to company managers tapping into perceptions of the costs and benefits of pro-social and anti-social conduct for themselves and their companies. 
A total of 114 respondents from 2 of the sampled corporations read and responded to a total of 384 vignettes representing 4 scenario types: technical noncompliance, significant noncompliance, over-compliance, and response to counter-terrorism. Part 1 contains 19 economic and size variables. Part 2 contains a total of eight variables relating to ownership. Part 3 contains 67 variables with regard to facility characteristics. Part 4 contains 31 variables relating to discharge points and pipe layout information. Part 5 contains 13 inspections characteristics variables. Part 6 contains 13 compliance schedule event characteristics variabl

  7. Secondary Data on Social Indicators and Public Expenditure on District and...

    • bonndata.uni-bonn.de
    • daten.zef.de
    png, xlsb, xml
    Updated Sep 18, 2023
    + more versions
    Cite
    Michael Simon (2023). Secondary Data on Social Indicators and Public Expenditure on District and Regional Level in Tanzania (1996-2010) [Dataset]. http://doi.org/10.60507/FK2/4HPJDK
    Explore at:
    Available download formats: xml (33052), xlsb (615257), png (7086)
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    bonndata
    Authors
    Michael Simon
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1996 - Dec 31, 2010
    Area covered
    Tanzania
    Description

    Secondary data on social indicators and public expenditure on district and regional level in Tanzania (1996-2010), for example:
    - THINV: Logarithm of deflated public per capita spending on health in the short- and long term (total spending of the current and the last five budget years)
    - SANI: Latrines per 100 pupils
    - INFRA: Percentage of women and men age 15-49 who reported serious problems in accessing health care due to the distance to the next health facility
    - URB: Percentage of people living in urban areas
    - TAINV: Logarithm of deflated public per capita spending on agriculture (current and previous budget year)*
    - BREASTF: Percentage who started breastfeeding within 1 hour of birth, among the last children born in the five years preceding the survey
    - IODINE: Percentage of households with adequate iodine content of salt (15+ ppm)
    - MEDU: Percentage of women age 15-49 who completed grade 6 at the secondary level
    - VACC: Percentage of children age 12-23 months with a vaccination card
    - TWINV: Logarithm of deflated public per capita spending on water in the short- and long term (total spending of the current and the last five budget years)*
    - TEINV: Logarithm of deflated public per capita spending on education in the short- and long term (total spending of the current and the last five budget years)*
    - LABOUR: Percentage of women and men employed in the 12 months preceding the survey
    - LAND: Per capita farmland in ha (including the area under temporary mono/mixed crops, permanent mono/mixed crops and the area under pasture)
    - RAIN: Yearly rainfall in mm
    etc.

    Purpose: The uploaded data were the basis for the following PhD thesis: The optimal allocation of scarce resources for health improvement is a crucial factor to lower the burden of disease and to strengthen the productive capacities of people living in developing countries. 
This research project aims to devise tools for narrowing the gap between the actual allocation and a more efficient allocation of resources for health in the case of Tanzania. Firstly, the returns from alternative government spending across sectors such as agriculture, water etc. are analysed. Maximisation of the amount of Disability Adjusted Life Years (DALYs) averted per dollar invested is used as the criterion. A Simultaneous Equation Model (SEM) is developed to estimate the required elasticities. The results of the quantitative analysis show that the highest returns on DALYs are obtained by investments in improved nutrition and access to safe water sources, followed by spending on sanitation. Secondly, focusing on the health sector itself, scarce resources for health improvement create the incentive to prioritise certain health interventions. Using the example of malaria, the objective of the second stage is to evaluate whether interventions are prioritized in such a way that the marginal dollar goes to where it has the highest effect on averting DALYs. PopMod, a longitudinal population model, is used to estimate the cost-effectiveness of six isolated and combined malaria intervention approaches. The results of the longitudinal population model show that preventive interventions such as insecticide-treated bed nets (ITNs) and intermittent presumptive treatment with Sulphadoxine-Pyrimethamine (SP) during pregnancy had the highest health returns (both US$ 41 per DALY averted). The third part of this dissertation focuses on the political economy aspect of the allocation of scarce resources for health improvement. The objective here is to positively assess how political party competition and the access to mass media directly affect the distribution of district resources for health improvement. 
Estimates of cross-sectional and panel data regression analysis imply that a one-percentage-point smaller difference between the winning party and the second-place party (that is, higher competition) leads to a 0.151 percentage point increase in public health spending, which is significant at the five percent level. In conclusion, cross-sectoral effects, the cost-effectiveness of health interventions and the political environment are important factors at play in the country’s resource allocation decisions. In absolute terms, current financial resources to lower the burden of disease in Tanzania are substantial. However, there is a huge potential for optimizing the allocation of these resources for a better health return.

  8. Data from: Statewide Study of Stalking and Its Criminal Justice Response in...

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Nov 14, 2025
    Cite
    National Institute of Justice (2025). Statewide Study of Stalking and Its Criminal Justice Response in Rhode Island, 2001-2005 [Dataset]. https://catalog.data.gov/dataset/statewide-study-of-stalking-and-its-criminal-justice-response-in-rhode-island-2001-2005-b80b4
    Explore at:
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    National Institute of Justicehttp://nij.ojp.gov/
    Description

    The research team collected data from statewide datasets on 268 stalking cases, comprising the population of 108 police-identified stalking cases across Rhode Island between 2001 and 2005 and a sample of 160 researcher-identified stalking incidents (incidents that met statutory criteria for stalking but were cited by police for other domestic violence offenses) during the same period. The secondary data used for this study came from the Rhode Island Supreme Court Domestic Violence Training and Monitoring Unit's (DVU) statewide database of domestic violence incidents reported to Rhode Island law enforcement. Prior criminal history data were obtained from records of all court cases entered into the automated Rhode Island court file, CourtConnect. The data contain a total of 121 variables including suspect characteristics, victim characteristics, incident characteristics, police response characteristics, and prosecutor response characteristics.

  9. Supplemental Material for PhD Dissertation "Innovative Offsite Construction...

    • research-repository.rmit.edu.au
    • researchdata.edu.au
    docx
    Updated Jun 22, 2023
    Cite
    Ali Zolghadr (2023). Supplemental Material for PhD Dissertation "Innovative Offsite Construction Uptake in the Housebuilding Sector: A Systemic Approach to Economic Justifiability for Volume Builders" [Dataset]. http://doi.org/10.25439/rmt.23557272.v1
    Explore at:
    docxAvailable download formats
    Dataset updated
    Jun 22, 2023
    Dataset provided by
    RMIT University
    Authors
    Ali Zolghadr
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This document contains text excerpts captured from the literature as secondary data to develop the qualitative system dynamics model as well as two example coding tables. Table 1 shows the final list of research works selected for model development through a systematic paper selection procedure as described in chapter 3 of the thesis. Table 2 shows the initial causal links created based on the identified causal relationships. Table 3 shows an intermediate merging step (3rd iteration), where causal links are combined into more general links. For a detailed explanation of the model development process refer to chapter 3 of the thesis.

  10. Risk factors for opioid use disorder (weighted).

    • plos.figshare.com
    xls
    Updated Jun 15, 2023
    Cite
    Johannes M. Just; Norbert Scherbaum; Michael Specka; Marie-Therese Puth; Klaus Weckbecker (2023). Risk factors for opioid use disorder (weighted). [Dataset]. http://doi.org/10.1371/journal.pone.0236268.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 15, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Johannes M. Just; Norbert Scherbaum; Michael Specka; Marie-Therese Puth; Klaus Weckbecker
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Risk factors for opioid use disorder (weighted).

  11. Data from: Distinguishing potential bacteria-tumor associations from...

    • data-staging.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +1more
    zip
    Updated Jan 11, 2018
    Cite
    Kelly M. Robinson; Jonathan Crabtree; John S. A. Mattick; Kathleen E. Anderson; Julie C. Dunning Hotopp (2018). Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data [Dataset]. http://doi.org/10.5061/dryad.96584
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 11, 2018
    Dataset provided by
    University of Maryland, Baltimore County
    Authors
    Kelly M. Robinson; Jonathan Crabtree; John S. A. Mattick; Kathleen E. Anderson; Julie C. Dunning Hotopp
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Background: A variety of bacteria are known to influence carcinogenesis. Therefore, we sought to investigate if publicly available whole genome and whole transcriptome sequencing data generated by large public cancer genome efforts, like The Cancer Genome Atlas (TCGA), could be used to identify bacteria associated with cancer. The Burrows-Wheeler aligner (BWA) was used to align a subset of Illumina paired-end sequencing data from TCGA to the human reference genome and all complete bacterial genomes in the RefSeq database in an effort to identify bacterial read pairs from the microbiome.

    Results: Through careful consideration of all of the bacterial taxa present in the cancer types investigated, their relative abundance, and batch effects, we were able to identify some read pairs from certain taxa as likely resulting from contamination. In particular, the presence of Mycobacterium tuberculosis complex in the ovarian serous cystadenocarcinoma (OV) and glioblastoma multiforme (GBM) samples was correlated with the sequencing center of the samples. Additionally, there was a correlation between the presence of Ralstonia spp. and two specific plates of acute myeloid leukemia (AML) samples. At the end, associations remained between Pseudomonas-like and Acinetobacter-like read pairs in AML, and Pseudomonas-like read pairs in stomach adenocarcinoma (STAD) that could not be explained through batch effects or systematic contamination as seen in other samples.

    Conclusions: This approach suggests that it is possible to identify bacteria that may be present in human tumor samples from public genome sequencing data that can be examined further experimentally. More weight should be given to this approach in the future when bacterial associations with diseases are suspected.

  12. Demographic characteristics of participants who reported the use of opioids...

    • plos.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Johannes M. Just; Norbert Scherbaum; Michael Specka; Marie-Therese Puth; Klaus Weckbecker (2023). Demographic characteristics of participants who reported the use of opioids with a medical prescription within the last 12 months. [Dataset]. http://doi.org/10.1371/journal.pone.0236268.t001
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Johannes M. Just; Norbert Scherbaum; Michael Specka; Marie-Therese Puth; Klaus Weckbecker
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Demographic characteristics of participants who reported the use of opioids with a medical prescription within the last 12 months.

  13. NuSTAR Serendipitous Survey 40-Month Secondary Source Catalog - Dataset -...

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). NuSTAR Serendipitous Survey 40-Month Secondary Source Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/nustar-serendipitous-survey-40-month-secondary-source-catalog
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    This table contains some of the science results from the Nuclear Spectroscopic Telescope Array (NuSTAR) Serendipitous Survey. The catalog incorporates data taken during the first 40 months of NuSTAR operation, which provide ~20 Ms of effective exposure time over 331 fields, with an areal coverage of 13 deg^2. The primary catalog (available as the HEASARC NUSTARSSC table) contains 498 sources (the abstract of the reference paper states that there are 497 sources) detected in total over the 3-24 keV energy range. There are 276 sources with spectroscopic redshifts and classifications, largely resulting from the authors' extensive campaign of ground-based spectroscopic follow-up. The authors characterize the overall sample in terms of the X-ray, optical, and infrared source properties. The sample is primarily composed of active galactic nuclei (AGN), detected over a large range in redshift from z = 0.002 to 3.4 (median redshift z of 0.56), but also includes 16 spectroscopically confirmed Galactic sources. There is a large range in X-ray flux, from log (f_3-24_keV) ~ -14 to -11 (in units of erg s^-1 cm^-2), and in rest-frame 10-40 keV luminosity, from log (L_10-40keV) ~ 39 to 46 (in units of erg s^-1), with a median of 44.1. Approximately 79% of the NuSTAR sources have lower-energy (<10 keV) X-ray counterparts from XMM-Newton, Chandra, and Swift XRT observations. The mid-infrared (MIR) analysis, using WISE all-sky survey data, shows that MIR AGN color selections miss a large fraction of the NuSTAR-selected AGN population, from ~15% at the highest luminosities (L_X > 10^44 erg s^-1) to ~80% at the lowest luminosities (L_X < 10^43 erg s^-1). The authors' optical spectroscopic analysis finds that the observed fraction of optically obscured AGN (i.e., the type 2 fraction) is F_Type2 = 53 (+14, -15) per cent, for a well-defined subset of the 8-24 keV selected sample. 
This is higher, albeit at a low significance level, than the type 2 fraction measured for redshift- and luminosity-matched AGNs selected by < 10 keV X-ray missions. This table contains the Secondary NuSTAR Serendipitous Source Catalog of 64 sources found using wavdetect to search for significant emission peaks in the FPMA and FPMB data separately (see Section 2.1.1 of Alexander et al. 2013, ApJ, 773, 125) and in the combined A+B data. These sources are listed in Table 7 of the reference paper. This method was developed alongside the primary one (Section 2.3 of the reference paper) in order to investigate the optimum source detection methodologies for NuSTAR and to identify sources in regions of the NuSTAR coverage that are automatically excluded in the primary source detection. The authors emphasize that these secondary sources are not used in any of the science analyses presented in their paper. Nevertheless, these secondary sources are robust NuSTAR detections, some of which will be incorporated in future NuSTAR studies, and for many of them (35 out of the 43 sources with spectroscopic identifications) the authors have obtained new spectroscopic redshifts and classifications through their follow-up program. The X-ray photometric parameters for 4 sources are left blank as in these cases the A+B data prohibit reliable photometric constraints. Additional information on these Secondary Catalog sources that the authors obtained using optical spectroscopy is available in Table 8 of the reference paper (q.v.). This table does NOT contain the 498 sources in the Primary NuSTAR Serendipitous Source Catalog that were found using the source detection procedure described in Section 2.3 of the reference paper, and that are listed in Table 5 (op. cit.). This table was created by the HEASARC in July 2017 based on the machine-readable version of Table 7 from the reference paper, the Secondary NuSTAR Serendipitous Source Catalog, that was obtained from the ApJ web site. 
This is a service provided by NASA HEASARC.

  14. Weighted % of prescription opioid recipients according to OUD severity.

    • figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    Johannes M. Just; Norbert Scherbaum; Michael Specka; Marie-Therese Puth; Klaus Weckbecker (2023). Weighted % of prescription opioid recipients according to OUD severity. [Dataset]. http://doi.org/10.1371/journal.pone.0236268.t002
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Johannes M. Just; Norbert Scherbaum; Michael Specka; Marie-Therese Puth; Klaus Weckbecker
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Weighted % of prescription opioid recipients according to OUD severity.

  15. Make Data Count Dataset - MinerU Extraction

    • kaggle.com
    zip
    Updated Aug 26, 2025
    Cite
    Omid Erfanmanesh (2025). Make Data Count Dataset - MinerU Extraction [Dataset]. https://www.kaggle.com/datasets/omiderfanmanesh/make-data-count-dataset-mineru-extraction
    Explore at:
    zip(4272989320 bytes)Available download formats
    Dataset updated
    Aug 26, 2025
    Authors
    Omid Erfanmanesh
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Description

    This dataset contains PDF-to-text conversions of scientific research articles, prepared for the task of data citation mining. The goal is to identify references to research datasets within full-text scientific papers and classify them as Primary (data generated in the study) or Secondary (data reused from external sources).

    The PDF articles were processed using MinerU, which converts scientific PDFs into structured machine-readable formats (JSON, Markdown, images). This ensures participants can access both the raw text and layout information needed for fine-grained information extraction.

    Files and Structure

    Each paper directory contains the following files:

    • *_origin.pdf The original PDF file of the scientific article.

    • *_content_list.json Structured extraction of the PDF content, where each object represents a text or figure element with metadata. Example entry:

      {
       "type": "text",
       "text": "10.1002/2017JC013030",
       "text_level": 1,
       "page_idx": 0
      }
      
    • full.md The complete article content in Markdown format (linearized for easier reading).

    • images/ Folder containing figures and extracted images from the article.

    • layout.json Page layout metadata, including positions of text blocks and images.
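
    As a minimal sketch of how the *_content_list.json structure described above might be consumed, the following Python groups the text elements by page. The sample data is hypothetical and only mirrors the example entry; the "image" type name and the absence of other fields are assumptions, not something the dataset description confirms.

    ```python
    import json

    # Hypothetical sample shaped like the example entry above.
    # Real files are named *_content_list.json; the "image" type is an assumption.
    SAMPLE = """[
      {"type": "text", "text": "10.1002/2017JC013030", "text_level": 1, "page_idx": 0},
      {"type": "image", "page_idx": 0},
      {"type": "text", "text": "Data are available from Dryad.", "page_idx": 1}
    ]"""


    def text_by_page(elements):
        """Group the text of all 'text' elements by their page index."""
        pages = {}
        for element in elements:
            if element.get("type") != "text":
                continue  # skip figures and any other non-text elements
            page = element.get("page_idx", -1)
            pages.setdefault(page, []).append(element.get("text", ""))
        return pages


    pages = text_by_page(json.loads(SAMPLE))
    for page, texts in sorted(pages.items()):
        print(page, texts)
    ```

    For a real paper directory, the same function could be applied to the result of json.load on the opened *_content_list.json file.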

    Data Mining Task

    The aim is to detect dataset references in the article text and classify them:

    Each dataset mention must be labeled as:

    • Primary: Data generated by the paper (new experiments, field observations, sequencing runs, etc.).
    • Secondary: Data reused from external repositories or prior studies.

    Training and Test Splits

    • train/ → Articles with gold-standard labels (train_labels.csv).
    • test/ → Articles without labels, used for evaluation.
    • train_labels.csv → Ground truth with:

      • article_id: Research paper DOI.
      • dataset_id: Extracted dataset identifier.
      • type: Citation type (Primary / Secondary).
    • sample_submission.csv → Example submission format.
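
    A short sketch of reading the label file with the three columns listed above and counting the two citation types. The rows below are illustrative only: the first reuses the paper and dataset from this description's own example, the second is invented, and the exact identifier formatting in the real train_labels.csv is an assumption.

    ```python
    import csv
    import io
    from collections import Counter

    # Illustrative rows only; the identifier formatting in the real
    # train_labels.csv may differ from what is shown here.
    SAMPLE_CSV = """article_id,dataset_id,type
    10.1098/rspb.2016.1151,https://doi.org/10.5061/dryad.6m3n9,Primary
    10.0000/example.1,https://doi.org/10.0000/example.data,Secondary
    """.replace("    ", "")


    def count_types(handle):
        """Count how many labeled dataset mentions fall into each citation type."""
        reader = csv.DictReader(handle)
        return Counter(row["type"] for row in reader)


    counts = count_types(io.StringIO(SAMPLE_CSV))
    print(counts["Primary"], counts["Secondary"])
    ```

    With the real file, open("train_labels.csv", newline="") could be passed in place of the in-memory sample.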

    Example

    Paper: https://doi.org/10.1098/rspb.2016.1151
    Data: https://doi.org/10.5061/dryad.6m3n9
    In-text span: "The data we used in this publication can be accessed from Dryad at doi:10.5061/dryad.6m3n9."
    Citation type: Primary

    This dataset enables participants to develop and test NLP systems for:

    • Information extraction (locating dataset mentions).
    • Identifier normalization (mapping mentions to persistent IDs).
    • Citation classification (distinguishing Primary vs Secondary data usage).
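
    The first two tasks in the list above (locating mentions and normalizing identifiers) can be sketched for the DOI case with a regular expression. This is a rough illustration under stated assumptions, not the competition's reference method: the pattern is a common heuristic for DOIs, the function name find_dois is hypothetical, and non-DOI identifiers such as accession numbers are not handled.

    ```python
    import re

    # Heuristic DOI pattern (an assumption, not an official grammar):
    # "10.", a 4-9 digit registrant code, "/", then a run of non-space characters.
    DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


    def find_dois(text):
        """Return DOIs found in text, normalized to lowercase https://doi.org/ URLs."""
        found = []
        for match in DOI_PATTERN.finditer(text):
            doi = match.group(0).rstrip(".,;)")  # drop trailing sentence punctuation
            found.append("https://doi.org/" + doi.lower())
        return found


    span = ("The data we used in this publication can be accessed "
            "from Dryad at doi:10.5061/dryad.6m3n9.")
    print(find_dois(span))
    ```

    Classifying each hit as Primary or Secondary would still require looking at the surrounding context, which this sketch does not attempt.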
  16. Dataset. Associations Between Severity of Depression, Lifestyle Patterns,...

    • data-staging.niaid.nih.gov
    Updated Jan 27, 2025
    Cite
    Oliván-Blázquez, Bárbara; Aguilar-Latorre, Alejandra (2025). Dataset. Associations Between Severity of Depression, Lifestyle Patterns, and Personal Factors Related to Health Behavior: Secondary Data Analysis From a Randomized Controlled Trial [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14748710
    Explore at:
    Dataset updated
    Jan 27, 2025
    Dataset provided by
    Universidad de Zaragoza
    Authors
    Oliván-Blázquez, Bárbara; Aguilar-Latorre, Alejandra
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset from Aguilar-Latorre, A., Serrano-Ripoll, M. J., Oliván-Blázquez, B., Gervilla, E., & Navarro, C. (2022). Associations Between Severity of Depression, Lifestyle Patterns, and Personal Factors Related to Health Behavior: Secondary Data Analysis From a Randomized Controlled Trial. Frontiers in psychology, 13, 856139. https://doi.org/10.3389/fpsyg.2022.856139

    Background: Depression is a prevalent condition that has a significant impact on psychosocial functioning and quality of life. The onset and persistence of depression have been linked to a variety of biological and psychosocial variables. Many of these variables are associated with specific lifestyle characteristics, such as physical activity, diet, and sleep patterns. Some psychosocial determinants have an impact on people's health-related behavior change. These include personal factors such as sense of coherence, patient activation, health literacy, self-efficacy, and procrastination. This study aims to analyze the association between the severity of depression, lifestyle patterns, and personal factors related to health behavior. It also aims to analyze whether personal factors moderate the relationship between lifestyles and depression.

    Methods: This study is a secondary data analysis (SDA) of baseline data collected at the start of a randomized controlled trial (RCT). A sample of 226 patients with subclinical, mild, or moderate depression from primary healthcare centers in two sites in Spain (Zaragoza and Mallorca) was used, and descriptive, bivariate, multivariate, and moderation analyses were performed. Depression was the primary outcome, measured by Beck II Self-Applied Depression Inventory. Lifestyle variables such as physical exercise, adherence to Mediterranean diet and sleep quality, social support, and personal factors such as self-efficacy, patient activation in their own health, sense of coherence, health literacy, and procrastination were considered secondary outcomes.

    Results: Low sense of coherence (β = −0.172; p < 0.001), poor sleep quality (β = 0.179; p = 0.008), low patient activation (β = −0.119; p = 0.019), and sedentarism (more minutes seated per day; β = 0.003; p = 0.025) are predictors of having more depressive symptoms. Moderation analyses were not significant.

    Discussion: Lifestyle and personal factors are related to depressive symptomatology. Our findings reveal that sense of coherence, patient’s activation level, sedentarism, and sleep quality are associated with depression. Further research is needed regarding adherence to Mediterranean diet, minutes walking per week and the interrelationship between lifestyles, personal factors, and depression.

  17. Datasets: Crowd Data Center (CDC)

    • unisa.figshare.com
    xlsx
    Updated Nov 29, 2025
    Cite
    Lenny Mamaro (2025). Datasets: Crowd Data Center (CDC) [Dataset]. http://doi.org/10.25399/UnisaData.30656291.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Nov 29, 2025
    Dataset provided by
    University of South Africa
    Authors
    Lenny Mamaro
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Crowd Data Center is an online, international aggregator of openly available data from various global crowdfunding platforms. It provides standardised, structured, and downloadable datasets containing campaign-level information such as project descriptions, funding targets, amounts raised, backer counts, campaign duration, and project categories. The Crowd Data Center is one of the most widely utilized datasets in academia due to its high-quality, cleaned data, which is captured directly from original crowdfunding platforms through automated data extraction protocols. Available at https://thecrowddatacenter.com/

    Type of Data Collection Method

    Secondary Data Collection or Archival Data.

    Description of the Data Collection Method

    The research design depends on secondary, archival data obtained from the CDC. The data were collected from previously run campaigns recorded across crowdfunding platforms, such as Kickstarter or Indiegogo, and stored within the CDC database. The Crowd Data Center utilises automated web-scraping and API-based data extraction methods for continuous gathering, verification, and updating of data from campaigns. Researchers download datasets from the data centre in CSV or Excel format for analysis.

    Short Paragraph Example for Your Proposal: The study will utilize secondary data sourced from the Crowd Data Center, an international database that aggregates structured archival data from large crowdfunding platforms. The CDC contains extensive information on campaign characteristics, funding models, creator profiles, and project outcomes. Data presented by the CDC is gathered through automated web scraping and API extraction techniques, ensuring accuracy and comparability across crowdfunding platforms. This approach of collecting secondary data facilitates large-sample empirical analysis and allows one to avoid collecting primary data.

  18. Comprehensive Food Security and Vulnerability Analysis and Nutrition Survey...

    • microdata.fao.org
    Updated Jul 10, 2019
    + more versions
    Cite
    National Institute of Statistics of Rwanda (2019). Comprehensive Food Security and Vulnerability Analysis and Nutrition Survey 2009 - Rwanda [Dataset]. https://microdata.fao.org/index.php/catalog/871
    Explore at:
    Dataset updated
    Jul 10, 2019
    Dataset authored and provided by
    National Institute of Statistics of Rwanda
    Time period covered
    2009
    Area covered
    Rwanda
    Description

    Abstract

    As significant progress continues to be made by the Rwandan economy following various recovery and growth strategies, certain elements remain crucial. The food and nutrition security of the population remains a key building block in not only consolidating the gains already made thus far but also further accelerating the rate of growth towards the realization of the Millennium Development Goals (MDGs). Thus, the 2009 Comprehensive Food Security and Vulnerability Analysis and Nutrition survey (CFSVANS) was undertaken with the objective of analyzing trends over time in comparison with the 2006 CFSVA and the 2005 RDHS, as well as, with other more recent secondary data, measuring the extent and depth of food and nutrition insecurity and vulnerability, and identifying the underlying causes.

    The five key questions to a CFSVANS are: who are the people currently facing food insecurity and malnutrition; how many are they; where do they live; why are they food insecure and/or malnourished and; how can food assistance and interventions make a difference in reducing poverty, hunger and supporting livelihoods? In order to provide answers to these questions, specifically, the assessment sought to:

    - Identify geographic and socio-economic groups that are food insecure or vulnerable to food insecurity;
    - Highlight the nature and causes of food insecurity among each group;
    - Identify the major risks and constraints to improving food security;
    - Evaluate assistance needs at the short, medium and long range;
    - Support the development of an appropriate targeting system;
    - Better define the role of GoR's development partners including WFP in promoting food security strengthening programs;
    - Determine the prevalence of nutritional status of vulnerable groups (children aged 6 - 59 months and non-pregnant women of reproductive age (15-49 years old));
    - Determine the prevalence of exclusive breastfeeding as a key Infant and Young Child Feeding strategy;
    - Establish the linkage between household food security and nutritional status of children in Rwanda.

    Geographic coverage

    National coverage

    Analysis unit

    Households

    Universe

    Rural household members

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Rwanda is administratively divided into four provinces (Northern Province, Southern Province, Eastern Province and Western Province) plus Kigali City and a total of 30 districts. Districts are further divided in sectors and cells. The 2009 Comprehensive Food Security Vulnerability Analysis and Nutrition Survey (CFSVANS) was designed to provide statistically representative information at the sub-provincial level. To facilitate comparison with existing studies, it was decided to define strata using administrative limits rather than food economy zones (as in 2006). Because of the large number of districts, it was decided to define strata that would be either single districts or a group of districts. Districts that were identified as similar with regards to their socio-economic and agroenvironmental characteristics were grouped together. A total of 16 strata were defined including 8 districts and 8 groups of districts. Kigali City was not included in the sample. Selected strata include Nyagatare-Gatsibo-Kayonza, Kirehe-Ngoma-Rwamagana, and Bugesera (Eastern Province), Musanze-Burera, Gakenke, and Rulindo-Gicumbi (Northern Province), Rubavu, Nyabihu, Ngororero, Rutsiro-Karongi, and Nyamasheke-Rusizi (Western Province), and, Kamonyi-Muhanga-Ruhango, Nyanza, Huye, Gisagara, and Nyamagabe-Nyaruguru (Southern Province).

    Within each stratum, NISR implemented a two-stage sampling procedure to select households using an approach that is standardized for statistical studies in Rwanda. Zones de Dénombrement (ZD, enumeration areas) were selected first, followed by households using 2007 population estimates based on the 2002 census. The ZDs are a sampling unit that is smaller than a sector. A total of 450 ZD were selected. In each stratum, the probability of the ZDs to be selected was equal to the number of ZDs in the stratum divided by the number of ZDs. In each stratum, ZDs were randomly selected. Within each sampled ZD, a total of 12 households were interviewed, resulting in a total expected sample size of 5,400 households.
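
    The expected sample size quoted above follows directly from the two-stage design. A small check in Python, using only the figures given in the text (variable names are illustrative, and the per-stratum average is a rough figure only, since the description does not state how ZDs were allocated across the 16 strata):

    ```python
    # Figures taken from the sampling description above.
    zds_selected = 450        # enumeration areas (ZDs) selected in total
    households_per_zd = 12    # households interviewed in each sampled ZD
    strata = 16               # strata defined for the survey

    # Second-stage sample: every selected ZD contributes the same number of households.
    expected_households = zds_selected * households_per_zd

    # Rough average only; the actual allocation per stratum is not given in the text.
    average_zds_per_stratum = zds_selected / strata

    print(expected_households)       # 5400
    print(average_zds_per_stratum)   # 28.125
    ```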

    All of the households were interviewed. Enumerators were provided with clear instructions on which households to interview, and how to find them. Supervisors were provided with a list of over-sampled households in the event that a household had to be replaced.

    Because this study also focuses on the relation between nutrition and food security, it was decided during the study design that only households with children aged below 5 years old would be included in the sample. This imposed some limitations on the ability to draw conclusions about all the households in Rwanda, as explained in the limitations section.

    Mode of data collection

    Face-to-face paper [f2f]

    Research instrument

    Household survey To allow for comparison over time, the 2009 CFSVA and Nutrition Survey used a standard questionnaire similar to the one used for the 2006 CFSVA. In 2006, face validity of the questionnaire was examined by local and food security experts and the questionnaire was piloted among a random sample of people not included in the study. It was a structured questionnaire using mainly close-ended questions with response options provided to the enumerators. For several questions, respondents were allowed to provide more than one response.The survey instrument sought to collect quantitative data on 13 components: (1) demographics; (2) housing and facilities; (3) household and productive assets; (4) inputs to livelihoods; (5) migration and remittances; (6) sources of credit; (7) agricultural production; (8) expenditure; (9) food sources and consumption; (10) shocks and food security; (11) programme participation; (12) maternal health and nutrition; and (13) child health and nutrition.

    Community questionnaire: In addition to the household survey, a community questionnaire was administered to a key informant who was an official representative of the area, such as the Executive Secretary of the Cell or any individual responsible for administrative services at Cell level. The community questionnaire was developed using an approach similar to that of the household questionnaire. Questions were open-ended, and the questionnaire covered four main aspects: migration and seasonal movement of population, health, external assistance (food aid), and market prices.

    The questionnaires were developed in English and administered in Kinyarwanda. Enumerators were carefully trained to reduce individual variation in how they interpreted the questionnaire and understood the questions.

    Cleaning operations

    Data entry was conducted by NISR using CSPro. The database was then exported to SPSS for analysis. Statistical analysis was conducted by WFP in Rwanda and Rome, with the support of NISR. SPSS and ADDAWIN were used to conduct PCA and cluster analysis. Z-scores for wasting, stunting and underweight were calculated using WHO Anthro. All other analyses were done using SPSS.
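To make the z-score step concrete, here is a minimal sketch of the general LMS formula commonly used with growth reference data. The source does not describe WHO Anthro's internals, so this should not be read as that tool's implementation: the L, M, S values below are hypothetical placeholders rather than entries from any reference table, the -2 cutoff is an assumed convention not stated in the text, and any adjustment for extreme values is omitted.

```python
import math


def lms_zscore(x, L, M, S):
    """Unadjusted LMS z-score for a measurement x.

    L (Box-Cox power), M (median) and S (coefficient of variation) would come
    from a reference table for the child's age and sex; here they are inputs.
    """
    if x <= 0 or M <= 0 or S <= 0:
        raise ValueError("x, M and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case of the formula as L approaches 0.
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)


def is_below_cutoff(z, cutoff=-2.0):
    """Flag a z-score below the cutoff (the -2 default is an assumption)."""
    return z < cutoff


# Hypothetical reference values, for illustration only.
z = lms_zscore(x=7.5, L=1.0, M=10.0, S=0.1)
print(round(z, 2), is_below_cutoff(z))
```

The same function would apply to each indicator (wasting, stunting, underweight), with the reference values swapped for the relevant measurement.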

    Data appraisal

    A series of data quality tables and graphs are available to review the quality of the data and include the following:
    - Food Items, Groups and Weights for Calculation of the FCS
    - Household characteristics associated with food consumption
    - Child nutrition by livelihood, wealth index and FCS
    - The people facing food insecurity and vulnerability
    - Sample and Demographic Characteristics by Strata (CFSVA 2009)....

  19. 1968-98 Civil Rights Data Collection

    • datalumos.org
    Updated Feb 15, 2025
    + more versions
    Cite
    United States Department of Education. Office for Civil Rights (2025). 1968-98 Civil Rights Data Collection [Dataset]. http://doi.org/10.3886/E219621V1
    Dataset updated
    Feb 15, 2025
    Dataset provided by
    United States Department of Education (https://ed.gov/)
    Authors
    United States Department of Education. Office for Civil Rights
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    1968 - 1998
    Area covered
    United States
    Description

    The Civil Rights Data Collection (CRDC), formerly administered as the Elementary and Secondary School Civil Rights Survey, is an important part of the U.S. Department of Education's (Department) Office for Civil Rights (OCR) strategy for administering and enforcing civil rights laws in the nation’s public school districts and schools. The CRDC collects a variety of information including student access to rigorous courses, programs, resources, instructional and other school staff, and school climate factors such as student discipline and harassment and bullying. Much of the data is disaggregated by race/ethnicity, sex, disability and whether students are English Learners.

    Since the 2011–12 school year, OCR has collected data from all public districts and their schools in the 50 states and Washington, DC. Over time the CRDC’s collection universe has grown to include long-term secure justice facilities, charter schools, alternative schools, and special education schools that focus primarily on serving students with disabilities. OCR added the Commonwealth of Puerto Rico to the CRDC, beginning with the 2017-18 CRDC. From 1968 to 2010, civil rights data were collected from a sample of public districts and their schools, except for the 1976 and 2000 collections, which included data from all public schools and districts.

    The purpose of the CRDC Archival Download Tool (Archival Tool) is to make the Department’s civil rights data from 1968 to 1998 publicly available. The Archival Tool organizes civil rights data by year, and provides users with access to the data, survey forms, and other relevant documentation. The tool also includes documentation on key historical CRDC data changes from 1968 to 1998. Users may extract district-level civil rights data.

    Important Consideration: Past collections and publicly released reports may contain some terms that readers may consider obsolete, offensive and/or inappropriate. As part of the Department’s goal to be open and transparent with the public, we are providing access to all civil rights data in its original format.

    Privacy notice: The Department of Education’s Disclosure Review Board determined that the CRDC files for 1968-1998 are safe for public “re-release” under the Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. § 1232g; 34 CFR Part 99).

  20. Early Childhood Longitudinal Study, Birth Cohort

    • atlaslongitudinaldatasets.ac.uk
    url
    Updated Feb 7, 2025
    Cite
    National Center for Education Statistics (NCES) (2025). Early Childhood Longitudinal Study, Birth Cohort [Dataset]. https://atlaslongitudinaldatasets.ac.uk/datasets/ecls-b
    Explore at:
    Available download formats: url
    Dataset updated
    Feb 7, 2025
    Dataset provided by
    Atlas of Longitudinal Datasets
    Authors
    National Center for Education Statistics (NCES)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    2001
    Area covered
    United States of America
    Variables measured
    None
    Measurement technique
    Secondary data, None, Cohort - birth, Interview – face-to-face, Physical or biological assessment (e.g. blood, saliva, gait, grip strength, anthropometry), Birth records
    Dataset funded by
    Department of Education
    Description

    ECLS-B is a longitudinal study that followed a nationally representative sample of approximately 10,700 participating children from birth through kindergarten entry. The children participating in the study were born in the United States in 2001, and came from diverse socioeconomic and racial/ethnic backgrounds, with over-samples of Chinese children, other Asian and Pacific Islander children, American Indian and Alaska Native children, twins, and children born with low and very low birth weight.

Cite
Per Erik Strandberg; Philipp Peterseil; Julian Karoliny; Johanna Kallio; Johannes Peltola (2024). Secondary Data from Insights from Publishing Open Data in Industry-Academia Collaboration [Dataset]. http://doi.org/10.5281/zenodo.13767153

Secondary Data from Insights from Publishing Open Data in Industry-Academia Collaboration

Explore at:
Available download formats: json, text/x-python, bin, txt
Dataset updated
Sep 16, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Per Erik Strandberg; Philipp Peterseil; Julian Karoliny; Johanna Kallio; Johannes Peltola
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

Secondary Data from Insights from Publishing Open Data in Industry-Academia Collaboration

Authors

Per Erik Strandberg [1], Philipp Peterseil [2], Julian Karoliny [3], Johanna Kallio [4], and Johannes Peltola [4].

[1] Westermo Network Technologies AB (Sweden).
[2] Johannes Kepler University Linz (Austria).
[3] Silicon Austria Labs GmbH (Austria).
[4] VTT Technical Research Centre of Finland Ltd. (Finland).

Description

This data accompanies a paper submitted to Elsevier's Data in Brief in 2024, titled Insights from Publishing Open Data in Industry-Academia Collaboration.

Tentative Abstract: Effective data management and sharing are critical success factors in industry-academia collaboration. This paper explores the motivations and lessons learned from publishing open data sets in such collaborations. Through a survey of participants in a European research project that published 13 data sets, and an analysis of metadata from almost 281 thousand datasets in Zenodo, we collected qualitative and quantitative results on motivations, achievements, research questions, licences and file types. Through inductive reasoning and statistical analysis we found that planning the data collection is essential, and that only a few datasets (2.4%) had accompanying scripts for improved reuse. We also found that authors are not well aware of the importance of licences or which licence to choose. Finally, we found that data with a synthetic origin, collected with simulations and potentially mixed with real measurements, can be very meaningful, as predicted by Gartner and illustrated by many datasets collected in our research project.

Secondary data from Survey

The file survey.txt contains secondary data from a survey of participants who published open data sets in the 3-year European research project InSecTT.

Secondary data from Zenodo

The file secondary_data_zenodo.json contains secondary data from an analysis of data sets published in Zenodo. It is accompanied by a .py file and an .ipynb file that serve as examples.
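As a starting point for exploring the JSON file, the sketch below loads it and reports its top-level shape. The internal structure of secondary_data_zenodo.json is not described here, so the code assumes nothing beyond valid JSON; the helper name `summarize_json` is hypothetical and is not taken from the accompanying example scripts.

```python
import json
import os


def summarize_json(path):
    """Return the top-level type, size and (for objects) first keys of a JSON file."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, dict):
        # JSON object keys are always strings, so they can be sorted directly.
        return {"type": "dict", "size": len(data), "keys": sorted(data)[:10]}
    if isinstance(data, list):
        return {"type": "list", "size": len(data), "keys": []}
    # Scalars (string, number, boolean, null) have no meaningful size here.
    return {"type": type(data).__name__, "size": 1, "keys": []}


# Only runs if the file has been downloaded next to this script.
if os.path.exists("secondary_data_zenodo.json"):
    print(summarize_json("secondary_data_zenodo.json"))
```

Once the top-level keys are known, the accompanying .py and .ipynb files are the better guide to how the authors intended the data to be read.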

License

This data is licensed under the Creative Commons Attribution 4.0 International license. You are free to use the data if you attribute the authors. Read the license text for details.
