81 datasets found
  1. Current Population Survey (CPS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the current population survey (cps) annual social and economic supplement (asec) with r

    the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no.

    despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.

    the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite puts everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

    this new github repository contains three scripts:

    2005-2012 asec - download all microdata.R
    • download the fixed-width file containing household, family, and person records
    • import by separating this file into three tables, then merge 'em together at the person-level
    • download the fixed-width file containing the person-level replicate weights
    • merge the rectangular person-level file with the replicate weights, then store it in a sql database
    • create a new variable - one - in the data table

    2012 asec - analysis examples.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • perform a boatload of analysis examples

    replicate census estimates - 2011.R
    • connect to the sql database created by the 'download all microdata' program
    • create the complex sample survey object, using the replicate weights
    • match the sas output shown in the png file below

    2011 asec replicate weight sas output.png
    statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

    click here to view these three scripts

    for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
    • the census bureau's current population survey page
    • the bureau of labor statistics' current population survey page
    • the current population survey's wikipedia article

    notes:

    interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

    as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

    confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
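
    to make the "rectangular file" idea concrete, here is a minimal sketch in python (not the r scripts described above). it attaches household- and family-level information to each person record. every field name and value below is hypothetical; the real cps-asec files use different identifiers and many more columns.

```python
# Hypothetical toy records; real cps-asec tables have different keys and columns.
households = {1: {"h_state": "NY"}, 2: {"h_state": "CA"}}
families = {(1, 1): {"f_income": 52000}, (2, 1): {"f_income": 78000}}
persons = [
    {"h_id": 1, "f_id": 1, "p_id": 1, "age": 34},
    {"h_id": 1, "f_id": 1, "p_id": 2, "age": 36},
    {"h_id": 2, "f_id": 1, "p_id": 1, "age": 58},
]


def rectangularize(persons, families, households):
    """Return one row per person with family and household fields attached."""
    rows = []
    for person in persons:
        row = dict(person)  # copy so the input records stay untouched
        row.update(families[(person["h_id"], person["f_id"])])
        row.update(households[person["h_id"]])
        rows.append(row)
    return rows


rectangular = rectangularize(persons, families, households)
```

    the result has exactly one row per person, so person-level weights can be merged onto it afterwards.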

  2. NATCOOP dataset

    • heidata.uni-heidelberg.de
    csv, docx, pdf, tsv +1
    Updated Jan 27, 2022
    Cite
    Florian Diekert; Robbert-Jan Schaap; Tillmann Eymess (2022). NATCOOP dataset [Dataset]. http://doi.org/10.11588/DATA/GV8NBL
    Available download formats: docx(90179), pdf(432619), csv(3441765), docx(499022), tsv(86553), pdf(473493), pdf(856157), pdf(467245), docx(101203), pdf(351653), pdf(576588), pdf(200225), pdf(124038), type/x-r-syntax(14339), pdf(345323), pdf(69467), docx(43108), pdf(268168), docx(493800), docx(25110), docx(43036), pdf(270379), pdf(77960), pdf(464499), pdf(392748), docx(42158), pdf(374488), docx(498354), pdf(282466), pdf(482954), pdf(302513), pdf(513748), pdf(126342), docx(33772), tsv(2313475), pdf(441389), pdf(92836), pdf(392718)
    Dataset updated
    Jan 27, 2022
    Dataset provided by
    heiDATA
    Authors
    Florian Diekert; Robbert-Jan Schaap; Tillmann Eymess
    License

    https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.11588/DATA/GV8NBL

    Time period covered
    Jan 1, 2017 - Jan 1, 2021
    Dataset funded by
    European Commission
    Description

    The NATCOOP project set out to study how nature shapes the preferences and incentives of economic agents and how this in turn affects common-pool resource management. Imagine a group of fishermen targeting a species that requires a lot of teamwork to harvest. Do these fishers become more social over time compared to fishers that work in a more solitary manner? If so, does this have implications for how the fishery should be managed? To study this, the NATCOOP team travelled to Chile and Tanzania and collected data using surveys and economic experiments. These two very different countries have a large population of small-scale fishermen, and both host several distinct types of fisheries. Over the course of five field trips, the project team surveyed more than 2500 fishermen, with each field trip contributing to the main research question by measuring fishermen’s preferences for cooperation and risk. Additionally, each field trip aimed to answer another, smaller research question that focused on either risk taking or cooperation behavior in the fisheries. The data from both surveys and experiments are now publicly available and can be freely studied by other researchers, resource managers, or interested citizens. Overall, the NATCOOP dataset contains participants’ responses to a plethora of survey questions and their actions during incentivized economic experiments. It is available in both the .dta and .csv formats, and its use is recommended with statistical software such as R or Stata. For those unaccustomed to statistical analysis, we included a video tutorial on how to use the data set in the open-source program R.

  3. Data.Evaluation Report 9-months pilot Open Science Support Desk

    • uvaauas.figshare.com
    • figshare.com
    Updated Jun 1, 2023
    Cite
    G. ter Riet; N.R. van Ulzen; F.A. van Nes (2023). Data.Evaluation Report 9-months pilot Open Science Support Desk [Dataset]. http://doi.org/10.21943/auas.13614689.v1
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    G. ter Riet; N.R. van Ulzen; F.A. van Nes
    License

    http://rdm.uva.nl/en/support/confidential-data.html

    Description

    Datasets related to the evaluation and report of the Urban Vitality (UV) Open science support desk. Data were exported from Qualtrics, saved as STATA (.dta) files, and analyzed using STATA version 13.1. This item contains:

    1. Qualtrics exports: two tab-separated value (.tsv) files
    2. STATA: two STATA data (.dta) files
    3. STATA: three STATA log (.txt) files

    The STATA analysis files are deposited in UvA/HvA figshare separately and are publicly available. More information is available in the report.

  4. Data and Code for: An Empirical Evaluation of Chinese College Admissions...

    • openicpsr.org
    stata
    Updated Sep 7, 2020
    + more versions
    Cite
    Yan Chen; Ming Jiang; Onur Kesten (2020). Data and Code for: An Empirical Evaluation of Chinese College Admissions Reforms Through A Natural Experiment [Dataset]. http://doi.org/10.3886/E121101V2
    Available download formats: stata
    Dataset updated
    Sep 7, 2020
    Dataset provided by
    University of Sydney
    University of Michigan
    Shanghai Jiao Tong University
    Authors
    Yan Chen; Ming Jiang; Onur Kesten
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2008 - 2009
    Area covered
    China
    Description

    This repository contains datasets and analysis code accompanying the paper "An Empirical Evaluation of Chinese College Admissions Reforms Through A Natural Experiment" by Chen, Jiang, and Kesten. The datasets contain the college admission data for a county in China's Sichuan Province for the years 2008 and 2009. These include students' submitted rank-ordered lists of colleges and admission results. All variables are recoded to remove any identifiable information (including college and high school codes). The analysis code can be used to replicate the tables and figures in the paper.

  5. Data from: Clear as Black and White: The Effects of Ambiguous Rhetoric...

    • dataverse.harvard.edu
    application/x-stata +1
    Updated Jun 12, 2017
    Cite
    Harvard Dataverse (2017). Clear as Black and White: The Effects of Ambiguous Rhetoric Depend on Candidate Race [Dataset]. http://doi.org/10.7910/DVN/JSDUTQ
    Available download formats: application/x-stata-syntax(4597), application/x-stata-syntax(3287), application/x-stata(80379)
    Dataset updated
    Jun 12, 2017
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    To replicate our analyses, open the Stata dataset. Then run the "constructing variables" do-file, which makes the variables we use in the analyses. Then run the "replication analyses" do-file, which also notes which analyses go with which table or figure.

  6. Replication Data for: Housing Price and Talent Allocation: From the...

    • opendata.pku.edu.cn
    application/x-stata +1
    Updated Aug 28, 2019
    Cite
    Peking University Open Research Data Platform (2019). Replication Data for: Housing Price and Talent Allocation: From the Perspective of Occupation Choice Between Public and Private Sectors [Dataset]. http://doi.org/10.18170/DVN/KW1AMR
    Available download formats: application/x-stata-syntax(28325), application/x-stata(3011411)
    Dataset updated
    Aug 28, 2019
    Dataset provided by
    Peking University Open Research Data Platform
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The author's own Stata program cleans the China Labor Dynamics Survey (CLDS) data for 2012, 2014 and 2016. After the original data are pre-processed and given a second encoding according to the employment selection data, they are merged with the prefecture-level city prices for 2001-2013 and other city-level data to form a database of city-individual career choices. The format is Stata Dataset (.dta), which needs to be opened with Stata.

  7. The Canada Trademarks Dataset

    • zenodo.org
    pdf, zip
    Updated Jul 19, 2024
    Cite
    Jeremy Sheff (2024). The Canada Trademarks Dataset [Dataset]. http://doi.org/10.5281/zenodo.4999655
    Available download formats: zip, pdf
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jeremy Sheff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Canada Trademarks Dataset

    18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303

    Dataset Selection and Arrangement (c) 2021 Jeremy Sheff

    Python and Stata Scripts (c) 2021 Jeremy Sheff

    Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.

    This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.

    Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.

    Terms of Use:

    As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.

    The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:

    The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.

    Details of Repository Contents:

    This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:

    • /csv: contains the .csv versions of the data files
    • /do: contains Stata do-files used to convert the .csv files to .dta format and perform the statistical analyses set forth in the paper reporting this dataset
    • /dta: contains the .dta versions of the data files
    • /py: contains the python scripts used to download CIPO’s historical trademarks data via SFTP and generate the .csv data files

    If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.

    The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
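
    As a rough illustration of what an "iterparse"-style script does, the sketch below streams records out of an XML document and writes them as CSV rows using only the Python standard library. It is not the repository's code: the XML layout, tag names, and the `xml_to_csv_rows` helper are hypothetical stand-ins, and CIPO's actual schema is described in its data dictionary.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout; CIPO's real schema differs.
SAMPLE_XML = b"""<Applications>
  <Application><Number>100</Number><Status>Registered</Status></Application>
  <Application><Number>101</Number><Status>Abandoned</Status></Application>
</Applications>"""


def xml_to_csv_rows(source, record_tag, fields):
    """Stream records from an XML source without building the whole tree."""
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            rows.append([elem.findtext(name, default="") for name in fields])
            elem.clear()  # release the children of the finished record
    return rows


fields = ["Number", "Status"]
rows = xml_to_csv_rows(io.BytesIO(SAMPLE_XML), "Application", fields)

# Write a header plus one line per record into an in-memory CSV.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(fields)
writer.writerows(rows)
csv_text = out.getvalue()
```

    Streaming with `iterparse` and clearing each finished element keeps memory use modest, which matters for archives of the size mentioned above.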

    With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.

    The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.

    This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.

  8. Repeated information of benefits reduce COVID-19 vaccination hesitancy:...

    • zenodo.org
    • data-staging.niaid.nih.gov
    zip
    Updated Jun 17, 2022
    Cite
    Max Burger; Matthias Mayer; Ivo Steimanis (2022). Repeated information of benefits reduce COVID-19 vaccination hesitancy: Experimental evidence from Germany [Dataset]. http://doi.org/10.5281/zenodo.6242620
    Available download formats: zip
    Dataset updated
    Jun 17, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Max Burger; Matthias Mayer; Ivo Steimanis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Germany
    Description

    This replication package contains the raw data and code to replicate the findings reported in the paper. The data are licensed under a Creative Commons Attribution 4.0 International Public License. The code is licensed under a Modified BSD License. See LICENSE.txt for details.

    Software requirements

    All analyses were done in Stata version 16:

    • Add-on packages are included in scripts/libraries/stata and do not need to be installed by user. The names, installation sources, and installation dates of these packages are available in scripts/libraries/stata/stata.trk.

    Instructions

    1. Save the folder ‘replication_PLOS’ to your local drive.
    2. Open the master script ‘run.do’ and change the global pointing to the working directory (line 20) to the location where you saved the folder on your local drive
    3. Run the master script ‘run.do’ to replicate the analysis and generate all tables and figures reported in the paper and supplementary online materials

    Datasets

    • Wave 1 – Survey experiment: ‘wave1_survey_experiment_raw.dta’
    • Wave 2 – Follow-up Survey: ‘wave2_follow_up_raw.dta’
    • Map: shape-files ‘plz2stellig.shp’ ‘OSM_PLZ.shp’, area codes ‘Postleitzahlengebiete-_OSM.csv’_, (all links to the sources can be found in the script ‘04_figure2_germany_map.do’)
    • Pretest: ‘pre-test_corona_raw.dta’
    • For Appendix S7: ‘alter_geschlecht_zensus_det.xlsx’, ‘vaccination_landkreis_raw.dta’, ‘census2020_age_gender.csv’ (all links to the sources can be found in the script ‘06_AppendixS7.do’)
    • For Appendix S10: ‘vaccination_landkreis_raw.dta’ (all links to the sources can be found in the script ‘07_AppendixS10.do’)

    Descriptions of scripts

    1_1_clean_wave1.do
    This script processes the raw data from wave 1, the survey experiment
    1_2_clean_wave2.do
    This script processes the raw data from wave 2, the follow-up survey
    1_3_merge_generate.do
    This script creates the datasets used in the main analysis and for robustness checks by merging the cleaned data from wave 1 and 2, tests the exclusion criteria and creates additional variables
    02_analysis.do
    This script estimates regression models in Stata, creates figures and tables, saving them to results/figures and results/tables
    03_robustness_checks_no_exclusion.do
    This script runs the main analysis using the dataset without applying the exclusion criteria. Results are saved in results/tables
    04_figure2_germany_map.do
    This script creates Figure 2 in the main manuscript using publicly available data on vaccination numbers in Germany.
    05_figureS1_dogmatism_scale.do
    This script creates Figure S1 using data from a pretest to adjust the dogmatism scale.
    06_AppendixS7.do
    This script creates the figures and tables provided in Appendix S7 on the representativity of our sample compared to the German average using publicly available data about the age distribution in Germany.
    07_AppendixS10.do
    This script creates the figures and tables provided in Appendix S10 on the external validity of vaccination rates in our sample using publicly available data on vaccination numbers in Germany.

  9. Handbook on Impact Evaluation: Quantitative Methods and Practices -...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Nov 20, 2013
    + more versions
    Cite
    S. Khandker, G. Koolwal and H. Samad (2013). Handbook on Impact Evaluation: Quantitative Methods and Practices - Exercises 2009 - Bangladesh [Dataset]. https://microdata.worldbank.org/index.php/catalog/436
    Dataset updated
    Nov 20, 2013
    Dataset authored and provided by
    S. Khandker, G. Koolwal and H. Samad
    Time period covered
    2009
    Area covered
    Bangladesh
    Description

    Abstract

    This exercise dataset was created for researchers interested in learning how to use the models described in the "Handbook on Impact Evaluation: Quantitative Methods and Practices" by S. Khandker, G. Koolwal and H. Samad, World Bank, October 2009 (permanent URL http://go.worldbank.org/FE8098BI60).

    Public programs are designed to reach certain goals and beneficiaries. Methods to understand whether such programs actually work, as well as the level and nature of impacts on intended beneficiaries, are main themes of this book. Has the Grameen Bank, for example, succeeded in lowering consumption poverty among the rural poor in Bangladesh? Can conditional cash transfer programs in Mexico and Latin America improve health and schooling outcomes for poor women and children? Does a new road actually raise welfare in a remote area in Tanzania, or is it a "highway to nowhere?"

    This handbook reviews quantitative methods and models of impact evaluation. It begins by reviewing the basic issues pertaining to an evaluation of an intervention to reach certain targets and goals. It then focuses on the experimental design of an impact evaluation, highlighting its strengths and shortcomings, followed by discussions on various non-experimental methods. The authors also cover methods to shed light on the nature and mechanisms by which different participants are benefiting from the program.

    The handbook provides STATA exercises in the context of evaluating major microcredit programs in Bangladesh, such as the Grameen Bank. This dataset provides both the related Stata data files and the Stata programs.

  10. NYS NYSERDA Low-to-Moderate-Income Census Populat

    • kaggle.com
    zip
    Updated Jan 1, 2021
    + more versions
    Cite
    State of New York (2021). NYS NYSERDA Low-to-Moderate-Income Census Populat [Dataset]. https://www.kaggle.com/datasets/new-york-state/nys-nyserda-low-to-moderate-income-census-populat/discussion
    Available download formats: zip (5712993 bytes)
    Dataset updated
    Jan 1, 2021
    Dataset authored and provided by
    State of New York
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    New York
    Description

    Content

    How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.

    The Low- to Moderate-Income (LMI) New York State (NYS) Census Population Analysis dataset is resultant from the LMI market database designed by APPRISE as part of the NYSERDA LMI Market Characterization Study (https://www.nyserda.ny.gov/lmi-tool). All data are derived from the U.S. Census Bureau’s American Community Survey (ACS) 1-year Public Use Microdata Sample (PUMS) files for 2013, 2014, and 2015.

    Each row in the LMI dataset is an individual record for a household that responded to the survey and each column is a variable of interest for analyzing the low- to moderate-income population.

    The LMI dataset includes: county/county group, households with elderly, households with children, economic development region, income groups, percent of poverty level, low- to moderate-income groups, household type, non-elderly disabled indicator, race/ethnicity, linguistic isolation, housing unit type, owner-renter status, main heating fuel type, home energy payment method, housing vintage, LMI study region, LMI population segment, mortgage indicator, time in home, head of household education level, head of household age, and household weight.

    The LMI NYS Census Population Analysis dataset is intended for users who want to explore the underlying data that supports the LMI Analysis Tool. The majority of those interested in LMI statistics and generating custom charts should use the interactive LMI Analysis Tool at https://www.nyserda.ny.gov/lmi-tool. This underlying LMI dataset is intended for users with experience working with survey data files and producing weighted survey estimates using statistical software packages (such as SAS, SPSS, or Stata).
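
    To illustrate what "producing weighted survey estimates" means here, the following minimal sketch computes a weighted share using a household weight column. It is only a sketch: the rows, column names, and the `weighted_share` helper are hypothetical and do not reflect the actual file layout, and it produces a point estimate only, without the standard errors that survey software would also report.

```python
# Hypothetical rows and column names; the real LMI file has many more variables.
households = [
    {"income_group": "low", "heating_fuel": "gas", "weight": 120.0},
    {"income_group": "low", "heating_fuel": "oil", "weight": 80.0},
    {"income_group": "moderate", "heating_fuel": "gas", "weight": 200.0},
    {"income_group": "moderate", "heating_fuel": "electric", "weight": 100.0},
]


def weighted_share(rows, column, value, weight_col="weight"):
    """Share of the weighted population for which rows[column] equals value."""
    total = sum(row[weight_col] for row in rows)
    matching = sum(row[weight_col] for row in rows if row[column] == value)
    return matching / total


# Weighted estimate: each record counts as many households as its weight says.
gas_share = weighted_share(households, "heating_fuel", "gas")

# Unweighted share of sample records, shown only for comparison.
unweighted_gas_share = sum(
    1 for row in households if row["heating_fuel"] == "gas"
) / len(households)
```

    In this toy data the two shares differ, which is why the household weight has to be applied before reporting population figures.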

    Context

    This is a dataset hosted by the State of New York. The state has an open data platform and updates its information according to the amount of data that is brought in. Explore New York State using Kaggle and all of the data sources available through the State of New York organization page!

    • Update Frequency: This dataset is updated annually.

    Acknowledgements

    This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

    Cover photo by rawpixel on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  11. Replication Data for: Compulsory Voting and Voter Information Seeking

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Nov 20, 2017
    Cite
    Shane Singh (2017). Replication Data for: Compulsory Voting and Voter Information Seeking [Dataset]. http://doi.org/10.7910/DVN/DZT7ZR
    Dataset updated
    Nov 20, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Shane Singh
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    There are two files needed to replicate the analyses described and depicted in “Compulsory Voting and Voter Information Seeking,” by Shane P. Singh and Jason Roy, and its associated supplemental material:

    1. The data, included in Stata format as “replication data, R&P.dta”
    2. The Stata code, in a do-file, included as “replication code, R&P.do”

    To proceed with the replication, open the data in Stata. Then, open the do-file. The code can be run directly from the do-file. The do-file indicates which analyses correspond to the figures in the manuscript and the appendix. All models were estimated in Stata 13.

  12. Effects of community management on user activity in online communities

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 24, 2025
    Cite
    Alberto Cottica; Alberto Cottica (2025). Effects of community management on user activity in online communities [Dataset]. http://doi.org/10.5281/zenodo.1320261
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alberto Cottica; Alberto Cottica
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.

    Instructions:

    1. Unzip the files.
    2. Start with JSON files obtained from calling platform APIs: each dataset consists of one file for posts, one for comments, one for users. In the paper we use two datasets, one referring to Edgeryders, the other to Matera 2019.
    3. Run them through edgesense (https://github.com/edgeryders/edgesense). Edgesense lets you set the length of the observation period. We set it to 1 week and 1 day for Edgeryders data, and to 1 day for Matera 2019 data. Edgesense stores its results in a JSON file called network.min.json, which we then rename to keep track of the data source and observation length.
    4. Launch Jupyter Notebook and run the notebook provided to convert the network.min.json files into CSV flat files, one for each network file.
    5. Launch Stata, open each flat CSV file with it, then save it in Stata format.
    6. Use the provided Stata .do scripts to replicate results.

    Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names etc.
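
    The conversion in step 4 (flattening each network.min.json file into a CSV) could look roughly like the Python sketch below. The layout of Edgesense's output is not described here, so the top-level key `nodes` is a hypothetical placeholder; the notebook shipped with the data remains the authoritative conversion.

```python
import csv
import json
from pathlib import Path


def json_records_to_csv(json_path, csv_path, records_key="nodes"):
    """Write the list of flat records stored under `records_key` to a CSV file.

    ASSUMPTION: `records_key` ("nodes" by default) is a hypothetical name;
    inspect the real network.min.json for the keys it actually uses.
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    records = data[records_key]

    # Collect every field name that appears, in first-seen order, so that
    # records with missing fields still fit into one rectangular table.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

    The function returns the number of rows written, which makes it easy to check each renamed network file against its CSV before opening it in Stata.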

  13. CAP-2030 Nepal: Open Street Map tracker mapping dataset

    • rdr.ucl.ac.uk
    bin
    Updated Feb 27, 2023
    Cite
    Naomi Saville (2023). CAP-2030 Nepal: Open Street Map tracker mapping dataset [Dataset]. http://doi.org/10.5522/04/22109690.v2
    Explore at:
    Available download formats: bin
    Dataset updated
    Feb 27, 2023
    Dataset provided by
    University College London
    Authors
    Naomi Saville
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Nepal
    Description

    The Stata data file "jumla_kavre_osmtracker_merged.dta" and the equivalent Excel file of the same name comprise data on water, waste management and landmarks collected by adolescent secondary school students during a "Citizen Science" project in the district of Kavre in the central hills of Nepal during April 2022 and in the district of Jumla in the remote mountains of West Nepal during June 2022. The project was part of a CIFF-funded Children in All Policies 2030 (CAP2030) project.

    The data were generated by the students using an open access data collection and mapping application called Open Street Map (OSM) tracker, which had been adapted with Nepali language prompts by Researchers from Kathmandu Living Labs (KLL). Researchers from KLL and University College London (UCL) trained the adolescents to record tracks and way points of certain types of information including categories of waste management (rubbish dumps/bins), water sources and public amenities. The resulting datafile is a summary of the data collected showing the latitude/longitude, name, and category of the type of location and the district. The app and the process of gathering the data are described in a paper entitled "Citizen science for climate change resilience: engaging adolescents to study climate hazards, biodiversity and nutrition in rural Nepal" submitted to Wellcome Open Research in Feb 2023. The data contributed to Table 5, and Figure 4 of this paper.

  14. Galvanising the Open Access Community: A Study on the Impact of Plan S -...

    • zenodo.org
    bin, csv
    Updated Oct 15, 2024
    Cite
    W. Benedikt Schmal; W. Benedikt Schmal (2024). Galvanising the Open Access Community: A Study on the Impact of Plan S - Data and Code [Dataset]. http://doi.org/10.5281/zenodo.12523229
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Scidecode
    Authors
    W. Benedikt Schmal; W. Benedikt Schmal
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains the datasets and code underpinning Chapter 3 "Counterfactual Impact Evaluation of Plan S" of the report "Galvanising the Open Access Community: A Study on the Impact of Plan S" commissioned by the cOAlition S to scidecode science consulting.

    Two categories of files are part of this repository:

    1. Datasets

    The 21 CSV source files contain the subsets of publications funded by the funding agencies that are part of this study. These files have been provided by OA.Works, with whom scidecode has collaborated for the data collection process. Data sources and collection and processing workflows applied by OA.Works are described on their website and specifically at https://about.oa.report/docs/data.

    The file "plan_s.dta" is the aggregated data file stored in the format ".dta", which can be accessed with STATA by default or with plenty of programming languages using the respective packages, e.g., R or Python.

    2. Code files

    The associated code files that have been used to process the data files are:

    - data_prep_and_analysis_script.do
    - coef_plots_script.R

    The first file has been used to process the CSV data files above for data preparation and analysis purposes. Here, data aggregation and data preprocessing are executed. Furthermore, all statistical regressions for the counterfactual impact evaluation are listed in this code file. The second code file "coef_plots_script.R" uses the computed results of the counterfactual impact evaluation to create the final graphic plots using the ggplot2 package.

    The first ".do" file has to be run in STATA, the second one (".R") requires the use of an integrated development environment for R.
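
    The aggregated file "plan_s.dta" can also be opened outside STATA, as noted above. A minimal Python sketch, assuming the third-party pandas package is installed and the file sits in the working directory (both are assumptions, not something the repository states); the helper name `load_stata_file` is hypothetical:

```python
from pathlib import Path


def load_stata_file(path="plan_s.dta"):
    """Load a Stata .dta file into a pandas DataFrame."""
    dta_path = Path(path)
    if not dta_path.is_file():
        # Fail early with a clear message before importing pandas.
        raise FileNotFoundError(f"Stata file not found: {dta_path}")

    # Third-party dependency, imported lazily (ASSUMPTION: pandas is installed).
    import pandas as pd

    return pd.read_stata(dta_path)
```

    Calling `load_stata_file()` returns the whole table as a DataFrame, which can then be inspected with `.head()` or `.describe()` before rerunning any of the regressions.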

    Further information is available in the final report and via the following URLs:
    https://www.coalition-s.org/
    https://scidecode.com/
    https://oa.works/
    https://openalex.org/
    https://sites.google.com/view/wbschmal
  15. Dataset on communal popular initiatives in Germany for the year 2018

    • researchdata.se
    • su.figshare.com
    Updated May 4, 2022
    Cite
    Christophe Premat (2022). Dataset on communal popular initiatives in Germany for the year 2018 [Dataset]. http://doi.org/10.17045/STHLMUNI.9601187
    Explore at:
    Dataset updated
    May 4, 2022
    Dataset provided by
    Stockholm University
    Authors
    Christophe Premat
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Germany
    Description

    The dataset covers 267 local-level popular initiatives in Germany in 2018, with information collected on the following variables: topic of initiative, State legislation, turnout, validity of the initiative, success of the popular initiative, proportion of yes voters in case of a referendum, size of municipality, profile of initiators (local council or citizens), proportion of no voters, index of mobilization, repartition index (difference between yes and no voters), approval rate, number of inhabitants, correction of initiative, status of the case (open/closed).

    The dataset was created with the help of the existing popular initiatives registered by the University of Wuppertal and the association Mehr Demokratie in Germany. The idea of the dataset is to evaluate in detail which factors the success of popular initiatives depends on in the different German States (Länder). A repartition index (difference between yes and no voters) and a mobilization index (repartition index multiplied by the turnout) were calculated and added to the dataset. All the other variables were also created in order to balance the result of these initiatives. The final aim is to be able to measure how direct democratic tools influence local politics in Germany. This is why it is important to examine the prevailing factors for the satisfaction of citizens who use these procedures. In this dataset, the destiny of an initiative (failure/success) can be taken as the dependent variable and all the others could be classified as independent variables.

    Direct democracy offers possibilities for citizens to influence political decisions, especially at the local level. In Germany, the local political systems have been affected by the introduction of direct democratic tools such as citizen initiatives and local referenda since the Reunification. The State legislations defined new conditions for citizen initiatives and municipal referenda, with a minimum number of valid signatures for initiatives and a minimum approval rate for the referenda. In the attached file, you will find the dataset as an Excel file as well as a .dta file that you can open with the software Stata (https://www.stata.com/).
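
    The two derived indices described above can be written out as a short Python sketch. It assumes the yes share, no share, and turnout are expressed as proportions between 0 and 1; the dataset itself may store percentages, and the function and argument names are hypothetical.

```python
def repartition_index(yes_share, no_share):
    """Difference between the shares of yes and no voters."""
    return yes_share - no_share


def mobilization_index(yes_share, no_share, turnout):
    """Repartition index multiplied by the turnout."""
    return repartition_index(yes_share, no_share) * turnout


# Illustrative values only (not taken from the dataset):
# 75% yes, 25% no, 50% turnout.
example_repartition = repartition_index(0.75, 0.25)          # 0.5
example_mobilization = mobilization_index(0.75, 0.25, 0.5)   # 0.25
```

    A lopsided vote with low turnout therefore scores lower on mobilization than the same split with high turnout.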

  16. Framingham heart study dataset

    • kaggle.com
    zip
    Updated Apr 19, 2022
    Cite
    Ashish Bhardwaj (2022). Framingham heart study dataset [Dataset]. https://www.kaggle.com/datasets/aasheesh200/framingham-heart-study-dataset
    Explore at:
    Available download formats: zip (59440 bytes)
    Dataset updated
    Apr 19, 2022
    Authors
    Ashish Bhardwaj
    Area covered
    Framingham
    Description

    The "Framingham" heart disease dataset includes over 4,240 records, 16 columns, and 15 attributes. The goal of the dataset is to predict whether a patient has a 10-year risk of future coronary heart disease (CHD).

  17. Replication Data for: Issue Framing and Beliefs About the Importance of...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jun 1, 2017
    Cite
    Shane Singh (2017). Replication Data for: Issue Framing and Beliefs About the Importance of Climate Change Policy [Dataset]. http://doi.org/10.7910/DVN/F7FPGK
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 1, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Shane Singh
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    There are two files needed to replicate the analyses described and depicted in “Issue Framing and Beliefs About the Importance of Climate Change Policy,” by Shane P. Singh and Meili Swanson: (1) The data, included in Stata format as “Survey_Final”; (2) The Stata code, in a do-file, included as “replication.” To proceed with the replication, open the data in Stata. Then, open the do-file. The code can be run directly from that file. All models were estimated in Stata 13.

  18. Replication Data for: Partisanship, Militarized International Conflict, and...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jul 25, 2017
    Cite
    Shane Singh (2017). Replication Data for: Partisanship, Militarized International Conflict, and Electoral Support for the Incumbent [Dataset]. http://doi.org/10.7910/DVN/SVMZKS
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 25, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Shane Singh
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    There are two files needed to replicate the analyses described and depicted in “Partisanship, Militarized International Conflict, and Electoral Support for the Incumbent,” by Shane P. Singh and Jaroslav Tir: (1) The data, included in Stata format as “CSES Modules I, II, and III for Singh and Tir PRQ Replication”; (2) The Stata code, in a do-file, included as “Replication Code for Partisanship, Militarized International Conflict, and Electoral Support for the Incumbent, Singh and Tir PRQ.” To proceed with the replication, open the data in Stata. Then, open the do-file. The code can be run directly from that file. All models were estimated in Stata 13.

  19. Baseline clinical and treatment status of adult HIV patients on ART before...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Mar 14, 2024
    Cite
    Abera Gezume Ganta; Ermias Wabeto; Worku Mimani Minuta; Chala Wegi; Tezera Berheto; Serawit Samuel; Desalegn Dawit Assele (2024). Baseline clinical and treatment status of adult HIV patients on ART before and after treat all strategies in Hawassa City between Jan.2012 and Dec.2021. [Dataset]. http://doi.org/10.1371/journal.pone.0299505.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Mar 14, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Abera Gezume Ganta; Ermias Wabeto; Worku Mimani Minuta; Chala Wegi; Tezera Berheto; Serawit Samuel; Desalegn Dawit Assele
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Hawassa
    Description

    Baseline clinical and treatment status of adult HIV patients on ART before and after treat all strategies in Hawassa City between Jan.2012 and Dec.2021.

  20. Data from: Bullying and Violence on the School Bus: A Mixed-Methods...

    • catalog.data.gov
    • s.cnmilf.com
    • +3more
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). Bullying and Violence on the School Bus: A Mixed-Methods Assessment of Behavioral Management Strategies, United States, 2016-2018 [Dataset]. https://catalog.data.gov/dataset/bullying-and-violence-on-the-school-bus-a-mixed-methods-assessment-of-behavioral-mana-2016-a2e15
    Explore at:
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice
    Area covered
    United States
    Description

    These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme files for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. The qualitative data are not available as part of the data collection at this time.

    Numerous high-profile events involving student victimization on school buses have raised critical questions regarding the safety of school-based transportation for children, the efforts taken by school districts to protect students on buses, and the most effective transportation-based behavioral management strategies for reducing misconduct. To address these questions, a national web-based survey was administered to public school district-level transportation officials throughout the United States to assess the prevalence of misconduct on buses, identify strategies to address misconduct, and describe effective ways to reduce student misbehavior on buses. Telephone interviews were also conducted with a small group of transportation officials to understand the challenges of transportation-based behavioral management, to determine successful strategies to create safe and positive school bus environments, and to identify data-driven approaches for tracking and assessing disciplinary referrals.

    The collection includes 10 Stata data files:

    - BVSBS_analysis file.dta (n=2,595; 1058 variables)
    - Title Crosswalk File.dta (n=2,594; 3 variables)
    - Lessons Learned and Open Dummies.dta (n=1,543; 200 variables)
    - CCD dataset.dta (n=12,494; 89 variables)
    - BVSB_REGION.dta (n=4; 3 variables)
    - BVSB_SCHOOLS.dta (n=3; 3 variables)
    - BVSB_STUDENTS.dta (n=3; 3 variables)
    - BVSB_URBAN.dta (n=8; 3 variables)
    - BVSB_WHITE.dta (n=3; 3 variables)
    - FINALRAKER.dta (n=2,595; 2 variables)

Cite
Damico, Anthony (2023). Current Population Survey (CPS) [Dataset]. http://doi.org/10.7910/DVN/AK4FDD

Current Population Survey (CPS)

Explore at:
Dataset updated
Nov 21, 2023
Dataset provided by
Harvard Dataverse
Authors
Damico, Anthony
Description

analyze the current population survey (cps) annual social and economic supplement (asec) with r

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.

this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png
- statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts

for more detail about the current population survey - annual social and economic supplement (cps-asec), visit:
- the census bureau's current population survey page
- the bureau of labor statistics' current population survey page
- the current population survey's wikipedia article

notes:

interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.

as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.

confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
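
the scripts above rely on replicate weights to get standard errors, and they do so in r. purely as an illustration of the idea, here is a minimal python sketch. it assumes the variance is a multiplier divided by the number of replicates, times the sum of squared differences between each replicate estimate and the full-sample estimate; the default multiplier of 4 is an assumption to check against the census bureau's replicate weights usage instructions, and every function and argument name here is hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of `values` using `weights`."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def replicate_weight_se(values, full_weights, replicate_weights, multiplier=4.0):
    """Estimate a weighted mean and its replicate-weight standard error.

    ASSUMPTION: variance = multiplier / R * sum((theta_r - theta) ** 2),
    where R is the number of replicate weight columns. Confirm the
    multiplier against the survey's own documentation before relying on it.
    """
    estimate = weighted_mean(values, full_weights)
    # Re-estimate the statistic once per set of replicate weights.
    replicate_estimates = [weighted_mean(values, w) for w in replicate_weights]
    squared_diffs = sum((r - estimate) ** 2 for r in replicate_estimates)
    variance = multiplier / len(replicate_estimates) * squared_diffs
    return estimate, math.sqrt(variance)
```

each element of `replicate_weights` is one full column of replicate weights, in the same person order as `values`. a real analysis would use a survey package rather than this hand-rolled loop.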
