55 datasets found

Downloadstatistik GESIS Datenarchiv
search.gesis.org
da-ra.de
Updated Feb 14, 2019
Cite
GESIS - Data Archive for the Social Sciences (2019). Downloadstatistik GESIS Datenarchiv [Dataset]. http://doi.org/10.4232/1.13222
Explore at:
Available download formats: application/x-spss-sav(2154811), application/x-stata-dta(5384365), (2139418), application/x-spss-sav(2295631), (2051697)
Unique identifier
https://doi.org/10.4232/1.13222
Dataset updated
Feb 14, 2019
Dataset provided by
GESIS Data Archive
GESIS search
Authors
GESIS - Data Archive for the Social Sciences
License
https://www.gesis.org/en/institute/data-usage-terms
Time period covered
Jan 1, 2004 - Dec 31, 2018
Variables measured
za_nr - Archive study number, doi - Digital Object Identifier, version - GESIS Archive Version, Access - Access category (0, A, B, C, D, E), Title - English study title (if n.a., German title), Title_DE - German study title (if n.a., English title), Total - All downloads combined (all years, all sources), d_2004_dbk - All DBK downloads from that respective year, d_2005_dbk - All DBK downloads from that respective year, d_2006_dbk - All DBK downloads from that respective year, and 63 more
Description
General information: The data sets contain information on how often materials of studies available through GESIS - Data Archive for the Social Sciences were downloaded and/or ordered through one of the archive's platforms/services between 2004 and 2018.

Sources and platforms: Study materials are accessible through various GESIS platforms and services: Data Catalogue (DBK), histat, datorium, data service (and others).

Years available:
- Data Catalogue: 2012-2018
- data service: 2006-2018
- datorium: 2014-2018
- histat: 2004-2018

Data sets: Data set ZA6899_Datasets_only_all_sources contains information on how often data files, such as those with a dta (Stata) or sav (SPSS) extension, have been downloaded. Identification of data files is handled semi-automatically (depending on the platform/service). Multiple downloads of one file by the same user (identified through IP address, or username for registered users) on the same day are counted as only one download.

Data set ZA6899_Doc_and_Data_all_sources contains information on how often study materials have been downloaded. Multiple downloads of any file of the same study by the same user (identified through IP address, or username for registered users) on the same day are counted as only one download.
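
The study-level counting rule described above can be sketched as follows. This is a minimal illustration, not the archive's actual procedure: the event layout (study number, user identifier, day) and the sample values are hypothetical. For the file-level rule used in ZA6899_Datasets_only_all_sources, the deduplication key would also need to include the file.

```python
from datetime import date

def count_downloads(events):
    """Count downloads per study, collapsing repeats by the same user on the same day.

    `events` is an iterable of (study_id, user_id, day) tuples.
    This record layout is an assumption made for illustration.
    """
    seen = set()
    counts = {}
    for study_id, user_id, day in events:
        key = (study_id, user_id, day)
        if key in seen:
            continue  # same user, same study, same day: already counted
        seen.add(key)
        counts[study_id] = counts.get(study_id, 0) + 1
    return counts

# Hypothetical sample events
events = [
    ("ZA0001", "10.0.0.1", date(2018, 3, 1)),
    ("ZA0001", "10.0.0.1", date(2018, 3, 1)),  # repeat on the same day: ignored
    ("ZA0001", "10.0.0.1", date(2018, 3, 2)),  # same user, next day: counted
    ("ZA0001", "alice", date(2018, 3, 1)),     # different user: counted
    ("ZA0002", "alice", date(2018, 3, 1)),
]
print(count_downloads(events))  # {'ZA0001': 3, 'ZA0002': 1}
```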

Both data sets are available in three formats: csv (quoted, semicolon-separated), dta (Stata v13, labeled) and sav (SPSS, labeled). All formats contain identical information.

Variables: Variables/columns in both data sets are identical.
za_nr 'Archive study number'
version 'GESIS Archiv Version'
doi 'Digital Object Identifier'
StudyNo 'Study number of respective study'
Title 'English study title'
Title_DE 'German study title'
Access 'Access category (0, A, B, C, D, E)'
PubYear 'Publication year of last version of the study'
inZACAT 'Study is currently also available via ZACAT'
inHISTAT 'Study is currently also available via HISTAT'
inDownloads 'There are currently data files available for download for this study in DBK or datorium'
Total 'All downloads combined'
downloads_2004 'downloads/orders from all sources combined in 2004'
[up to ...]
downloads_2018 'downloads/orders from all sources combined in 2018'
d_2004_dbk 'downloads from source dbk in 2004'
[up to ...]
d_2018_dbk 'downloads from source dbk in 2018'
d_2004_histat 'downloads from source histat in 2004'
[up to ...]
d_2018_histat 'downloads from source histat in 2018'
d_2004_dataservice 'downloads/orders from source dataservice in 2004'
[up to ...]
d_2018_dataservice 'downloads/orders from source dataservice in 2018'

More information is available within the codebook.
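
As a rough sketch of how the quoted, semicolon-separated csv version might be read, the following uses Python's standard csv module and sums the per-year columns of one source for each study. The column naming pattern (d_YEAR_source) and za_nr come from the description above; the sample values are invented, and the sketch assumes every non-empty count is an integer.

```python
import csv
import io

def sum_source_downloads(csv_text, source="dbk"):
    """Sum all d_<year>_<source> columns per study (za_nr)."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";", quotechar='"')
    suffix = "_" + source
    totals = {}
    for row in reader:
        total = 0
        for name, value in row.items():
            # Only per-year columns of the requested source; skip empty cells
            if name and name.startswith("d_") and name.endswith(suffix) and value:
                total += int(value)
        totals[row["za_nr"]] = total
    return totals

# Invented sample in the quoted, semicolon-separated layout
sample = (
    '"za_nr";"d_2017_dbk";"d_2018_dbk";"d_2018_histat"\n'
    '"1234";"5";"7";"2"\n'
    '"5678";"0";"3";""\n'
)
print(sum_source_downloads(sample))  # {'1234': 12, '5678': 3}
```

For a real file, the text would come from open(path, newline="") rather than an in-memory string.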
The Canada Trademarks Dataset
zenodo.org
pdf, zip
Updated Jul 19, 2024
Cite
Jeremy Sheff (2024). The Canada Trademarks Dataset [Dataset]. http://doi.org/10.5281/zenodo.4999655
Explore at:
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.4999655
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jeremy Sheff
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Canada
Description
The Canada Trademarks Dataset

18 Journal of Empirical Legal Studies 908 (2021), prepublication draft available at https://papers.ssrn.com/abstract=3782655, published version available at https://onlinelibrary.wiley.com/share/author/CHG3HC6GTFMMRU8UJFRR?target=10.1111/jels.12303

Dataset Selection and Arrangement (c) 2021 Jeremy Sheff

Python and Stata Scripts (c) 2021 Jeremy Sheff

Contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office.

This individual-application-level dataset includes records of all applications for registered trademarks in Canada since approximately 1980, and of many preserved applications and registrations dating back to the beginning of Canada’s trademark registry in 1865, totaling over 1.6 million application records. It includes comprehensive bibliographic and lifecycle data; trademark characteristics; goods and services claims; identification of applicants, attorneys, and other interested parties (including address data); detailed prosecution history event data; and data on application, registration, and use claims in countries other than Canada. The dataset has been constructed from public records made available by the Canadian Intellectual Property Office. Both the dataset and the code used to build and analyze it are presented for public use on open-access terms.

Scripts are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/. Data files are licensed for reuse subject to the Creative Commons Attribution License 4.0 (CC-BY-4.0), https://creativecommons.org/licenses/by/4.0/, and also subject to additional conditions imposed by the Canadian Intellectual Property Office (CIPO) as described below.

Terms of Use:

As per the terms of use of CIPO's government data, all users are required to include the above-quoted attribution to CIPO in any reproductions of this dataset. They are further required to cease using any record within the datasets that has been modified by CIPO and for which CIPO has issued a notice on its website in accordance with its Terms and Conditions, and to use the datasets in compliance with applicable laws. These requirements are in addition to the terms of the CC-BY-4.0 license, which require attribution to the author (among other terms). For further information on CIPO’s terms and conditions, see https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html. For further information on the CC-BY-4.0 license, see https://creativecommons.org/licenses/by/4.0/.

The following attribution statement, if included by users of this dataset, is satisfactory to the author, but the author makes no representations as to whether it may be satisfactory to CIPO:

The Canada Trademarks Dataset is (c) 2021 by Jeremy Sheff and licensed under a CC-BY-4.0 license, subject to additional terms imposed by the Canadian Intellectual Property Office. It contains data licensed by Her Majesty the Queen in right of Canada, as represented by the Minister of Industry, the minister responsible for the administration of the Canadian Intellectual Property Office. For further information, see https://creativecommons.org/licenses/by/4.0/ and https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/wr01935.html.

Details of Repository Contents:

This repository includes a number of .zip archives which expand into folders containing either scripts for construction and analysis of the dataset or data files comprising the dataset itself. These folders are as follows:

/csv: contains the .csv versions of the data files

/do: contains Stata do-files used to convert the .csv files to .dta format and perform the statistical analyses set forth in the paper reporting this dataset

/dta: contains the .dta versions of the data files

/py: contains the python scripts used to download CIPO’s historical trademarks data via SFTP and generate the .csv data files

If users wish to construct rather than download the datafiles, the first script that they should run is /py/sftp_secure.py. This script will prompt the user to enter their IP Horizons SFTP credentials; these can be obtained by registering with CIPO at https://ised-isde.survey-sondage.ca/f/s.aspx?s=59f3b3a4-2fb5-49a4-b064-645a5e3a752d&lang=EN&ds=SFTP. The script will also prompt the user to identify a target directory for the data downloads. Because the data archives are quite large, users are advised to create a target directory in advance and ensure they have at least 70GB of available storage on the media in which the directory is located.

The sftp_secure.py script will generate a new subfolder in the user’s target directory called /XML_raw. Users should note the full path of this directory, which they will be prompted to provide when running the remaining python scripts. Each of the remaining scripts, the filenames of which begin with “iterparse”, corresponds to one of the data files in the dataset, as indicated in the script’s filename. After running one of these scripts, the user’s target directory should include a /csv subdirectory containing the data file corresponding to the script; after running all the iterparse scripts the user’s /csv directory should be identical to the /csv directory in this repository. Users are invited to modify these scripts as they see fit, subject to the terms of the licenses set forth above.
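
The repository's iterparse scripts are not reproduced here. As a generic sketch of the streaming approach their name suggests, the following uses `xml.etree.ElementTree.iterparse` from the Python standard library to turn repeated XML records into CSV rows without loading the whole file. The tag names (Record, AppNo, Status) are hypothetical and do not reflect CIPO's actual schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

def xml_to_csv(xml_source, csv_out, record_tag, fields):
    """Stream records out of an XML file and write one CSV row per record."""
    writer = csv.writer(csv_out)
    writer.writerow(fields)
    rows = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == record_tag:
            # Missing child elements become empty cells
            writer.writerow([elem.findtext(f, default="") for f in fields])
            rows += 1
            elem.clear()  # drop the record's children to limit memory use
    return rows

# Hypothetical miniature input
sample_xml = io.BytesIO(
    b"<Records>"
    b"<Record><AppNo>1</AppNo><Status>Registered</Status></Record>"
    b"<Record><AppNo>2</AppNo></Record>"
    b"</Records>"
)
out = io.StringIO()
n = xml_to_csv(sample_xml, out, "Record", ["AppNo", "Status"])
print(n)  # 2
print(out.getvalue())
```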

With respect to the Stata do-files, only one of them is relevant to construction of the dataset itself. This is /do/CA_TM_csv_cleanup.do, which converts the .csv versions of the data files to .dta format, and uses Stata’s labeling functionality to reduce the size of the resulting files while preserving information. The other do-files generate the analyses and graphics presented in the paper describing the dataset (Jeremy N. Sheff, The Canada Trademarks Dataset, 18 J. Empirical Leg. Studies (forthcoming 2021), available at https://papers.ssrn.com/abstract=3782655). These do-files are also licensed for reuse subject to the terms of the CC-BY-4.0 license, and users are invited to adapt the scripts to their needs.

The python and Stata scripts included in this repository are separately maintained and updated on Github at https://github.com/jnsheff/CanadaTM.

This repository also includes a copy of the current version of CIPO's data dictionary for its historical XML trademarks archive as of the date of construction of this dataset.
Data from: Data files used to study change dynamics in software systems
figshare.swinburne.edu.au
pdf
Updated Jul 22, 2024
Cite
Rajesh Vasa (2024). Data files used to study change dynamics in software systems [Dataset]. http://doi.org/10.25916/sut.26288227.v1
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.25916/sut.26288227.v1
Dataset updated
Jul 22, 2024
Dataset provided by
Swinburne
Authors
Rajesh Vasa
License
Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
It is a widely accepted fact that evolving software systems change and grow. However, it is less well-understood how change is distributed over time, specifically in object oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place as well as to inform them of the longer term trend in the distribution profile. This knowledge assists developers in recording systemic and substantial changes to a release, as well as to provide useful information as input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. But in order to manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes.

Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and classes that changed recently are likely to undergo modifications in the near future. Though the guidance provided is helpful, developers need more specific guidance in order for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution prone parts of a system as well as support effort estimation activities.

The specific research questions that we address in this chapter are:
(1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project specific, or general?
(2) How is modification frequency distributed for classes that change?
(3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications?
(4) Does structural complexity make a class susceptible to change?
(5) Does popularity make a class more change-prone?

We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions.

The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2MB in total) is provided as a comma separated values (CSV) file, and the first line of the CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
Effects of community management on user activity in online communities
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 24, 2025
Cite
Alberto Cottica (2025). Effects of community management on user activity in online communities [Dataset]. http://doi.org/10.5281/zenodo.1320261
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1320261
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Alberto Cottica
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and code needed to reproduce the results of the paper "Effects of community management on user activity in online communities", available in draft here.

Instructions:

Unzip the files.

Start with JSON files obtained from calling platform APIs: each dataset consists of one file for posts, one for comments, and one for users. In the paper we use two datasets, one referring to Edgeryders, the other to Matera 2019.

Run them through edgesense (https://github.com/edgeryders/edgesense). Edgesense lets you set the length of the observation period. We set it to 1 week and 1 day for Edgeryders data, and to 1 day for Matera 2019 data. Edgesense stores its results in a JSON file called network.min.json, which we then rename to keep track of the data source and observation length.

Launch Jupyter Notebook and run the notebook provided to convert the network.min.json files into CSV flat files, one for each network file.

Launch Stata, open each flat CSV file with it, then save it in Stata format.

Use the provided Stata .do scripts to replicate results.

Please note: I use both Stata and Jupyter Notebook interactively, running a block with a few lines of code at a time. Expect to have to change directories, file names etc.
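
The provided notebook is the authoritative conversion step. Purely as an illustration of what flattening a JSON file into a CSV flat file can look like, here is a minimal sketch. The structure assumed below (a top-level "nodes" list of flat records) is hypothetical and may not match what edgesense actually writes to network.min.json.

```python
import csv
import io
import json

def json_list_to_csv(records, csv_out):
    """Write a list of flat dicts as CSV; keys missing from a record become empty cells."""
    fields = sorted({key for rec in records for key in rec})
    writer = csv.DictWriter(csv_out, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fields

# Hypothetical miniature network file
network = json.loads(
    '{"nodes": [{"id": 1, "name": "a"}, {"id": 2}],'
    ' "edges": [{"source": 1, "target": 2}]}'
)
out = io.StringIO()
fields = json_list_to_csv(network["nodes"], out)
print(fields)  # ['id', 'name']
print(out.getvalue())
```

Nested values would need to be flattened first; this sketch only handles records whose values are plain scalars.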
carjgil/teacher-bias: Teacher Bias - Replication Package
zenodo.org
zip
Updated Jul 5, 2024
Cite
carjgil (2024). carjgil/teacher-bias: Teacher Bias - Replication Package [Dataset]. http://doi.org/10.5281/zenodo.12666535
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.12666535
Dataset updated
Jul 5, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
carjgil
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
*Project: Teacher's Bias in Assessments

*Code: Replication of data cleaning and preparation and analyses

*Authors: Carlos J. Gil-Hernández, Irene Pañeda-Fernández, Leire Salazar, and Jonatan Castaño-Muñoz

*Last Update: 04/07/2024

*Software: STATA/MP 17

Here you can find the replication dofile in STATA format in the "code" folder and the raw and working datasets (including the codebook) of the teacher's bias in assessment experiment in the "data" folder:

1. "/replication files/code/datacleaning.do" contains all the data cleaning and preparation procedures from the raw anonymized Qualtrics data where we applied the survey experiment (see "data" folder .dta or .csv files named "raw_dataset_anonymized") to set a working dataset ready to be analyzed.

2. The folder "/replication files/data" contains the data files named "raw_dataset_anonymized" and "cleandataset" in .dta (data/STATA) or .csv (data/CSV) format on the raw and working data, respectively, to replicate the findings of the teacher's bias in assessments project or run your own analyses. If you do not have access to STATA software, you can check the variables labels of the "cleandataset" in the "data/codebook_cleandataset" Excel file.

Data Citation: Gil-Hernández, Carlos J., Leire Salazar, Jonatan Castaño Muñoz, and Irene Pañeda-Fernandez. 2023. "Teacher's Bias Dataset: A Factorial Survey Experiment." European Commission, Joint Research Centre (JRC) [Dataset] PID: http://data.europa.eu/89h/f14f5209-f032-4218-a89a-4643143809af

3. "datanalysis.do" reproduces all the tables and figures presented in the article and online appendix (if you want to reproduce the analyses from the pre-test pilot data, please get in contact with the corresponding author) using the data file named "cleandataset" in the "data" folder (in .dta or .csv format). The output from "datanalysis.do" will be printed in the "/replication files/output" subfolders for tables (main or appendix) or figures (main or appendix).
Replication Data for: Lawyers' Role-Induced Bias Arises Fast and Persists...
dataone.org
dataverse.harvard.edu
Updated Nov 22, 2023
Cite
Spamann, Holger (2023). Replication Data for: Lawyers' Role-Induced Bias Arises Fast and Persists Despite Intervention [Dataset]. http://doi.org/10.7910/DVN/CRZCPT
Unique identifier
https://doi.org/10.7910/DVN/CRZCPT
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Spamann, Holger
Description
This data repository contains all experimental materials, data, and code for Spamann, Lawyers' Role-Induced Bias ... All experimental materials (i.e., exercise and survey instrument) are in the pdf file Spamann_experimentalmaterials_all.pdf. The dataset Newman.dta (Stata 14.2) contains the data collected. The Stata do-file Spamann_role_bias_code.do generates the three figures and other statistical information reported in the version of the paper originally posted to SSRN in May 2019. Spamann_role_bias_code_revised.do generates the four figures and other statistical information reported in the revision submitted to JLS in March 2020 and ultimately accepted by the journal. Both do-files use Newman.dta. Newman.dta is the result of merging 6 csv files generated by Qualtrics from students' survey responses, one for each of the six semesters. These 6 csv files, and the do-file rawdata_merge_clean.do used to merge them, are also included.
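
The actual merge is done in Stata by rawdata_merge_clean.do. For readers without Stata, the general idea of stacking several per-semester csv files into one table can be sketched in Python as below. The column names, labels, and values are hypothetical, and real Qualtrics exports may need additional cleaning that this sketch does not attempt.

```python
import csv
import io

def merge_csv_files(sources, csv_out):
    """Stack several CSV files into one, tagging each row with its source label.

    `sources` is a list of (label, file-like object) pairs. Assumes no input
    file already has a column named "source".
    """
    rows = []
    fields = ["source"]
    for label, handle in sources:
        reader = csv.DictReader(handle)
        for name in reader.fieldnames or []:
            if name not in fields:
                fields.append(name)  # keep the union of all columns
        for row in reader:
            row["source"] = label
            rows.append(row)
    writer = csv.DictWriter(csv_out, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)

# Hypothetical per-semester files held in memory
fall = io.StringIO("id,answer\n1,yes\n")
spring = io.StringIO("id,answer,extra\n2,no,x\n")
merged = io.StringIO()
n_rows = merge_csv_files([("fall", fall), ("spring", spring)], merged)
print(n_rows)  # 2
print(merged.getvalue())
```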
Replication Data for: Political Scientists: A Profile of Congressional...
dataverse.harvard.edu
Updated Jul 27, 2020
Cite
Matthew Motta (2020). Replication Data for: Political Scientists: A Profile of Congressional Candidates with STEM Backgrounds [Dataset]. http://doi.org/10.7910/DVN/2ASZ9B
Explore at:
Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/2ASZ9B
Dataset updated
Jul 27, 2020
Dataset provided by
Harvard Dataverse
Authors
Matthew Motta
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
README: This page contains the publicly available "Political Scientists" dataset (psdata.csv) proposed and described in "Political Scientists: A Profile of Congressional Candidates with STEM Backgrounds." It also includes a codebook, and syntax file necessary to replicate all main text analyses. Note that the syntax file replicates analyses only, as all data in the psdata .csv file is pre-coded and cleaned. Please refer to the main text for additional information about what is included in this file. Note also that I include a .dta version of the .csv file, suitable to be read into Stata.
Dataset for "The impact of sugar taxes on agricultural trade and...
figshare.com
zip
Updated Oct 14, 2024
Cite
Sung Ju Cho (2024). Dataset for "The impact of sugar taxes on agricultural trade and health-related outcomes: a structural gravity model analysis" [Dataset]. http://doi.org/10.6084/m9.figshare.27221571.v1
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.27221571.v1
Dataset updated
Oct 14, 2024
Dataset provided by
figshare
Authors
Sung Ju Cho
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains datasets used in the study "The impact of sugar taxes on agricultural trade and health-related outcomes: a structural gravity model analysis." It includes both a Stata .dta file and CSV files to facilitate result replication.

Key Variables (full details in data_ssbtax_desc.csv):
k: Industry ID
i: Exporter ID
j: Importer ID
t: Year
X: Trade flows
NDI: Non-discriminatory tax policy
INTL: International border dummy
SSBTAX: INTL x NDI
GATTWTO: Both in GATT/WTO
PTA: Both in PTA
SSBTAXadv: SSBTAX x (ad valorem = 1)
SSBTAXspe: SSBTAX x (specific = 1)
SSBTAXhlth0: SSBTAX x (healthobj = 0)
SSBTAXhlth1: SSBTAX x (healthobj = 1)
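
The interaction variables in the list above follow directly from their definitions, and a small sketch may make the construction concrete. The input names advalorem and specific are hypothetical stand-ins for the tax-type indicators (only healthobj appears in the description), and the values are invented.

```python
def add_interactions(row):
    """Build the SSBTAX interaction terms from their stated definitions."""
    out = dict(row)
    ssbtax = row["INTL"] * row["NDI"]                       # SSBTAX = INTL x NDI
    out["SSBTAX"] = ssbtax
    out["SSBTAXadv"] = ssbtax * int(row["advalorem"] == 1)  # x (ad valorem = 1)
    out["SSBTAXspe"] = ssbtax * int(row["specific"] == 1)   # x (specific = 1)
    out["SSBTAXhlth0"] = ssbtax * int(row["healthobj"] == 0)
    out["SSBTAXhlth1"] = ssbtax * int(row["healthobj"] == 1)
    return out

# Invented observation: international flow, tax in place, ad valorem, health objective
example = add_interactions(
    {"INTL": 1, "NDI": 1, "advalorem": 1, "specific": 0, "healthobj": 1}
)
print(example["SSBTAX"], example["SSBTAXadv"], example["SSBTAXspe"])  # 1 1 0
```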
Stata Program - Claims-Based Frailty Index
search.dataone.org
dataverse.harvard.edu
Updated Sep 25, 2024
Cite
Bedell, Douglas (2024). Stata Program - Claims-Based Frailty Index [Dataset]. http://doi.org/10.7910/DVN/WFDPNH
Unique identifier
https://doi.org/10.7910/DVN/WFDPNH
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
Bedell, Douglas
Description
This STATA program calculates CFI for each patient from analytic data files containing information on patient identifiers, ICD-9-CM diagnosis codes (version 32), ICD-10-CM Diagnosis Codes (version 2020), CPT codes, and HCPCS codes. NOTE: When downloading, store "CFI_ICD9CM_V32.tab" and "CFI_ICD10CM_V2020.tab" as csv files (these files are originally stored as csv files, but Dataverse automatically converts them to tab files). Please read "Frailty-Index-STATA-code-Guide" before proceeding. Interpretation, validation data, and annotated references are provided in "Research Background - Claims-Based Frailty Index".
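
If a downloaded file does arrive in tab-delimited form, one way to store it as csv is sketched below with Python's standard csv module. This assumes the .tab file is plain tab-separated text; the column names and values are invented, and the program's guide remains the reference for the expected layout.

```python
import csv
import io

def tab_to_csv(tab_in, csv_out):
    """Copy a tab-delimited file to comma-separated form; returns the row count."""
    reader = csv.reader(tab_in, delimiter="\t")
    writer = csv.writer(csv_out)  # quotes any field that contains a comma
    n = 0
    for row in reader:
        writer.writerow(row)
        n += 1
    return n

# Invented two-line sample standing in for a downloaded .tab file
tab_sample = io.StringIO("code\tweight\nA01\t0.5\n")
csv_result = io.StringIO()
print(tab_to_csv(tab_sample, csv_result))  # 2
print(csv_result.getvalue())
```

With real files, both would be opened with newline="" as the csv module documentation recommends.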
Integrated Postsecondary Education Data System, Complete 1980-2023
datalumos.org
Updated Feb 11, 2025
Cite
United States Department of Education (2025). Integrated Postsecondary Education Data System, Complete 1980-2023 [Dataset]. http://doi.org/10.3886/E218981V1
Unique identifier
https://doi.org/10.3886/E218981V1
Dataset updated
Feb 11, 2025
Dataset authored and provided by
United States Department of Education (http://ed.gov/)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1980 - 2023
Description
Integrated Postsecondary Education Data System (IPEDS) Complete Data Files from 1980 to 2023. Includes data file, STATA data file, SPSS program, SAS program, STATA program, and dictionary. All years compressed into one .zip file due to storage limitations.

From IPEDS Complete Data File Help Page (https://nces.ed.gov/Ipeds/help/complete-data-files):

Choose the file to download by reading the description in the available titles. Then, click on the link in that row corresponding to the column header of the type of file/information desired to download.

To download and view the survey files in basic CSV format use the main download link in the Data File column.

For files compatible with the Stata statistical software package, use the alternate download link in the Stata Data File column.

To download files with the SPSS, SAS, or STATA (.do) file extension for use with statistical software packages, use the download link in the Programs column.

To download the data Dictionary for the selected file, click on the corresponding link in the far right column of the screen. The data dictionary serves as a reference for using and interpreting the data within a particular survey file. This includes the names, definitions, and formatting conventions for each table, field, and data element within the file, important business rules, and information on any relationships to other IPEDS data.

For statistical read programs to work properly, both the data file and the corresponding read program file must be downloaded to the same subdirectory on the computer’s hard drive. Download the data file first; then click on the corresponding link in the Programs column to download the desired read program file to the same subdirectory.

When viewing downloaded survey files, categorical variables are identified using codes instead of labels. Labels for these variables are available in both the data read program files and data dictionary for each file; however, for files that automatically incorporate this information you will need to select the Custom Data Files option.
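
Because categorical variables arrive as codes, a user working from the basic CSV files has to attach labels from the dictionary. A minimal sketch of that lookup follows; the variable name inst_type and its code-to-label mapping are invented for illustration and are not taken from an IPEDS dictionary.

```python
def apply_labels(rows, labels):
    """Replace coded values with labels; codes without a known label are left as-is."""
    labeled = []
    for row in rows:
        new_row = dict(row)
        for column, mapping in labels.items():
            if column in new_row:
                code = new_row[column]
                new_row[column] = mapping.get(code, code)
        labeled.append(new_row)
    return labeled

# Invented dictionary excerpt and data rows
labels = {"inst_type": {"1": "Type A", "2": "Type B"}}
rows = [
    {"unit_id": "100", "inst_type": "1"},
    {"unit_id": "200", "inst_type": "9"},  # code missing from the mapping
]
print(apply_labels(rows, labels))
```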
Annual Survey of State Government Finances 1992-2018
search.datacite.org
openicpsr.org
Updated 2021
Cite
Jacob Kaplan (2021). Annual Survey of State Government Finances 1992-2018 [Dataset]. http://doi.org/10.3886/e101880
Unique identifier
https://doi.org/10.3886/e101880
Dataset updated
2021
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
DataCite (https://www.datacite.org/)
Authors
Jacob Kaplan
Description
Version 4 release notes: Changes release notes description, does not change data.
Version 3 release notes: Adds 2018 data. Renames some columns so all column names are <= 32 characters to fix Stata limit.
Version 2 release notes: Adds 2017 data. R and Stata files now available.
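
The release notes do not say how the columns were renamed. As one possible approach, and not the author's method, the sketch below truncates names to a 32-character limit and appends a numeric suffix when truncation would create duplicates.

```python
def shorten_names(names, limit=32):
    """Truncate column names to `limit` characters while keeping them unique."""
    used = set()
    result = []
    for name in names:
        candidate = name[:limit]
        counter = 1
        while candidate in used:
            # Make room for a suffix such as "_1" within the limit
            suffix = "_" + str(counter)
            candidate = name[: limit - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result

# Hypothetical column names, two of which collide after truncation
long_names = ["a" * 40, "a" * 41, "short"]
short_names = shorten_names(long_names)
print([len(n) for n in short_names])  # [32, 32, 5]
```

Hand-picked abbreviations are usually more readable than mechanical truncation, so this is best treated as a fallback.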

The .csv file includes data from the years 1992-2016. No data was changed. Only column names were changed to standardize it across years. Some columns (e.g. Population) that are not in all years are removed. Amounts are in thousands of dollars.
The zip file includes all raw (completely untouched) files for years 1992-2016.

From the Census, "The Annual Survey of State Government Finances provides a comprehensive summary of the annual survey findings for state governments, as well as data for individual states. The tables contain detail of revenue by source, expenditure by object and function, indebtedness by term, and assets by purpose." (link to this quote is below)

Information from the U.S. Census about the data is here: https://www.census.gov/programs-surveys/state/about.html
Replication Data for: College Athlete MRP Submission
search.dataone.org
Updated Nov 12, 2023
Cite
Losak, Jeremy (2023). Replication Data for: College Athlete MRP Submission [Dataset]. http://doi.org/10.7910/DVN/QEU6BV
Unique identifier
https://doi.org/10.7910/DVN/QEU6BV
Dataset updated
Nov 12, 2023
Dataset provided by
Harvard Dataverse
Authors
Losak, Jeremy
Description
The data and STATA code files included are part of our team's study of the marginal revenue product of elite college football players. Included are: a CSV file containing all of the original data, a .do file with the regressions/models included in this paper, and a .dta file containing recruiting data used later in the paper. Data covers the 2006-2015 seasons.
PICE - Parental Investment in Children's Education
doi.org
swissubase.ch
Updated Feb 27, 2024
Cite
(2024). PICE - Parental Investment in Children's Education [Dataset]. http://doi.org/10.48573/hjc1-3171
Unique identifier
https://doi.org/10.48573/hjc1-3171
Dataset updated
Feb 27, 2024
Description
The file "Material Overview" provides an overview of the data and documentation for PICE, including the languages in which each document is available.

The zip archive with the data contains the interviews with youngsters and parents. Youngsters were interviewed once and their parents twice. The archive also contains a Stata file and a CSV file named "Closed questions PICE Parents".

A detailed description of the dataset can be found in the Technical Report.
2019 CEV Data: Current Population Survey Civic Engagement and Volunteering...
catalog.data.gov
Updated Jan 23, 2025
Cite
AmeriCorps Office of Research and Evaluation (2025). 2019 CEV Data: Current Population Survey Civic Engagement and Volunteering Supplement [Dataset]. https://catalog.data.gov/dataset/2019-cev-data-current-population-survey-civic-engagement-and-volunteering-supplement
Dataset updated
Jan 23, 2025
Dataset provided by
AmeriCorps (http://www.americorps.gov/)
Description
The Current Population Survey Civic Engagement and Volunteering (CEV) Supplement is the most robust longitudinal survey about volunteerism and other forms of civic engagement in the United States. Produced by AmeriCorps in partnership with the U.S. Census Bureau, the CEV takes the pulse of our nation’s civic health every two years. The data on this page was collected in September 2019. The CEV can generate reliable estimates at the national level, within states and the District of Columbia, and in the largest twelve Metropolitan Statistical Areas to support evidence-based decision making and efforts to understand how people make a difference in communities across the country. This page was updated on January 16, 2025 to ensure consistency across all waves of CEV data.

Click on "Export" to download and review an excerpt from the 2019 CEV Analytic Codebook that shows the variables available in the analytic CEV datasets produced by AmeriCorps.

Click on "Show More" to download and review the following 2019 CEV data and resources provided as attachments:
1) CEV FAQs: answers to frequently asked technical questions about the CEV
2) Constructs and measures in the CEV
3) 2019 CEV Analytic Data and Setup Files: analytic dataset in Stata (.dta), R (.rdata), SPSS (.sav), and Excel (.csv) formats, codebook for analytic dataset, and Stata code (.do) to convert raw dataset to analytic formatting produced by AmeriCorps
4) 2019 CEV Technical Documentation: codebook for raw dataset and full supplement documentation produced by U.S. Census Bureau
5) 2019 CEV Raw Data and Read In Files: raw dataset in Stata (.dta) format, Stata code (.do) and dictionary file (.dct) to read ASCII dataset (.dat) into Stata using layout files (.lis)
Data from: National Survey of Children's Health
kaggle.com
Updated Feb 17, 2025
James Bailey (2025). National Survey of Children's Health [Dataset]. http://doi.org/10.34740/kaggle/dsv/10777194
Unique identifier
https://doi.org/10.34740/kaggle/dsv/10777194
Dataset updated
Feb 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
James Bailey
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Combined 2016-2023 National Survey of Children's Health public use files in CSV and Stata formats. This is a survey collected by government agencies, but they only offer data files for one year at a time and in proprietary formats; so I offer files that combine all years, one of which is in an open format (CSV).
Performance Comparison Results of Blockchain tests
ieee-dataport.org
Updated Jan 15, 2020
Hemang Subramanian (2020). Performance Comparison Results of Blockchain tests [Dataset]. https://ieee-dataport.org/documents/performance-comparison-results-blockchain-tests
Dataset updated
Jan 15, 2020
Authors
Hemang Subramanian
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
and the .do files are the Stata analysis files which analyse this dataset and output the .jpg images.
Area Resource File (ARF)
dataverse.harvard.edu
Updated May 30, 2013
Anthony Damico (2013). Area Resource File (ARF) [Dataset]. http://doi.org/10.7910/DVN/8NMSFV
Unique identifier
https://doi.org/10.7910/DVN/8NMSFV
Dataset updated
May 30, 2013
Dataset provided by
Harvard Dataverse
Authors
Anthony Damico
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
analyze the area resource file (arf) with r

the arf is fun to say out loud. it's also a single county-level data table with about 6,000 variables, produced by the united states health resources and services administration (hrsa). the file contains health information and statistics for over 3,000 us counties. like many government agencies, hrsa provides only a sas importation script and an ascii file.

this new github repository contains two scripts:

2011-2012 arf - download.R
- download the zipped area resource file directly onto your local computer
- load the entire table into a temporary sql database
- save the condensed file as an R data file (.rda), comma-separated value file (.csv), and/or stata-readable file (.dta)

2011-2012 arf - analysis examples.R
- limit the arf to the variables necessary for your analysis
- sum up a few county-level statistics
- merge the arf onto other data sets, using both fips and ssa county codes
- create a sweet county-level map

click here to view these two scripts

for more detail about the area resource file (arf), visit:
- the arf home page
- the hrsa data warehouse

notes: the arf may not be a survey data set itself, but it's particularly useful to merge onto other survey data.

confidential to sas, spss, stata, and sudaan users: time to put down the abacus. time to transition to r. :D
Data for Nudging to reduce meat consumption: Immediate and persistent...
data.mendeley.com
Updated Nov 8, 2017
Verena Kurz (2017). Data for Nudging to reduce meat consumption: Immediate and persistent effects of an intervention at a university restaurant [Dataset]. http://doi.org/10.17632/ctp8f6vfp9.1
Unique identifier
https://doi.org/10.17632/ctp8f6vfp9.1
Dataset updated
Nov 8, 2017
Authors
Verena Kurz
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Coverage: The dataset contains data collected as part of a field experiment on food consumption at two university restaurants in Sweden, covering the period September 1, 2015 - June 3, 2016.
Format: The file analysisdata.dta is the data file used for the main analysis with Stata. The file analysisdata.csv contains the same information and can be used in other programs. The file analysisdata_long.csv contains the data in long form for the multinomial and nested logit models that were run with the software NLOGIT version 5. The file dofile_nudging contains the code for the main analysis and was used with Stata 14. The file nudging_cond_logit_nested_logit.lim contains the code for the multinomial and nested logit models that were run with NLOGIT version 5.
Language of the data: English and Swedish. Variable labels are provided in English where needed.
UNI-CEN Standardized Census Data Table - Province/Territory (PR) - 1991 -...
search.dataone.org
Updated Dec 28, 2023
UNI-CEN Project (2023). UNI-CEN Standardized Census Data Table - Province/Territory (PR) - 1991 - Long Format (DTA) (Version 2023-03) [Dataset]. http://doi.org/10.5683/SP3/5BHI2K
Unique identifier
https://doi.org/10.5683/SP3/5BHI2K
Dataset updated
Dec 28, 2023
Dataset provided by
Borealis
Authors
UNI-CEN Project
Time period covered
Jan 1, 1991
Description
UNI-CEN Standardized Census Data Tables contain Census data that have been reformatted into a common table format with standardized variable names and codes. The data are provided in two tabular formats for different use cases. "Long" tables are suitable for use in statistical environments, while "wide" tables are commonly used in GIS environments. The long tables are provided in Stata Binary (dta) format, which is readable by most statistics software. The wide tables are provided in comma-separated values (csv) and dBase 3 (dbf) formats with codebooks. The wide tables are easily joined to the UNI-CEN Digital Boundary Files. For the csv files, a .csvt file is provided to ensure that column data formats are correctly formatted when importing into QGIS. A schema.ini file does the same when importing into ArcGIS environments. As the DBF file format supports a maximum of 250 columns, tables with a larger number of variables are divided into multiple DBF files. For more information about file sources, the methods used to create them, and how to use them, consult the documentation at https://borealisdata.ca/dataverse/unicen_docs. For more information about the project, visit https://observatory.uwo.ca/unicen.
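To illustrate the difference between the "long" and "wide" layouts mentioned above, here is a minimal Python sketch. The column layout (region code, variable name, value) and all values are hypothetical and chosen only for illustration; the actual UNI-CEN column names are documented in the project's codebooks.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (region, variable) pair.
# These names and numbers are illustrative, not taken from UNI-CEN.
long_rows = [
    ("10", "pop_total", 100),
    ("10", "dwellings", 40),
    ("11", "pop_total", 250),
    ("11", "dwellings", 90),
]

def long_to_wide(rows):
    """Pivot long rows into one record per region, one key per variable."""
    wide = defaultdict(dict)
    for region, variable, value in rows:
        wide[region][variable] = value
    return dict(wide)

wide = long_to_wide(long_rows)
```

In the wide form each region appears once, which is the shape that joins naturally to a boundary file keyed by region code.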
UAD Appraisal-Level Public Use File
catalog.data.gov
gimi9.com
Updated Feb 10, 2025
Federal Housing Finance Agency (2025). UAD Appraisal-Level Public Use File [Dataset]. https://catalog.data.gov/dataset/uad-appraisal-level-public-use-file
Dataset updated
Feb 10, 2025
Dataset provided by
Federal Housing Finance Agency (https://www.fhfa.gov/)
Description
The Uniform Appraisal Dataset (UAD) Appraisal-Level Public Use File (PUF) is the nation’s first publicly available appraisal-level dataset, giving the public new access to a selected set of data fields found in appraisal reports. The UAD Appraisal-Level PUF is based on a five percent nationally representative random sample of appraisals for single-family mortgages acquired by the Enterprises. The current release includes appraisals from 2013 through 2021. The UAD Appraisal-Level PUF is a resource for users capable of using statistical software to extract and analyze data. Users can download annual or combined files in CSV, R, SAS and Stata formats. All files are zipped for ease of download.


GESIS - Data Archive for the Social Sciences (2019). Downloadstatistik GESIS Datenarchiv [Dataset]. http://doi.org/10.4232/1.13222

Downloadstatistik GESIS Datenarchiv

2 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: application/x-spss-sav (2154811), application/x-stata-dta (5384365), (2139418), application/x-spss-sav (2295631), (2051697)

Unique identifier

https://doi.org/10.4232/1.13222

Dataset updated

Feb 14, 2019

Dataset provided by

GESIS Data Archive
GESIS search

Authors

GESIS - Data Archive for the Social Sciences

License

https://www.gesis.org/en/institute/data-usage-terms

Time period covered

Jan 1, 2004 - Dec 31, 2018

Variables measured

za_nr - Archive study number, doi - Digital Object Identifier, version - GESIS Archive Version, Access - Access category (0, A, B, C, D, E), Title - English study title (if n.a., German title), Title_DE - German study title (if n.a., English title), Total - All downloads combined (all years, all sources), d_2004_dbk - All DBK downloads from that respective year, d_2005_dbk - All DBK downloads from that respective year, d_2006_dbk - All DBK downloads from that respective year, and 63 more

Description

General information: The data sets contain information on how often materials of studies available through GESIS: Data Archive for the Social Sciences were downloaded and/or ordered through one of the archive's platforms/services between 2004 and 2018.

Sources and platforms: Study materials are accessible through various GESIS platforms and services: Data Catalogue (DBK), histat, datorium, data service (and others).

Years available:
- Data Catalogue: 2012-2018
- data service: 2006-2018
- datorium: 2014-2018
- histat: 2004-2018

Data sets: Data set ZA6899_Datasets_only_all_sources contains information on how often data files such as those with dta- (Stata) or sav- (SPSS) extension have been downloaded. Identification of data files is handled semi-automatically (depending on the platform/service). Multiple downloads of one file by the same user (identified through IP-address or username for registered users) on the same day are only counted as one download.

Data set ZA6899_Doc_and_Data_all_sources contains information on how often study materials have been downloaded. Multiple downloads of any file of the same study by the same user (identified through IP-address or username for registered users) on the same day are only counted as one download.
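The two counting rules described above can be sketched as follows. This is an illustration under stated assumptions, not the archive's actual procedure: the log layout (user, study, file, day), the example rows, and the extension-based test for data files are all hypothetical, and the source notes that data files are in fact identified semi-automatically.

```python
from datetime import date

# Hypothetical raw log rows: (user, study, file, day). Illustrative only.
log = [
    ("alice", "ZA0001", "ZA0001.dta", date(2018, 3, 1)),
    ("alice", "ZA0001", "ZA0001.dta", date(2018, 3, 1)),      # repeat, same day
    ("alice", "ZA0001", "ZA0001_cod.pdf", date(2018, 3, 1)),  # other file, same study
    ("alice", "ZA0001", "ZA0001.dta", date(2018, 3, 2)),      # next day
    ("alice", "ZA0001", "ZA0001_cod.pdf", date(2018, 3, 3)),  # documentation only
    ("bob", "ZA0001", "ZA0001.sav", date(2018, 3, 1)),
]

# Assumption: data files are recognised by extension alone.
DATA_EXTENSIONS = (".dta", ".sav")

def count_data_file_downloads(rows):
    """Datasets-only rule: one count per (user, file, day), data files only."""
    seen = {(user, name, day) for user, _study, name, day in rows
            if name.endswith(DATA_EXTENSIONS)}
    return len(seen)

def count_study_downloads(rows):
    """Doc-and-data rule: one count per (user, study, day), any file."""
    seen = {(user, study, day) for user, study, _name, day in rows}
    return len(seen)

data_file_count = count_data_file_downloads(log)
study_count = count_study_downloads(log)
```

With these example rows the two rules give different counts, because a documentation-only download adds to the study-level count but not to the data-file count.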

Both data sets are available in three formats: csv (quoted, semicolon-separated), dta (Stata v13, labeled) and sav (SPSS, labeled). All formats contain identical information.
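As a rough sketch of what "quoted, semicolon-separated" means in practice, the following Python snippet parses a small inline sample with the standard `csv` module. The column names `za_nr`, `Title`, and `Total` come from the variable list; the sample values are made up, and the real files may differ in details such as encoding.

```python
import csv
import io

# Hypothetical sample in the described layout: quoted fields, ';' separator.
sample = (
    '"za_nr";"Title";"Total"\n'
    '"1";"Example study; first wave";"12"\n'
    '"2";"Another study";"7"\n'
)

# Quoting lets a field contain the separator itself without splitting.
reader = csv.DictReader(io.StringIO(sample), delimiter=";", quotechar='"')
rows = list(reader)

# Convert the download counts from text to integers.
totals = {row["za_nr"]: int(row["Total"]) for row in rows}
```

To read an actual file, the `io.StringIO(sample)` argument would be replaced by an opened file object.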

Variables: Variables/columns in both data sets are identical.
za_nr 'Archive study number'
version 'GESIS Archiv Version'
doi 'Digital Object Identifier'
StudyNo 'Study number of respective study'
Title 'English study title'
Title_DE 'German study title'
Access 'Access category (0, A, B, C, D, E)'
PubYear 'Publication year of last version of the study'
inZACAT 'Study is currently also available via ZACAT'
inHISTAT 'Study is currently also available via HISTAT'
inDownloads 'There are currently data files available for download for this study in DBK or datorium'
Total 'All downloads combined'
downloads_2004 'downloads/orders from all sources combined in 2004'
[up to ...]
downloads_2018 'downloads/orders from all sources combined in 2018'
d_2004_dbk 'downloads from source dbk in 2004'
[up to ...]
d_2018_dbk 'downloads from source dbk in 2018'
d_2004_histat 'downloads from source histat in 2004'
[up to ...]
d_2018_histat 'downloads from source histat in 2018'
d_2004_dataservice 'downloads/orders from source dataservice in 2004'
[up to ...]
d_2018_dataservice 'downloads/orders from source dataservice in 2018'

More information is available within the codebook.
