Austin's Communications & Technology Management Department is pleased to provide this zip code dataset for general use, designed to support a variety of research and analysis needs. Please note that while we facilitate access to this data, the dataset is owned and produced by the United States Postal Service (USPS). Users are encouraged to acknowledge USPS as the source when utilizing this dataset in their work. U.S. ZIP Code Areas (Five-Digit) represents five-digit ZIP Code areas used by the U.S. Postal Service to deliver mail more effectively. The first digit of a five-digit ZIP Code divides the United States into 10 large groups of states numbered from 0 in the Northeast to 9 in the far West. Within these areas, each state is divided into an average of 10 smaller geographical areas, identified by the second and third digits. These digits, in conjunction with the first digit, represent a sectional center facility or a mail processing facility area. The fourth and fifth digits identify a post office, station, branch or local delivery area. This product is for informational purposes and may not have been prepared for or be suitable for legal, engineering, or surveying purposes. It does not represent an on-the-ground survey and represents only the approximate relative location of property boundaries. This product has been produced by the City of Austin for the sole purpose of geographic reference. No warranty is made by the City of Austin regarding specific accuracy or completeness.
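The digit structure described above can be illustrated with a short Python sketch. The names `ZipParts` and `split_zip` are hypothetical and are not part of the dataset. The ZIP Code is kept as a string so that leading zeros (the Northeast group) are preserved.

```python
from typing import NamedTuple


class ZipParts(NamedTuple):
    national_area: str     # first digit: one of the 10 large groups of states
    sectional_center: str  # first three digits: sectional center / mail processing facility area
    delivery_area: str     # fourth and fifth digits: post office, station, branch, or local delivery area


def split_zip(zip_code: str) -> ZipParts:
    """Split a five-digit ZIP Code into the parts described in the text."""
    if len(zip_code) != 5 or not (zip_code.isascii() and zip_code.isdigit()):
        raise ValueError(f"expected exactly five digits, got {zip_code!r}")
    return ZipParts(zip_code[0], zip_code[:3], zip_code[3:])


parts = split_zip("78701")
print(parts.national_area, parts.sectional_center, parts.delivery_area)
```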
Collective intelligence is a foundational element of online community question-answering (CQA) platforms such as Stack Overflow, which serve as the source of answers to most programming-related issues. Despite this relevance, concerns remain about user participation. Prior research tends to analyse participation with simple numerical measurements, which may sideline its inherent, subtler aspects. The proposed study aims to bridge this gap by operationalising 11 distinct metrics that represent user participation, behaviour, and community value across different regions of the USA. The study also conducts inductive content analysis to understand the impact of regional contextual factors on users' knowledge-sharing patterns. This replication package is provided for those interested in further examining our research methodology.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
One of the many challenges that social science researchers and practitioners face is the difficulty of relating United States Postal Service (USPS) ZIP codes to Census Bureau geographies. There are valuable data available only at the ZIP code level that, when combined with demographic data tabulated at various Census geography levels, could open up new avenues of exploration.
While some acceptable methods of combining ZIP codes and Census geography exist, they have limitations. To provide additional avenues for merging these data, PD&R has released the HUD-USPS Crosswalk Files. These unique files are derived from data in the quarterly USPS Vacancy Data. They originate directly from the USPS; are updated quarterly, making them highly responsive to changes in ZIP code configurations; and reflect the locations of both business and residential addresses. The latter feature is of particular interest to housing researchers because many of the phenomena that they study are based on housing unit or address. By using an allocation method based on residential addresses rather than by area or by population, analysts can take into account not only the spatial distribution of population, but also the spatial distribution of residences. This enables a slightly more nuanced approach to allocating data between disparate geographies.
Please note that the USPS Vacancy Data is constructed from ZIP+4 data that contains records of addresses; it does not contain ZIP+4 data associated with ZIP codes that exclusively serve Post Office Boxes (PO Boxes). As a result, ZIP codes that only serve PO Boxes will not appear in the files.
In addition to the crosswalk files, this dataset also includes screenshots of HUD's documentation and FAQ pages.
Understanding ZIP Code Crosswalk Files
Though ZIP Code data are often used for mapping, spatial analysis, and data aggregation, careful attention is required when interpreting them relative to other administrative geographies. The following article demonstrates how to more effectively use the U.S. Department of Housing and Urban Development (HUD) United States Postal Service ZIP Code Crosswalk Files when working with disparate geographies.
Wilson, Ron and Din, Alexander, 2018. “Understanding and Enhancing the U.S. Department of Housing and Urban Development’s ZIP Code Crosswalk Files,” Cityscape: A Journal of Policy Development and Research, Volume 20, Number 2, 277-294. https://www.huduser.gov/portal/periodicals/cityscpe/vol20num2/ch16.pdf
Using a GIS to Geoprocess ZIP Code Crosswalk Files
This article demonstrates how to use a GIS to process ZIP Code Crosswalk Files. In this article, calls for service from New York City's Open Data Portal are estimated at the county level and census tract level. This article also includes an accuracy analysis.
Din, Alexander and Wilson, Ron, 2020. “Crosswalking ZIP Codes to Census Geographies: Geoprocessing the U.S. Department of Housing & Urban Development’s ZIP Code Crosswalk Files,” Cityscape: A Journal of Policy Development and Research, Volume 22, Number 1. https://www.huduser.gov/portal/periodicals/cityscpe/vol22num1/ch12.pdf
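The residential-address allocation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not HUD's own procedure. The row layout `(zip, tract, res_ratio)`, the placeholder identifiers, and the counts are all hypothetical. `res_ratio` is assumed to be the share of a ZIP code's residential addresses that fall in the given tract, so that the ratios for one ZIP code sum to 1.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (zip_code, tract_id, res_ratio).
# res_ratio is assumed to be the share of the ZIP code's residential
# addresses located in that tract.
crosswalk = [
    ("zip_A", "tract_1", 0.60),
    ("zip_A", "tract_2", 0.40),
    ("zip_B", "tract_2", 1.00),
]

# Hypothetical ZIP-level counts to allocate (for example, calls for service).
zip_counts = {"zip_A": 200, "zip_B": 50}


def allocate_to_tracts(crosswalk_rows, counts_by_zip):
    """Distribute ZIP-level counts to tracts in proportion to res_ratio."""
    tract_totals = defaultdict(float)
    for zip_code, tract_id, res_ratio in crosswalk_rows:
        # A ZIP code missing from the counts contributes nothing.
        tract_totals[tract_id] += counts_by_zip.get(zip_code, 0) * res_ratio
    return dict(tract_totals)


tract_estimates = allocate_to_tracts(crosswalk, zip_counts)
print(tract_estimates)
```

Because the weights for each ZIP code sum to 1, the tract estimates add up to the original ZIP-level total, which is a useful sanity check after any allocation of this kind.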
The Utility Energy Registry (UER) is a database platform that provides streamlined public access to aggregated community-scale utility-reported energy data. The UER is intended to promote and facilitate community-based energy planning and energy use awareness and engagement. On April 19, 2018, the New York State Public Service Commission (PSC) issued the Order Adopting the Utility Energy Registry under regulatory CASE 17-M-0315. The order requires utilities under its regulation to develop and report community energy use data to the UER. This dataset includes electricity and natural gas usage data reported at the ZIP Code level collected under a data protocol in effect between 2016 and 2021. Other UER datasets include energy use data reported at the city, town, village, and county level. Data collected after 2021 were collected according to a modified protocol. Those data may be found at https://data.ny.gov/Energy-Environment/Utility-Energy-Registry-Monthly-ZIP-Code-Energy-Us/g2x3-izm4. Data in the UER can be used for several important purposes such as planning community energy programs, developing community greenhouse gas emissions inventories, and relating how certain energy projects and policies may affect a particular community. It is important to note that the data are subject to privacy screening and fields that fail the privacy screen are withheld. The New York State Energy Research and Development Authority (NYSERDA) offers objective information and analysis, innovative programs, technical expertise, and support to help New Yorkers increase energy efficiency, save money, use renewable energy, accelerate economic growth, and reduce reliance on fossil fuels. To learn more about NYSERDA’s programs, visit nyserda.ny.gov or follow us on X, Facebook, YouTube, or Instagram.
The Utility Energy Registry (UER) is a database platform that provides streamlined public access to aggregated community-scale energy data. The UER is intended to promote and facilitate community-based energy planning and energy use awareness and engagement. On April 19, 2018, the New York State Public Service Commission (PSC) issued the Order Adopting the Utility Energy Registry under regulatory CASE 17-M-0315. The order requires utilities and CCA administrators under its regulation to develop and report community energy use data to the UER. This dataset includes electricity and natural gas usage data reported at the ZIP Code level. Other UER datasets include energy use data reported at the city, town, village, and county level. Data in the UER can be used for several important purposes such as planning community energy programs, developing community greenhouse gas emissions inventories, and relating how certain energy projects and policies may affect a particular community. It is important to note that the data are subject to privacy screening and fields that fail the privacy screen are withheld. The New York State Energy Research and Development Authority (NYSERDA) offers objective information and analysis, innovative programs, technical expertise, and support to help New Yorkers increase energy efficiency, save money, use renewable energy, and reduce reliance on fossil fuels. To learn more about NYSERDA’s programs, visit nyserda.ny.gov or follow us on X, Facebook, YouTube, or Instagram.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Artifact package for the study "When GUI-based Testing Meets Code Reviews".
01_Interview_transcriptions.zip: All transcribed and anonymized interviews
02_QDA.zip: All data related to the qualitative data analysis (QDA) of interviews. This includes the codebook (all versions of it), the tool with an export of the data, and the final coding by two coders and its diff.
03_GitHub_analysis.zip: Source code and cached results for gathering code review comments of three GitHub repositories.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The content of the replication package is the following:
Issue History of 100 Projects.zip: For each project, there is a JSON file that contains information about the history of each issue.
repos.zip: For each project, there is a .git folder that contains the repository content.
code.zip: Code for mining, analyzing, and reporting for our research can be found in this zipped folder.
Reports.zip: Contains the figures that are generated by our code and used in the paper.
Research Survey - Issue Templates on GitHub (Responses) - Shared.pdf: Contains the questions and answers to our research survey.
Parkland Obesity Research focuses on investigating the multifaceted factors contributing to the prevalence of obesity within the communities of Dallas. This comprehensive study delves into genetic, environmental, and lifestyle components influencing weight gain and obesity-related health issues. By employing cutting-edge methodologies, the research aims to unravel the intricate interplay between genetics and lifestyle choices, shedding light on potential interventions for prevention and management. Through rigorous data analysis and community engagement, Parkland Obesity Research strives to develop tailored strategies addressing the unique challenges faced by the local population. The ultimate goal is to contribute valuable insights that can inform public health policies, promote healthier lifestyles, and mitigate the impact of obesity on individuals and the community at large.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This workflow adapts the approach and parameter settings of Trans-Omics for Precision Medicine (TOPMed). The RNA-seq pipeline originated from the Broad Institute. There are in total five steps in the workflow starting from:
For testing and analysis, the workflow authors provided example data created by down-sampling the read files of a TOPMed public-access dataset. Chromosome 12 was extracted from the Homo sapiens assembly 38 reference sequence and provided by the workflow authors. The required GTF and RSEM reference data files are also provided. The workflow is well documented, and a detailed set of instructions for the steps performed to down-sample the data is also provided for transparency. The availability of example input data, the use of containerization for the underlying software, and detailed documentation are important factors in choosing this specific CWL workflow for CWLProv evaluation.
This dataset folder is a CWLProv Research Object that captures the Common Workflow Language execution provenance, see https://w3id.org/cwl/prov/0.5.0 or use https://pypi.org/project/cwl
Steps to reproduce
To build the research object again, use Python 3 on macOS. Built with:
Install cwltool
pip3 install cwltool==1.0.20180912090223
Install git lfs
The data download with the git repository requires the installation of Git lfs:
https://www.atlassian.com/git/tutorials/git-lfs#installing-git-lfs
Get the data and make the analysis environment ready:
git clone https://github.com/FarahZKhan/cwl_workflows.git
cd cwl_workflows/
git checkout CWLProvTesting
./topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/download_examples.sh
Run the following commands to create the CWLProv Research Object:
cwltool --provenance rnaseqwf_0.5.0_mac --tmp-outdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp --tmpdir-prefix=/CWLProv_workflow_testing/intermediate_temp/temp topmed-workflows/TOPMed_RNAseq_pipeline/rnaseq_pipeline_fastq.cwl topmed-workflows/TOPMed_RNAseq_pipeline/input-examples/Dockstore.json
zip -r rnaseqwf_0.5.0_mac.zip rnaseqwf_0.5.0_mac
sha256sum rnaseqwf_0.5.0_mac.zip > rnaseqwf_0.5.0_mac.zip.sha256
The https://github.com/FarahZKhan/cwl_workflows repository is a frozen snapshot from https://github.com/heliumdatacommons/TOPMed_RNAseq_CWL commit 027e8af41b906173aafdb791351fb29efc044120
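The checksum written in the last reproduction step can also be verified without `sha256sum`, which is not installed by default on every system. The following Python sketch is one possible way to do so and is not part of the original package. The function names are hypothetical, and the checksum file is assumed to follow the usual `sha256sum` layout of a hex digest followed by the file name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(archive_path, checksum_path):
    """Compare an archive against a checksum file in sha256sum format.

    Assumes the first whitespace-separated token of the checksum file
    is the recorded hex digest.
    """
    recorded = Path(checksum_path).read_text().split()[0]
    return sha256_of_file(archive_path) == recorded.lower()
```

To use it, pass the zipped research object and whichever checksum file the `sha256sum` step produced to `matches_checksum_file`; a return value of `True` means the archive is byte-identical to the one that was hashed.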
By US Open Data Portal, data.gov [source]
This dataset provides an inside look at the performance of Veterans Health Administration (VHA) hospitals on timely and effective care measures. It contains detailed information such as hospital names, addresses, census-designated cities and locations, states, ZIP codes, county names, phone numbers, and associated conditions. Each entry also includes a score, sample size, and any notes or footnotes that give further context. The data are collected through Quality Improvement Organizations for external peer review programs as well as directly from electronic medical records. Understanding how VHA hospitals score on timely care measures offers valuable insight into how VA healthcare services deliver value throughout the country.
This dataset contains information about the performance of Veterans Health Administration hospitals on timely and effective care measures. In this dataset, you can find the hospital name, address, city, state, ZIP code, county name, phone number associated with each hospital as well as data related to the timely and effective care measure such as conditions being measured and their associated scores.
To use this dataset effectively, first identify an area of interest for analysis, for example: which condition most affects wait times for patients? Then narrow down the fields that best fit your needs. If you are studying wait times, for instance, “Score” may be more valuable to filter on than “Footnote”. Also consider applying aggregation functions to certain fields (such as the average score over time) to get a better understanding of overall performance by a factor such as Location.
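The kind of aggregation suggested above can be sketched in Python as follows. This is a minimal illustration, not an official recipe. The sample rows and their values are invented, the column names follow the data dictionary for this file, and the assumption that some scores are non-numeric placeholders (such as "Not Available") is mine.

```python
from collections import defaultdict
from statistics import mean

# Invented sample rows; in practice these could come from csv.DictReader.
rows = [
    {"State": "TX", "Condition": "condition_1", "Score": "92"},
    {"State": "TX", "Condition": "condition_1", "Score": "Not Available"},
    {"State": "TX", "Condition": "condition_2", "Score": "88"},
    {"State": "NY", "Condition": "condition_1", "Score": "75"},
]


def average_score_by(records, field):
    """Average the numeric Score values, grouped by the given field."""
    groups = defaultdict(list)
    for record in records:
        try:
            score = float(record["Score"])
        except ValueError:
            continue  # skip non-numeric placeholders (an assumption)
        groups[record[field]].append(score)
    return {key: mean(values) for key, values in groups.items()}


by_state = average_score_by(rows, "State")
print(by_state)
```

Passing a different field name, such as `"Condition"`, groups the same scores by condition instead of by state.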
Ultimately, this dataset provides a snapshot of how Veterans Health Administration hospitals are performing on timely and effective care measures, so any research should focus on that aspect of healthcare delivery.
- Analyzing and predicting hospital performance on a regional level to improve the quality of healthcare for veterans across the country.
- Using this dataset to identify trends and develop strategies for hospitals that consistently score low on timely and effective care measures, with the goal of improving patient outcomes.
- Comparison analysis between different VHA hospitals to discover patterns and best practices in providing effective care so they can be shared with other hospitals in the system
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.

File: csv-1.csv

| Column name   | Description                                        |
|:--------------|:---------------------------------------------------|
| Hospital Name | Name of the VHA hospital. (String)                 |
| Address       | Street address of the VHA hospital. (String)       |
| City          | City where the VHA hospital is located. (String)   |
| State         | State where the VHA hospital is located. (String)  |
| ZIP Code      | ZIP code of the VHA hospital. (Integer)            |
| County Name   | County where the VHA hospital is located. (String) |
| Phone Number  | Phone number of the VHA hospital. (String)         |
| Condition     | Condition being measured. (String)                 |
| Measure Name  | Measure used to measure the condition. (String)    |
| Score         | Score achieved by the VHA h...                     |
Mapping Layer
Data Released: 06/15/2017 | Last Updated: 04/20/2024
Data Currency: This data is checked semi-annually against its enterprise federal source for 2010 CENSUS data and will support mapping, analysis, data exports, and the Open Geospatial Consortium (OGC) Application Programming Interface (API).
Data Update Frequency: Twice yearly
Data Cycle | History (as required below)
QA/QC Performed: December, 2024
Next Scheduled Data QA/QC: July, 2024
CDC PLACES (2010 CENSUS) FEATURE LAYER
Data Requester: Rhode Island Executive Office of Health and Human Service (OHHS) via Health Equity Institute (HEI).
Data Requester: Rhode Island Department of Health, Maternal Child Health via Health Equity Institute (HEI).
Data Request: Provide a database deliverable via download that contains both US CENSUS tracts and USPS Zip Code Tabulation Areas (ZCTA).
HEALTH EQUITY INSTITUTE DATA CONNECT RI
Using Modern GIS (Mapping)
Facilitate transformative mapping visualizations that engage constituents and measure the impact of real-world solutions.
Instructions to Join Your Data Provided Below
STEP 1: Video (Pending)
STEP 2: Video (Pending)
STEP 3: Video (Pending)
There are twenty-two U.S. CENSUS fields that you can join to your datasets. For additional insight, please contact the Center for Health Data and Analysis (CHDA), Rhode Island Department of Health (GIS) Mapping Department, for assistance.
Database Enhancement: This database contains two (2) additional data fields for consideration to be added to the existing 2020 State of Rhode Island Health Equity Map.
Zip Code Tabulation Area (ZCTA)
ZCTA/Tract Relationship (Singular ZCTAs per Tract, versus Multiple ZCTAs per Tract)
Additional Information: While ZCTAs can be useful for certain qualitative purposes, such as broad or general high-level analysis, they may not provide the level of granularity and accuracy required for the in-depth demographic research that policy mapping demands. ZCTAs can change frequently as the US Postal Service (USPS) adjusts postal routes and boundaries. These changes can lead to inconsistencies and challenges in tracking demographic trends and making accurate comparisons over time.
RIDOH GIS encourages analysts conducting detailed demographic research to choose census-based data, whose consistent, readily available boundaries make them suitable for spatial analysis.
Here are a few reasons why you might want to consider using census-based data (tracts, block groups, and blocks) instead of ZCTAs:
1. Inaccurate Representations: ZCTAs are not designed for statistical analysis or demographic research. They approximate ZIP Codes, which the United States Postal Service (USPS) creates for efficient mail delivery, and can often span multiple cities, counties, or even states. As a result, ZCTAs may not accurately represent the actual geographic boundaries or demographic characteristics of a specific area.
2. Lack of Granularity: ZCTAs are typically larger than census tracts, which are smaller, more homogeneous geographic units defined by the U.S. Census Bureau. Census tracts are designed to be relatively consistent in terms of population size, allowing for more detailed analysis at a local level. ZCTAs, on the other hand, can vary significantly in terms of population size, making it challenging to draw precise conclusions about specific neighborhoods or communities.
3. Data Availability and Compatibility: Census tracts are used by the U.S. Census Bureau to collect and report demographic data. Consequently, a wide range of demographic information, such as population counts, age distribution, income levels, and education levels, is readily available at the census tract level. In contrast, data specifically tailored to ZCTAs may be more limited, making it difficult to obtain comprehensive and consistent data for demographic analysis.
4. Changes Over Time: Census tracts are relatively stable over time, allowing for consistent longitudinal analysis. ZCTAs, however, can change frequently as the USPS adjusts postal routes and boundaries. These changes can lead to inconsistencies and challenges in tracking demographic trends and making accurate comparisons over time.
5. Spatial Analysis: Census tracts are designed to maintain a level of spatial proximity, adjacency, or connectedness of these data containers while providing consistency and continuity over time, making them useful for spatial analysis and mapping. ZCTAs, on the other hand, may not exhibit the same level of spatial coherence because their primary purpose is mail delivery efficiency rather than geographic representation.
State Agencies - Contact RIDOH GIS - Learn More About Mapping Data Available at the Census Tract Level
RIDOH GIS releases this database with the caveats noted above and with the expectation that the researcher can accurately align the ZCTAs with the corresponding census tracts. Careful consideration should be given to the comparability and compatibility of the data collected at different geographic levels to ensure valid and meaningful statistical conclusions.
Data Dictionary: 2010 Decennial Census
OBJECT ID - the count of each census tract entity.
GEOID (10) STATE,COUNTY,TRACT - Numeric US CENSUS Tract Description (2010)
HEZ (10) - Health Equity Zone (2020)
LOCATION (10) - Plain Language Census Tract Descriptor (2010)
COUNTY (10) NAME - County Name (2010)
STATE (10) NAME - State Name (2010)
ZCTA (23) - Zip Code Tabulation Area - Numeric US CENSUS ZCTA Description (2023)
ZCTA/TRACT CONTEXT - Number of ZCTAs (Singular/Multiple) that reside within a US CENSUS Tract
ST (10) - Numeric US CENSUS Tract Description (2010)
CO (10) - Numeric US CENSUS Tract Description (2010)
ST (10) CO (10) - Numeric US CENSUS Tract Description (2010)
TRACT (10) - Numeric US CENSUS Tract Description (2010)
GEOID (10) - Numeric US CENSUS Tract Description (2010)
TRIBAL TRACT (10) - Numeric US CENSUS Tract Description (2010)
Additional Mapping Data
The user is provided authoritative Federal Information Processing Standards (FIPS) codes, such as numeric descriptions of state, county, and tract identification, in addition to shape and length measurements of each census tract for data joining purposes.
STATE (10) - Federal Information Processing Standards (FIPS)
COUNTY (10) - Federal Information Processing Standards (FIPS)
STATE (10), COUNTY (10) - Federal Information Processing Standards (FIPS)
TRACT (10) - Federal Information Processing Standards (FIPS)
TRIBAL TRACT (10) - Federal Information Processing Standards (FIPS)
ST ABBRV (10) - State Abbreviation
Shape_Length - Total length of the polygon's (census tract) perimeter, in the units used by the feature class' coordinate system.
Shape_Area - Total area of the polygon (census tract), in the units used by the feature class' coordinate system.
Data Source: Series Information for 2020 Census 5-Digit ZIP Code Tabulation Area (ZCTA5) National TIGER/Line Shapefiles, Current
Open Geospatial Consortium (OGC) Application Programming Interface (API): Census ZIP Code Tabulation Areas - OGC Features (copy this link to embed it in OGC-compliant viewers).
For more information, please visit: ZIP Code Tabulation Areas (ZCTAs)
To Report Data Discrepancies, Contact the Rhode Island Department of Health (RIDOH) GIS (Mapping) Office
Please Be Certain To:
Provide a Brief Description of What the Discrepancy Is
Include Your Name, Organization, Telephone Number
Attach the Complete .xlsx with the Discrepancy Highlighted
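The ZCTA/TRACT CONTEXT field in the data dictionary above distinguishes tracts with a single ZCTA from tracts with several. The Python sketch below shows one way such a flag could be derived from tract-ZCTA pairs. It is an illustration only and does not describe how RIDOH GIS built the field. The identifiers are placeholders and the function name is hypothetical.

```python
from collections import defaultdict

# Placeholder (tract, zcta) pairs; real values would be the GEOID and ZCTA fields.
tract_zcta_pairs = [
    ("tract_1", "zcta_A"),
    ("tract_1", "zcta_B"),
    ("tract_2", "zcta_A"),
    ("tract_2", "zcta_A"),  # a repeated pair should not count twice
]


def zcta_context(pairs):
    """Label each tract 'Singular' or 'Multiple' by its number of distinct ZCTAs."""
    zctas_by_tract = defaultdict(set)
    for tract_id, zcta in pairs:
        zctas_by_tract[tract_id].add(zcta)
    return {
        tract_id: "Singular" if len(zctas) == 1 else "Multiple"
        for tract_id, zctas in zctas_by_tract.items()
    }


context = zcta_context(tract_zcta_pairs)
print(context)
```

Tracts labelled "Multiple" are the ones where aligning ZCTA-level data to tracts needs the most care.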
VITAL SIGNS INDICATOR Home Prices (EC7)
FULL MEASURE NAME Home Prices
LAST UPDATED August 2019
DESCRIPTION Home prices refer to the cost of purchasing one’s own house or condominium. While a significant share of residents may choose to rent, home prices represent a primary driver of housing affordability in a given region, county or city.
DATA SOURCE Zillow Median Sale Price (1997-2018) http://www.zillow.com/research/data/
Bureau of Labor Statistics: Consumer Price Index All Urban Consumers Data Table (1997-2018; specific to each metro area) http://data.bls.gov
CONTACT INFORMATION vitalsigns.info@bayareametro.gov
METHODOLOGY NOTES (across all datasets for this indicator) Median housing price estimates for the region, counties, cities, and ZIP codes come from analysis of individual home sales by Zillow. The median sale price is the price separating the higher half of the sales from the lower half. In other words, 50 percent of home sales are below the median value and 50 percent are above it. Zillow defines all homes as single-family residential, condominium, and co-operative homes with a county record. Single-family residences are detached, which means the home is an individual structure with its own lot. Condominiums are units that you own in a multi-unit complex, such as an apartment building. Co-operative homes differ slightly from condominiums in that the homeowners own shares in the corporation that owns the building, not the actual units themselves.
For metropolitan area comparison values, the Bay Area metro area’s median home sale price is the population-weighted average of the nine counties’ median home prices. Home sales prices are not reliably available for Houston, because Texas is a non-disclosure state. For more information on non-disclosure states, see: http://www.zillow.com/blog/chronicles-of-data-collection-ii-non-disclosure-states-3783/
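The population-weighted average mentioned above can be written out as a short Python sketch. The county names, prices, and populations below are invented for illustration and are not taken from the dataset. The function name is hypothetical.

```python
def population_weighted_average(median_prices, populations):
    """Weight each county's median price by its share of total population."""
    total_population = sum(populations[county] for county in median_prices)
    if total_population == 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(
        median_prices[county] * populations[county] for county in median_prices
    )
    return weighted_sum / total_population


# Invented example values.
median_prices = {"County A": 800_000, "County B": 500_000}
populations = {"County A": 1_000_000, "County B": 500_000}

metro_price = population_weighted_average(median_prices, populations)
print(metro_price)
```

Note that this is an average of county medians, not the median of all individual sales in the metro area, so the two figures can differ.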
Inflation-adjusted data are presented to illustrate how home prices have grown relative to overall price increases; that said, the use of the Consumer Price Index does create some challenges, given that housing represents a major share of the consumer goods bundle used to calculate CPI. This reflects a methodological tradeoff between precision and accuracy and is a common concern when working with any commodity that is a major component of CPI itself.
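A common way to express a nominal price in the dollars of a chosen base year is to scale it by the ratio of the two CPI values. The sketch below shows that arithmetic in Python. It is a generic illustration and is not confirmed as the exact procedure used for this indicator. The CPI figures and the function name are invented.

```python
def to_base_year_dollars(nominal_price, cpi_sale_year, cpi_base_year):
    """Convert a nominal price to base-year dollars using a CPI ratio."""
    if cpi_sale_year <= 0:
        raise ValueError("CPI of the sale year must be positive")
    return nominal_price * cpi_base_year / cpi_sale_year


# Invented example: the price level doubled between the sale year and the base year.
real_price = to_base_year_dollars(300_000, cpi_sale_year=150.0, cpi_base_year=300.0)
print(real_price)
```

In this invented example the price level doubles, so a home that sold for 300,000 in the earlier year corresponds to 600,000 in base-year dollars.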
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data for 'An Extensive Empirical Study of Nondeterministic Behavior in Static Analysis Tools' and the source code of the tool NDDetector that is used for performing the experiments in RQ2.
There are two directories, data and tool:
data/ contains the data for the conclusions drawn in the two research questions, RQ1 and RQ2 (rq1/ holds the data for Research Question 1).
In rq1/ there are:
final_results.csv - Contains 43 distinct results from 4 repositories (SOOT, WALA, FlowDroid, DroidSafe) that fix or report nondeterminism.
summary.pdf - Reports the number of nondeterminism results by tool repository at each stage of the qualitative study.
categorization.pdf - Reports the number of nondeterminism results by root cause category for each component of the analysis codebase in which the nondeterminism takes place.
raw_data.zip - Contains the raw commits and issues extracted from 9 repositories (SOOT, DOOP, WALA, FlowDroid, DroidSafe, AmanDroid, TAJS, Code2Flow, PyCG)
key_words_results.zip - Contains the results extracted by each keyword (concurrency, concurrent, consistent, determinism, deterministic, different, flakiness, flaky, parallel, thread) from the raw data.
In rq2/ there are:
ICSE2024_AGGREGATE_DATA.csv - Contains the result distributions of each combination of target program, configuration hash, and tool, as well as the calculated consistency score.
analyze_results.py - Script that generates this data.
node_freqs - Contains the frequency of each node in the nondeterministic results we observed. It also tracks whether each node is a callee, a caller, or a source/sink.
edge_dists - Contains the actual edge distributions of all of our results that behaved nondeterministically. For each result (edge/flow), it records which repetitions did and did not contain that edge/flow. If you are interested in the actual differences across the results generated by a tool, edge_dists/ is the place to look.
figure_8 - The raw data and occurrences-per-node sheet for generating Figure 8.
Potential stewardship for Baltimore City summarized by block group. PRIZM 5, 15, and 62 classes are also present. PRIZM is the Potential Rating Index by Zip code Markets produced by the Claritas corporation (http://www.clusterbigip1.claritas.com/claritas/Default.jsp). Potential stewardship is the land within a parcel that is not occupied by buildings, that is, land that could potentially undergo "greening." Not Realized Potential Stewardship is the land not occupied by buildings or existing vegetation, and is thus the land that is potentially available for "greening" initiatives. This dataset provides several summarizations at the block group level: 1) total potential stewardship area, 2) not realized potential stewardship, 3) normalized potential stewardship (potential stewardship area / block group area), 4) normalized not realized potential stewardship (not realized potential stewardship area / block group area), and 5) tree potential stewardship area.
The potential stewardship was calculated using parcel data, building footprints, and GDT census block groups. Building footprints were erased from the parcel area, resulting in a layer indicating the potential stewardship for each parcel. The potential stewardship layer was then unioned MD DNR's SUFA vegetation layer. All polygons corresponding to water features were deleted since water features cannot undergo "greening." All polygons that did fall in the potential stewardship area were deleted. This resulted in a layer in which the polygons represented the potential stewardship land along with the potential stewardship land occupied by either grass or trees. This layer was then intersected with the census block group layer resulting in a layer that had the potential stewardship land, potential stewardship vegetation, and block group IDs. All attributes were then summarized at the block group level.
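The overlay sequence above (erase, union, delete, intersect, summarize) can be sketched on toy data. The sketch assumes each layer is a set of unit grid cells rather than polygons, so the overlay operations reduce to set operations and areas reduce to cell counts. The original work was done on polygon layers in GIS software; the function name, the cell data, and the block group ID "BG1" are hypothetical.

```python
from collections import Counter

def summarize(parcels, buildings, vegetation, water, block_groups):
    """Toy version of the overlay workflow on unit grid cells.

    parcels, buildings, water: sets of (x, y) cells.
    vegetation: dict mapping cell -> "grass" or "trees".
    block_groups: dict mapping cell -> block group ID.
    Returns per-block-group cell counts (one cell = one unit of area).
    """
    potential = parcels - buildings        # erase building footprints
    unioned = potential | set(vegetation)  # union with the vegetation layer
    unioned -= water                       # delete water features
    unioned &= potential                   # keep only the potential stewardship area
    summary = {}
    for cell in unioned:                   # intersect with block groups
        bg = block_groups.get(cell)
        if bg is None:
            continue
        counts = summary.setdefault(bg, Counter())
        counts["Tot_PotStew"] += 1
        cover = vegetation.get(cell)
        if cover == "grass":
            counts["Grass"] += 1
        elif cover == "trees":
            counts["Trees"] += 1
        else:
            counts["NotReal_PotStew"] += 1
    return summary

# Hypothetical toy layers: one row of five cells in a single block group.
parcels = {(0, 0), (1, 0), (2, 0), (3, 0)}
buildings = {(0, 0)}
vegetation = {(1, 0): "grass", (4, 0): "trees"}  # (4, 0) lies outside any parcel
water = {(3, 0)}
block_groups = {(x, 0): "BG1" for x in range(5)}

summary = summarize(parcels, buildings, vegetation, water, block_groups)
```

In this toy case two cells remain as potential stewardship land: one covered by grass and one with no vegetation.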
Total Potential Stewardship (Tot_PotStew) is the area of parcel land in a block group that is not occupied by buildings or water. Not Realized Potential Stewardship (NotReal_PotStew) is the area of parcel land in a block group that is not occupied by buildings, water, or vegetation, and is thus the land that is actually available for greening. Normalized Potential Stewardship (Norm_PotStew) is the Potential Stewardship area / Block Group area. Normalized Not Realized Potential Stewardship (Norm_NotReal_PotStew) is the Not Realized Potential Stewardship area / Block Group area. Summations of grass (Grass), tree (Trees), and vegetation (Tot_Veg) area within the potential stewardship zone are also presented.
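A minimal sketch of the two normalized fields follows. It assumes all areas are expressed in the same unit (the unit is not stated here); the function name and the input values are hypothetical.

```python
def normalized_metrics(tot_potstew, notreal_potstew, block_group_area):
    """Divide each stewardship area by the block group area."""
    if block_group_area <= 0:
        raise ValueError("block group area must be positive")
    return {
        "Norm_PotStew": tot_potstew / block_group_area,
        "Norm_NotReal_PotStew": notreal_potstew / block_group_area,
    }

# Hypothetical block group: 100,000 area units in total, 40,000 of them
# potential stewardship land, 10,000 of which carry no vegetation.
metrics = normalized_metrics(40_000, 10_000, 100_000)
```

Both results are fractions of the block group area, so they should fall between 0 and 1 when the inputs are consistent.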
Tree potential stewardship (Tree_PotStew) was calculated by identifying the land that is available for planting trees, i.e. land not occupied by buildings, water, or existing trees. Normalized tree potential stewardship (Norm_Tree_Pot_Stew) was calculated by dividing by the area of the block group. Standardized tree potential stewardship (Std_Tree_PotStew) was calculated from the normalized tree potential stewardship: ((value - min) / range) * 100.
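The standardization formula can be worked through in a short sketch. It assumes min and range are taken over all block groups in the dataset, which the description implies but does not state; the handling of a zero range is also an assumption, and the input values are hypothetical.

```python
def standardize(values):
    """Apply ((value - min) / range) * 100 to a list of normalized values."""
    lo, hi = min(values), max(values)
    rng = hi - lo
    if rng == 0:
        # Assumption: if every block group has the same value, report 0 for all.
        return [0.0 for _ in values]
    return [((v - lo) / rng) * 100 for v in values]

# Hypothetical Tree_PotStew areas for three block groups of equal area.
tree_potstew = [0, 2_500, 10_000]
block_group_area = 10_000

# Norm_Tree_Pot_Stew: tree potential stewardship divided by block group area.
norm_tree = [area / block_group_area for area in tree_potstew]

# Std_Tree_PotStew: rescaled so the lowest block group is 0 and the highest is 100.
std_tree = standardize(norm_tree)
```

For these inputs the normalized values are 0.0, 0.25, and 1.0, and the standardized values are 0.0, 25.0, and 100.0.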
A cursory analysis of the parcel data indicated that parcel data was outdated for the following block groups: 245102503031, 245102503032, and 245102503033.
Note: transportation networks are not part of the parcel data, and thus were appropriately not part of this analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Demographic and socioeconomic characteristics of patients in the health system compared to pancreatic cancer patients.
Analysis of ‘NYC Social Media Usage’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/nyc-social-media-usage on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The Demographic Reports is a compilation of population, households and housing unit estimates and forecasts; market value estimates; residential development activity estimates; and industrial and commercial gross floor area estimates. Various geographic arrangements are used to present these data, such as supervisor districts, towns, planning districts, human services regions, ZIP Codes, sewer sheds, and census tracts. These small area estimates and forecasts are produced on an annual basis. The methodology used for estimating and forecasting housing units, households and population is contained. The methodologies used to estimate market value, residential development, and gross floor area are contained in their respective sections. In addition to the small area estimates and forecasts, state and federal data on Fairfax County are collected and summarized, and special studies and quantitative research are conducted by the unit.
If you use this dataset in your research, please credit John Snow Labs
--- Original source retains full ownership of the source dataset ---
Data package for "Dataflow Analysis-Inspired Deep Learning for Efficient Vulnerability Detection", published in ICSE 2024, with updates from Artifact Evaluation.
Paper link: https://www.computer.org/csdl/proceedings-article/icse/2024/021700a166/1RLIWqviwEM
See GitHub repo for updates: https://github.com/ISU-PAAL/DeepDFA
Data dictionary:
before.zip: CFGs of Big-Vul dataset, generated by Joern.
preprocessed_data.zip: preprocessed data from Big-Vul for running DeepDFA, including preprocessed Joern CFGs and abstract dataflow embeddings.
DeepDFA-code.zip: most recent version of the code as of the publication of this artifact, see GitHub repo for updates: https://github.com/ISU-PAAL/DeepDFA
MSR_data_cleaned.csv: original Big-Vul dataset, see original source: https://github.com/ZeoVan/MSR_20_Code_vulnerability_CSV_Dataset
MSR_LineVul: LineVul's preprocessed version of the Big-Vul dataset, see original source: https://github.com/awsm-research/LineVul
Changelog:
v1 2023-09-20: original data package and GitHub repo published.
v2 2024-01-04: added full instructions and bug fixes for Artifact Evaluation.
v3 2024-01-10: integrated feedback from Artifact Evaluation.
Potential stewardship for Residential parcels in Baltimore City summarized by block group. Residential was defined as those parcels with a land use code of residential, residential commercial, residential condominium, or apartments based on the 2003 Maryland Property View A&T database. PRIZM 5, 15, and 62 classes are also present. PRIZM is the Potential Rating Index by Zip code Markets produced by the Claritas Corporation (http://www.clusterbigip1.claritas.com/claritas/Default.jsp). Total Potential Stewardship is the land within a parcel that is not occupied by buildings, that is, land that could potentially support vegetation, regardless of whether or not any vegetation is present. Realized Potential Stewardship is land that is currently occupied by vegetation. Not Realized Potential Stewardship is the land not occupied by buildings or existing vegetation, and is thus the land that is potentially available for "greening" initiatives. Normalization for realized and not realized potential stewardship is carried out by dividing by the total potential stewardship. The potential stewardship was calculated using parcel data, building footprints, and GDT census block groups. Building footprints were erased from the parcel area, resulting in a layer indicating the potential stewardship for each parcel. The potential stewardship layer was then unioned with MD DNR's 2001 SUFA vegetation layer. All polygons corresponding to water features were deleted, since water features cannot undergo "greening." All polygons that did not fall in the potential stewardship area were also deleted. This resulted in a layer in which the polygons represented the potential stewardship land along with the potential stewardship land occupied by either grass or trees. This layer was then intersected with the census block group layer, resulting in a layer that had the potential stewardship land, potential stewardship vegetation, and block group IDs. All attributes were then summarized at the block group level.
A cursory analysis of the parcel data indicated that parcel data was outdated for the following block groups: 245102503031, 245102503032, and 245102503033. Certain block groups with very high Normalized Total Potential Stewardship values may be indicative of the fact that building footprint data was missing, although the extent of this problem is unknown. Note: transportation networks are not part of the parcel data, and thus were appropriately not part of this analysis.