This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.
This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.
The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.
Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.
For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, the synthetic version of this data could be analyzed to better understand the usage of human services by people in Allegheny County, including the interplay between the use of multiple services and clients' demographic information.
Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.
Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and whether you found it helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions for future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (the WPRDC always loves to hear about how people use the data it publishes and how the data could be improved).
1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.
Questions:
1.4.8c: Does the SOE or government publicly disclose the date of the production sales executed by the SOE?
1.4.6b: Does the SOE publicly disclose its aggregate sales volume?
1.4.6a: Does the SOE publicly disclose its aggregate production volume?
1.2.1a: Does the government publicly disclose data on the volume of extractive resource production?
1.2.1c: Is the data disclosed on the volume of extractive resource production machine-readable?
1.2.2c: Is the data disclosed on the value of extractive resource exports machine-readable?
1.2.2b: How up-to-date is the publicly disclosed data on the value of extractive resource exports?
Currently published data series on the United States household debt service ratio are constructed from aggregate household debt data provided by lenders and from estimates of the average interest rate and loan terms of a range of credit products. The approach used to calculate those debt service ratios could miss changes in loan terms. Better measurement of this important indicator of financial health can help policymakers anticipate and react to crises in household finance. We develop and estimate debt service ratio measures based on individual-level debt payments data obtained from credit bureau data and published estimates of disposable personal income. Our results suggest that aggregate debt service ratios may have understated the payment requirements of households. To the extent possible with two very distinct data sources, we examine the details of the composition of household debt service and identify some areas where required payments appear to have varied substantially from the assumptions used in the Board of Governors' aggregate calculation. We then use our technique to calculate both national and state-level debt ratios and break these debt service ratios into debt categories at the national, state, and metro levels. This approach should allow detailed forecasts of debt service ratios based on anticipated changes in interest rates and incomes, which could serve to evaluate the ability of households to cope with potential economic shocks. The ability to disaggregate these estimates by geographic region or age group could help identify the severity of the effects on more exposed groups.
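The basic arithmetic of a payments-based ratio can be illustrated with a minimal Python sketch. It assumes the debt service ratio is total required debt payments divided by disposable personal income, that payments come from a credit-bureau sample covering a known share of the population (the `sample_fraction` parameter is a hypothetical simplification), and that both quantities are in the same currency units. The authors' actual weighting and income adjustments are not reproduced here.

```python
def debt_service_ratio(monthly_payments, sample_fraction, disposable_income_annual):
    """Aggregate debt service ratio from individual-level required payments.

    monthly_payments: required monthly payments for individuals in the sample.
    sample_fraction: share of the population covered by the sample (hypothetical
        scaling; real credit-bureau samples need proper weighting).
    disposable_income_annual: aggregate disposable personal income for the same area.
    """
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    # Scale the sample up to the population and convert monthly to annual payments
    annual_payments = 12 * sum(monthly_payments) / sample_fraction
    return annual_payments / disposable_income_annual


def dsr_by_category(payments_by_category, sample_fraction, disposable_income_annual):
    """Split the ratio by debt category (e.g. mortgage, auto, card)."""
    return {
        category: debt_service_ratio(payments, sample_fraction, disposable_income_annual)
        for category, payments in payments_by_category.items()
    }
```

Because every category is divided by the same income denominator, the category ratios add up to the overall ratio, which is what makes the breakdown by debt type (or, with area-level income, by state or metro) straightforward.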
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DUIA includes data on the socio-economic development and amenities of 86 cities in 32 countries. DUIA is based on freely and easily available data sources and built with integration protocols and code in R scripts, making both the construction of the database as a whole and specific statistical analyses fully transparent and replicable. DUIA is constructed in three steps. First, we draw on remote-sensing-derived data from the Atlas of Urban Expansion to define city boundaries as accurately and consistently as possible across countries. Second, we draw on survey data stored in IPUMS (Integrated Public Use Microdata Series) to include extensive, harmonized, and disaggregated data. Third, as we especially seek to contribute to comparative research outside the West, we developed tailor-made solutions to include Indian and Chinese cities for which data were not available in IPUMS.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: The main objective of this study is to estimate variables related to transportation planning, in particular transit trip production, by proposing a geostatistical procedure. The procedure combines semivariogram deconvolution and Kriging with External Drift (KED). The method first assumes a disaggregated systematic sample derived from aggregate data. KED is then applied to estimate the primary variable, with population as a secondary input. This research assesses two types of information for the city of Salvador (Bahia, Brazil): an origin-destination dataset based on a home-interview survey carried out in 1995, and the 2010 census data. Besides applying geostatistics to transportation planning, this paper introduces semivariogram deconvolution applied to aggregated travel data; neither aspect has so far been explored in this research area. The paper therefore makes three main contributions: 1) estimating urban travel data at unsampled spatial locations; 2) obtaining values of the variable of interest derived from other variables; and 3) introducing a simple semivariogram deconvolution procedure for cases where disaggregated data are not available in order to maintain the confidentiality of individual data.
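The KED estimation step named in the abstract can be sketched in Python with NumPy. This is a minimal, assumption-laden illustration rather than the authors' procedure: it uses a hypothetical exponential semivariogram with made-up parameters (in the paper the semivariogram would come from the deconvolution step), predicts at one location at a time, and treats population as the external drift. The two extra rows of the kriging system force the weights to sum to 1 and to reproduce the drift value at the target.

```python
import numpy as np


def exponential_variogram(h, sill=1.0, range_=1.0, nugget=0.0):
    """Exponential semivariogram model; gamma(0) is 0 by definition."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + sill * (1.0 - np.exp(-h / range_))
    return np.where(h == 0.0, 0.0, gamma)


def ked_predict(coords, z, drift, target_xy, target_drift, **vario):
    """Kriging with External Drift at a single target location.

    coords: (n, 2) sample coordinates; z: (n,) primary variable (e.g. trip production);
    drift: (n,) secondary variable at the samples (e.g. population);
    target_drift: secondary variable at the target. Returns (estimate, weights).
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    drift = np.asarray(drift, dtype=float)
    n = len(z)
    # Semivariances between all pairs of samples
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    a = np.zeros((n + 2, n + 2))
    a[:n, :n] = exponential_variogram(dists, **vario)
    a[:n, n] = a[n, :n] = 1.0            # weights sum to 1 (unbiasedness)
    a[:n, n + 1] = a[n + 1, :n] = drift  # weights reproduce the drift at the target
    # Right-hand side: sample-to-target semivariances plus the two constraints
    d0 = np.linalg.norm(coords - np.asarray(target_xy, dtype=float), axis=1)
    b = np.concatenate([exponential_variogram(d0, **vario), [1.0, float(target_drift)]])
    solution = np.linalg.solve(a, b)
    weights = solution[:n]
    return float(weights @ z), weights
```

With no nugget the estimator is an exact interpolator: predicting at a sampled location returns the observed value, while at unsampled locations the drift constraint lets the secondary variable shape the estimate.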
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This replication folder includes all data and computer files needed to reproduce the results from the paper "Do More Disaggregated Electoral Results Deter Aggregation Fraud?"
Abstract of associated article: We explore the effect of cross-sectional aggregation of data on the estimation and testing of asymmetric retail fuel price responses to wholesale price shocks. The analysis is performed on data collected daily from individual fuel stations in the Spanish metropolitan areas of Madrid and Barcelona. While the standard OLS estimator is applied to an error correction model in the case of the aggregated time series, we use the mean group approaches developed by Pesaran and Smith (1995) and Pesaran (2006) to estimate the short- and long-run micro-relations under heterogeneity. We found remarkable differences between the results of estimations using aggregated and disaggregated data, and these differences are highly robust across both datasets considered. Our findings could help to explain many of the results in the literature on this research topic. On the one hand, they suggest that the typical estimation with aggregated data clearly tends to overestimate the persistence of shocks. On the other hand, we show that aggregation may cause a loss of efficiency in econometric estimates that is large enough to hide the existence of the “rockets and feathers” phenomenon.
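The Mean Group idea of Pesaran and Smith (1995) cited above, estimating each cross-sectional unit separately and then averaging the coefficients, can be sketched as follows. This is a simplified illustration assuming each station's regression is supplied as a design matrix and a response; it does not implement the paper's asymmetric error correction specification or the common correlated effects extension of Pesaran (2006).

```python
import numpy as np


def mean_group(panels):
    """Mean Group estimator: estimate each unit by OLS, then average the coefficients.

    panels: list of (X, y) pairs, one per cross-sectional unit (e.g. a fuel station);
        X is (T_i, k) and should include a column of ones for the intercept.
    Returns (mean coefficients, standard errors of the means); needs at least 2 units.
    """
    betas = []
    for X, y in panels:
        beta, *_ = np.linalg.lstsq(np.asarray(X, dtype=float), np.asarray(y, dtype=float), rcond=None)
        betas.append(beta)
    betas = np.vstack(betas)  # shape (N, k): one row of coefficients per unit
    n = betas.shape[0]
    if n < 2:
        raise ValueError("the Mean Group standard error needs at least two units")
    b_mg = betas.mean(axis=0)
    # Variance of the mean estimated from the cross-unit dispersion of the estimates
    se = np.sqrt(((betas - b_mg) ** 2).sum(axis=0) / (n * (n - 1)))
    return b_mg, se
```

Unlike pooled OLS on an aggregated series, this lets every station have its own slope, which is why heterogeneity across stations shows up in the standard errors rather than being averaged away before estimation.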
The Registered Apprenticeship data displayed in this resource is derived from several sources that differ in their ability to provide disaggregated data. The 25 federally administered states and 16 federally recognized State Apprenticeship Agencies (SAAs) use the Employment and Training Administration's Registered Apprenticeship Partners Information Database System (RAPIDS) to provide individual apprentice and sponsor data. This subset of data is referred to as RAPIDS data and can be disaggregated to provide additional specificity. The federal subset of that data (25 states plus national programs) is known as the Federal Workload. The remaining federally recognized SAAs and the U.S. Military Apprenticeship Program (USMAP) provide limited aggregate data on a quarterly basis. These data are combined with RAPIDS data to produce a national data set on high-level metrics (apprentices and programs), but they generally cannot be broken out in greater detail than what is provided here.
UNIDO maintains a variety of databases comprising statistics on overall industrial growth, detailed data on business structure, and statistics on major indicators of industrial performance by country in historical time series. Among these is the UNIDO Industrial Statistics Database at the 3- and 4-digit levels of ISIC Revision 4 (INDSTAT4-Rev.4).
INDSTAT4 contains highly disaggregated data on the manufacturing sector for the period from 2005 onwards. Comparability of data over time and across countries has been the main priority in developing and updating this database. INDSTAT4 offers a unique opportunity for in-depth analysis of the structural transformation of economies over time. The database contains seven principal indicators of industrial statistics. The data are arranged at the 3- and 4-digit levels of the International Standard Industrial Classification of All Economic Activities (ISIC) Revision 4 pertaining to manufacturing, which comprises more than 160 manufacturing sectors and sub-sectors. The time series can be used either to compare a given branch or sector across countries or, where present in the data set, to compare several sectors within one country.
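Because ISIC Revision 4 is hierarchical, values reported for 4-digit classes can be rolled up to their 3-digit groups by code prefix. A minimal Python sketch, assuming plain numeric 4-digit codes as string keys and additive indicators such as output or employment; the actual INDSTAT4 file layout and any combined or special codes are not modeled here.

```python
from collections import defaultdict


def isic4_to_3digit(values_by_class):
    """Sum 4-digit ISIC Rev. 4 class values into 3-digit groups (first three digits)."""
    totals = defaultdict(float)
    for code, value in values_by_class.items():
        if len(code) != 4 or not code.isdigit():
            raise ValueError(f"expected a 4-digit ISIC class code, got {code!r}")
        totals[code[:3]] += value
    return dict(totals)
```

For example, classes 1071, 1072, and 1079 all fall into group 107, so their values are summed into a single 3-digit total.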
For more information, please visit: http://www.unido.org/resources/statistics/statistical-databases.html
Sectors
Aggregate data [agg]
Other [oth]
The Distributional Financial Accounts (DFAs) provide a quarterly measure of the distribution of U.S. household wealth since 1989, based on a comprehensive integration of disaggregated household-level wealth data with official aggregate wealth measures. The data set contains the level and share of each balance sheet item on the Financial Accounts' household wealth table (Table B.101.h) for various sub-populations in the United States. In our core data set, aggregate household wealth is allocated to each of four percentile groups of wealth: the top 1 percent, the next 9 percent (i.e., 90th to 99th percentile), the next 40 percent (50th to 90th percentile), and the bottom half (below the 50th percentile). Additionally, the data set contains the level and share of aggregate household wealth by income, age, generation, education, and race. The quarterly frequency makes the data useful for studying the business cycle dynamics of wealth concentration, which are typically difficult to observe in lower-frequency data because peaks and troughs often fall between times of measurement. These data are updated about 10 or 11 weeks after the end of each quarter, making them a timely measure of the distribution of wealth.
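To make the four-group breakdown concrete, here is a minimal Python sketch that computes each group's share of total wealth from weighted household-level data. It illustrates only the share arithmetic, assuming survey-style weights and assigning each household to a group by the midpoint of its weight; the DFA methodology for reconciling microdata with the Financial Accounts aggregates is not reproduced.

```python
import numpy as np

# The DFA core percentile groups, as (lower, upper) wealth percentiles
GROUPS = {
    "top 1%": (99, 100),
    "next 9%": (90, 99),
    "next 40%": (50, 90),
    "bottom 50%": (0, 50),
}


def wealth_shares(wealth, weights):
    """Share of total wealth held by each percentile group of the wealth distribution.

    wealth: household net worth; weights: number of households each record represents.
    Each household is placed at the percentile of the midpoint of its weight,
    so households straddling a group boundary are not split.
    """
    wealth = np.asarray(wealth, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(wealth, kind="stable")
    w, wt = wealth[order], weights[order]
    cum = np.cumsum(wt)
    pct = 100.0 * (cum - wt / 2) / cum[-1]
    total = float(w @ wt)
    shares = {}
    for name, (lo, hi) in GROUPS.items():
        mask = (pct >= lo) & ((pct < hi) if hi < 100 else (pct <= hi))
        shares[name] = float((w[mask] * wt[mask]).sum() / total)
    return shares
```

Negative net worth is allowed; shares are relative to total net wealth, so a group with negative aggregate wealth would show a negative share.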
Election Data Attribute Field Definitions | Wisconsin Cities, Towns, & Villages Data Attributes

Ward Data Overview: These municipal wards were created by grouping Census 2010 population collection blocks into municipal wards. This project started with the release of Census 2010 geography and population totals to all 72 Wisconsin counties on March 21, 2011; these were made available via the Legislative Technology Services Bureau (LTSB) GIS website and the WISE-LR web application. The 180-day statutory timeline for local redistricting ended on September 19, 2011. Wisconsin Legislative and Congressional redistricting plans were enacted in 2011 by Wisconsin Act 43 and Act 44. These new districts were created using Census 2010 block geography. Some municipal wards created before the passage of Acts 43 and 44 had to be split along assembly, senate, and congressional district boundaries. 2011 Wisconsin Act 39 allowed communities to divide wards along census block boundaries if they were divided by newly enacted boundaries. A number of wards created under Act 39 were named using alphanumeric labels: for example, in some municipalities ward 1, when divided by an assembly district, became wards 1A and 1B, while other municipalities used the next sequential ward number (e.g., ward 1 and ward 2). The process of dividing wards under Act 39 ended on April 10, 2012. On April 11, 2012, the United States Eastern District Federal Court ordered Assembly Districts 8 and 9 (both in the City of Milwaukee) to be changed to follow the court’s description. On September 19, 2012, LTSB divided the few remaining municipal wards that were split by a 2011 Wisconsin Act 43 or 44 district line.

Election Data Overview: The election data included in this file was collected by LTSB from the Government Accountability Board (GAB)/Wisconsin Elections Commission (WEC) after each general election.
A disaggregation process was performed on this election data based on the municipal ward layer that was available at the time of the election. The ward data collected after each decennial census is made up of collections of whole and split census blocks. (Note: Split census blocks occur during local redistricting when municipalities include recently annexed property in their ward submissions to the legislature.)

Disaggregation of Election Data: Election data is first disaggregated from reporting units to wards, and then to census blocks. Next, the election data is aggregated back up to wards, municipalities, and counties. The disaggregation of election data to census blocks is done based on total population.

Detailed Methodology: Data is first disaggregated from the reporting unit (i.e., multiple wards) to the ward level in proportion to the population of each ward. The data is then distributed down to the block level, again based on total population. When data is disaggregated to the block or ward level, vote totals are constrained not to exceed the voting-age (18+) population, unless absolutely required. This methodology has the following consequences. Election data totals reported to the GAB/WEC at the state, county, municipal, and reporting unit levels should match the disaggregated election data totals at the same levels. Election data totals reported to the GAB at the ward level may not match the ward totals in the disaggregated election data file. Some wards may have more election data allocated than voting-age population; this will occur if a change to the geography results in more voters than the 2010 historical population limits.

Other things of note: We use a static, official ward layer (in this case created in 2011) to disaggregate election data to blocks. Using this ward layer creates some challenges. New wards are created every year due to annexations and incorporations.
When these new wards are reported with election data, election data is being reported for wards that do not exist in our official ward layer. For example, if "Cityville" has four wards in the official ward layer, the election data may be reported for five wards, including a new ward from an annexation. There are two scenarios, each handled differently. First, when a single new ward is present in the election data but has no geometry in the official ward layer, the votes attributed to this new ward are distributed to all the other wards in the municipality based on population percentage: each ward receives the same share of the new ward's votes as its share of the municipality's population. In the Cityville example, the fifth ward may have five votes reported, but since there is no corresponding fifth ward in the official layer, these five votes are assigned to the other wards in Cityville according to their percentage of the population. Second, when a new ward is reported as part of a reporting unit, its votes are assigned to the other wards in that reporting unit by population percentage, not to the wards in the municipality as a whole. For example, if Cityville’s ward 5 was reported in a reporting unit comprising wards 1, 4, and 5, the votes for ward 5 are assigned to wards 1 and 4 according to population percentage.

Outline Ward-by-Ward Election Results: The process of collecting election data and disaggregating it to municipal wards occurs after each general election, so disaggregation has been performed with different ward layers and different population totals.
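The population-proportional split with a voting-age cap, together with the two rules for wards missing from the official layer, can be sketched in Python. This is an illustration under stated assumptions, not LTSB's code: `allocate_with_cap` and `disaggregate_to_layer` are hypothetical helpers covering a single municipality, all populations are assumed positive, caps are applied within each reporting unit separately, and "unless absolutely required" is read as falling back to plain proportional shares when the votes exceed the combined caps. A reporting unit is split among those of its wards that exist in the official layer; if none exist, as when a single new ward is reported on its own, its votes are spread over every ward in the municipality.

```python
def allocate_with_cap(total, populations, caps=None):
    """Split `total` votes across units in proportion to `populations`.

    If `caps` (e.g. voting-age population) are given, no unit receives more than
    its cap; a capped unit's excess is re-spread over the uncapped units by
    population. If `total` exceeds the sum of the caps, plain proportional
    shares are returned (assumed reading of "unless absolutely required").
    """
    if caps is None or total > sum(caps):
        pop_sum = sum(populations)
        return [total * p / pop_sum for p in populations]
    alloc = [0.0] * len(populations)
    free = set(range(len(populations)))
    remaining = float(total)
    while free:
        pop_sum = sum(populations[i] for i in free)
        share = {i: remaining * populations[i] / pop_sum for i in free}
        over = [i for i in free if share[i] > caps[i]]
        if not over:
            for i in free:
                alloc[i] = share[i]
            break
        # Pin over-cap units at their cap and re-spread what is left
        for i in over:
            alloc[i] = float(caps[i])
            remaining -= caps[i]
            free.remove(i)
    return alloc


def disaggregate_to_layer(units, layer_pop, layer_cap=None):
    """Allocate reporting-unit votes to the wards of the official ward layer.

    units: list of (ward_ids, votes), one entry per reporting unit in the results.
    layer_pop: {ward_id: population} for the municipality's official-layer wards.
    layer_cap: optional {ward_id: voting-age population} used as caps.
    """
    out = {ward: 0.0 for ward in layer_pop}
    for wards, votes in units:
        # Wards of this reporting unit that have geometry in the official layer
        targets = [w for w in wards if w in layer_pop]
        if not targets:
            # A new ward reported on its own: spread over the whole municipality
            targets = list(layer_pop)
        caps = [layer_cap[w] for w in targets] if layer_cap else None
        shares = allocate_with_cap(votes, [layer_pop[w] for w in targets], caps)
        for w, s in zip(targets, shares):
            out[w] += s
    return out
```

The same proportional helper could also be applied one level down, splitting each ward's total across its census blocks.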
We have outlined (to the best of our knowledge) which layer and population totals were used to produce these ward-by-ward election results.

Election data disaggregates from GAB/WEC Reporting Unit -> Ward [variant year outlined below]:
Elections 1990 – 2000: Wards 1991 (Census 1990 totals used for disaggregation)
Elections 2002 – 2010: Wards 2001 (Census 2000 totals used for disaggregation)
Elections 2012: Wards 2011 (Census 2010 totals used for disaggregation)
Elections 2014 – 2016: Wards spring 2017 (Census 2010 totals used for disaggregation)

Blocks 2011 -> Centroid geometry, spatially joined with Wards [All Versions]. Each block has an assignment to each of the ward versions outlined above. Where a ward now exists that contains no block due to annexations (this occurred with spring 2017), a block centroid was created with a population of 0 and encoded with the proper Census IDs.

Wards [All Versions] disaggregate -> Blocks 2011. This yields a block centroid layer that contains all elections from 1990 to 2016.

Blocks 2011 [with all election data] -> Wards 2011 (then MCD 2011, and County 2011). All election data (including later elections such as 2016) is aggregated to the Wards 2011 assignment of the blocks.

Notes: The populations of the 1991, 2001, and 2011 municipal wards used for disaggregation were determined by their respective Census. Population and election data are contained within county boundaries: even though municipal and ward boundaries vary greatly between versions of the wards, county boundaries have stayed the same. Therefore, data totals within a county should be the same between 2011 wards and 2018 wards. Election data may differ for the same legislative district and the same election because of changes in the wards between 2011 and 2018. This is due to (a) boundary corrections in the data from 2011 to 2018, and (b) annexations, where a block may have been reassigned.
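The round trip outlined above, from any ward version down to 2011 blocks and back up to the 2011 wards, can be sketched as below. The sketch assumes each block already carries its assignment to every ward version (as the centroid spatial join provides) and splits a ward's votes among its blocks by population. As an assumption not stated in the source, it splits votes evenly when a ward's only blocks are zero-population placeholder centroids, so those votes are not lost. The name `reaggregate` and the data shapes are hypothetical.

```python
from collections import defaultdict


def reaggregate(blocks, ward_votes, source_version, target_version):
    """Push one election's ward totals down to blocks, then up to another ward version.

    blocks: list of dicts like {"pop": 120, "ward": {"2001": "A", "2011": "X"}},
        i.e. each block's population and its ward in every ward version.
    ward_votes: {ward_id: votes} keyed by wards of `source_version`.
    Wards in ward_votes that contain no block are not carried over.
    """
    ward_pop = defaultdict(float)
    ward_blocks = defaultdict(int)
    for b in blocks:
        src = b["ward"][source_version]
        ward_pop[src] += b["pop"]
        ward_blocks[src] += 1
    out = defaultdict(float)
    for b in blocks:
        src = b["ward"][source_version]
        votes = ward_votes.get(src, 0.0)
        if ward_pop[src] > 0:
            share = b["pop"] / ward_pop[src]  # population-proportional split
        else:
            share = 1.0 / ward_blocks[src]  # assumed even split for zero-population wards
        out[b["ward"][target_version]] += votes * share
    return dict(out)
```

Because blocks never cross county lines, totals summed per county are unchanged by this round trip, which is consistent with the note that county totals should agree across ward versions.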
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Full Description: This dataset contains aggregate data on the number of unique children placed in open DCF placements at a single point in time: July 1st of each State Fiscal Year. These figures are disaggregated by Region, Gender, whether the placement is in or out of state, and the Type of Placement in which the child is residing on the observation date. The 'Other' Region category includes all cases that are not being served by a Regional DCF Office. This includes cases being served as/by Aftercare, General Administration, Treatment Services, Special Investigations Unit, DCF Hotline, or cases that have not been assigned to a DCF Regional office as of the date of observation. Note: Not every combination of filters will have values. CTData also carries this data disaggregated by Racial and Ethnic Groups and by Age Group. For more information, click the link below to see the full metadata.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Major differences from the previous version:

For level 2 catch: Catches in tons, raised to match nominal values, now take into account the geographic area of the nominal data for improved accuracy. Catches in "Number of fish" are converted to weight based on nominal data. The conversion factors used in the previous version are no longer used, as they did not adequately represent the diversity of catches. Numbers of fish without corresponding nominal data are no longer removed as they were before, which creates a large difference for this measurement_unit between the two datasets. Nominal data from WCPFC include fishing fleet information, and georeferenced data have been raised using this information, rather than solely on the year/gear/species triplet, to avoid random reallocations. For strata in which catches in tons are raised to match nominal data, the records in numbers of fish have been removed. Raising applies only to complete years, to avoid overrepresenting specific months, particularly in the early years of georeferenced reporting. Strata where georeferenced data exceed nominal data have not been adjusted downward, as it is unclear whether these discrepancies arise from missing nominal data or from different aggregation methods in the two datasets. The data is not aggregated to 5-degree squares and thus remains spatially unharmonized. Aggregation can be performed using the CWP codes used as geographic identifiers; for example, an R function is available: source("https://raw.githubusercontent.com/firms-gta/geoflow-tunaatlas/master/sardara_functions/transform_cwp_code_from_1deg_to_5deg.R")

For level 0: The dataset has been modified, creating differences in this new version, notably: the species retained are different (only 32 major species are kept); mappings have been somewhat modified based on new standards implemented by FIRMS; new rules have been applied for overlapping areas; data is only displayed in 1-degree and 5-degree square areas; and the data is enriched with "Species group" and "Gear labels" using the fdiwg standards.
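Two of the level 2 operations described above can be sketched in Python: raising georeferenced catches up to nominal totals without adjusting downward, and rolling 1-degree cells up to 5-degree cells. These are simplified illustrations, not the Tuna Atlas code. The stratum key (year, gear, species, fleet) is an assumption, the complete-year and geographic-area rules are ignored, and the CWP layout assumed here (first digit 5 for a 1°×1° square and 6 for 5°×5°, then a quadrant digit, two latitude digits, and three longitude digits for the corner nearest 0°N 0°E) should be checked against the CWP grid-code specification before use; the R function referenced in the description remains the reference implementation.

```python
from collections import defaultdict


def stratum(row):
    """Hypothetical stratum key used for raising."""
    return (row["year"], row["gear"], row["species"], row["fleet"])


def raise_to_nominal(georef, nominal):
    """Scale georeferenced catches (tons) up so each stratum matches its nominal total.

    Strata with no nominal value, or whose georeferenced total already equals or
    exceeds the nominal total, are left unchanged (no downward adjustment).
    """
    totals = defaultdict(float)
    for row in georef:
        totals[stratum(row)] += row["catch_t"]
    raised = []
    for row in georef:
        key = stratum(row)
        nom = nominal.get(key)
        factor = 1.0
        if nom is not None and 0 < totals[key] < nom:
            factor = nom / totals[key]
        raised.append({**row, "catch_t": row["catch_t"] * factor})
    return raised


def cwp_1deg_to_5deg(code):
    """Map a 7-digit CWP 1x1 degree code to the 5x5 degree code containing it (assumed layout)."""
    code = str(code)
    if len(code) != 7 or not code.isdigit() or code[0] != "5":
        raise ValueError(f"not a 1x1 degree CWP code: {code!r}")
    quadrant, lat, lon = code[1], int(code[2:4]), int(code[4:7])
    # Corner coordinates are measured from the origin, so flooring to a multiple of 5 works
    return f"6{quadrant}{lat // 5 * 5:02d}{lon // 5 * 5:03d}"


def catch_by_5deg(rows):
    """Sum catches (tons) per 5x5 degree cell."""
    out = defaultdict(float)
    for row in rows:
        out[cwp_1deg_to_5deg(row["cwp"])] += row["catch_t"]
    return dict(out)
```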
These main differences are summarized in Differences_v2018_v2024.zip.

Recommendations: To avoid converting data in numbers using nominal strata, we recommend using conversion factors, which could be provided by the tRFMOs. In some strata, nominal data appear higher than georeferenced data, as observed during level 2 processing. These discrepancies may result from errors or from differences in aggregation methods. Further analysis will examine these differences in detail to refine the treatments accordingly. A summary of differences by tRFMO, based on the number of strata, is included in the appendix. Some nominal data have no equivalent in georeferenced data and therefore cannot be disaggregated. One option would be to check, for each nominal record without an equivalent, whether georeferenced data exist within different buffers, average the distribution of this footprint, and then disaggregate the nominal data based on those georeferenced data. This would create data (approximately 3%) and would require reducing or removing all georeferenced data without a nominal equivalent or with a smaller equivalent. Tests are currently being conducted with and without this approach. It would help improve the footprint of captured biomass but could lead to unexpected discrepancies with current datasets.

For level 0 effort: In some datasets, namely those from ICCAT and the purse seine (PS) data from WCPFC, the same effort has been reported multiple times in different units. These records have been kept as is, since no official mapping allows conversion between these units. As a result, users should be aware that some ICCAT and WCPFC effort data are deliberately duplicated. In the case of ICCAT data, lines with identical strata but different effort units are duplicates reporting the same fishing activity in different measurement units.
It is indeed not possible to infer strict equivalence between units, as some contain information about others (e.g., Hours.FAD and Hours.FSC may inform Hours.STD). In the case of WCPFC data, effort records were also kept in all originally reported units. Here, duplicates do not necessarily share the same “fishing_mode”, as SETS for purse seiners are reported with an explicit association to fishing_mode, while DAYS are not. This distinction allows SETS records to be separated by fishing mode, whereas DAYS records remain aggregated. Some limited harmonization, particularly between units such as NET-days and Nets, has not been implemented in the current version of the dataset but may be considered in future releases if a consistent relationship can be established.
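Because some ICCAT and WCPFC effort records describe the same activity in several units, summing effort across units double-counts it. A minimal Python sketch of one way a user might guard against that: keep a single unit per stratum according to a preference list of their choosing. The preference order and data shape are assumptions, and no conversion between units is attempted, consistent with the note that no official mapping exists.

```python
def one_unit_per_stratum(rows, preference):
    """Keep one effort record per stratum, choosing the most preferred unit available.

    rows: iterable of (stratum, unit, value) tuples; a stratum is any hashable key
        (e.g. a tuple of year, month, cell, gear, fleet, fishing_mode).
    preference: effort units ordered from most to least preferred.
    Strata with no unit in `preference` are dropped.
    """
    rank = {unit: i for i, unit in enumerate(preference)}
    best = {}
    for key, unit, value in rows:
        if unit not in rank:
            continue
        if key not in best or rank[unit] < rank[best[key][0]]:
            best[key] = (unit, value)
    return best
```

For WCPFC purse seine data, whether fishing_mode belongs in the stratum key depends on the unit chosen, since SETS are split by fishing mode while DAYS are not.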
This dataset is one of the outputs of the Global Spatially-Disaggregated Crop Production Statistics Data (MapSPAM) for 2010, which includes physical area, harvested area, production, and yield for 42 crops, disaggregated by input level (e.g., irrigated/rainfed and high/low input) on a 10 km grid globally. Crop production values in this dataset are given per hectare for each technology, aggregated by category (crops/food/non-food), with no information on individual crops. Unit of measure: production per ha for each technology, in mt/ha. This new version of MapSPAM, available to download from the Harvard Dataverse website, marks the third generation of the SPAM data series, following those of 2000 and 2005. More information on the production systems and selected crops is available in the Global Spatially-Disaggregated Crop Production Statistics Data (MapSPAM) full metadata at https://data.apps.fao.org/map/catalog/srv/eng/catalog.search#/metadata/59f7a5ef-2be4-43ee-9600-a6a9e9ff562a
Election Data Attribute Field Definitions | Wisconsin Cities, Towns, & Villages Data Attributes

Ward Data Overview: July 2020 municipal wards were collected by LTSB through the WISE-Decade system. Current statutes require each county clerk, or board of election commissioners, to transmit to the LTSB, no later than January 15 and July 15 of each year and in an electronic format approved by LTSB, a report confirming the boundaries of each municipality, ward, and supervisory district within the county as of the preceding “snapshot” date of January 1 or July 1, respectively. Population totals for 2011 wards are carried over to the 2020 dataset for existing wards. New wards created since 2011 due to annexations, detachments, and incorporations are allocated population from Census 2010 collection blocks. LTSB has topologically integrated the data, but there may still be errors.

Election Data Overview: The 2012-2020 Wisconsin election data included in this file was collected by LTSB from the *Wisconsin Elections Commission (WEC) after each general election. A disaggregation process was performed on this election data based on the municipal ward layer that was available at the time of the election.

Disaggregation of Election Data: Election data is first disaggregated from reporting units to wards, and then to census blocks. Next, the election data is aggregated back up to wards, municipalities, and counties. The disaggregation of election data to census blocks is done based on total population.

Detailed Methodology: Data is first disaggregated from the reporting unit (i.e., multiple wards) to the ward level in proportion to the population of each ward. The data is then distributed down to the block level, again based on total population.
When data is disaggregated to the block or ward level, vote totals are constrained not to exceed the voting-age (18+) population, unless absolutely required. This methodology has the following consequences. Election data totals reported to the WEC at the state, county, municipal, and reporting unit levels should match the disaggregated election data totals at the same levels. Election data totals reported to the WEC at the ward level may not match the ward totals in the disaggregated election data file. Some wards may have more election data allocated than voting-age population; this will occur if a change to the geography results in more voters than the 2010 historical population limits.

Other things of note: We use a static, official ward layer (in this case created in 2020) to disaggregate election data to blocks. Using this ward layer creates some challenges. New wards are created every year due to annexations and incorporations. When these new wards are reported with election data, election data is being reported for wards that do not exist in our official ward layer. For example, if Cityville has four wards in the official ward layer, the election data may be reported for five wards, including a new ward from an annexation. There are two scenarios, each handled differently. When a single new ward is present in the election data but has no geometry in the official ward layer, the votes attributed to this new ward are distributed to all the other wards in the municipality based on population percentage: each ward receives the same share of the new ward's votes as its share of the municipality's population.
In the Cityville example above, the fifth ward may have five votes reported, but since there is no corresponding fifth ward in the official layer, these five votes are assigned to the other wards in Cityville according to their percentage of the population.

The second case is when a new ward is reported, but its votes are part of a reporting unit. In this case, the votes for the new ward are assigned by population percentage to the other wards in the reporting unit, not to the wards in the municipality as a whole. For example, suppose Cityville's ward 5 was reported as part of a reporting unit made up of wards 1, 4 and 5. In this case, the votes in ward 5 are assigned to wards 1 and 4 according to population percentage.

Outline of Ward-by-Ward Election Results

The process of collecting election data and disaggregating it to municipal wards occurs after each general election, so disaggregation has been done with different ward layers and different population totals over time. We have outlined (to the best of our knowledge) which layer and population totals were used to produce these ward-by-ward election results.

Election data disaggregates from WEC Reporting Unit -> Ward [variant year outlined below]:
- Elections 1990 – 2000: Wards 1991 (Census 1990 totals used for disaggregation)
- Elections 2002 – 2010: Wards 2001 (Census 2000 totals used for disaggregation)
- Elections 2012: Wards 2011 (Census 2010 totals used for disaggregation)
- Elections 2014 – 2016: Wards 2018 (Census 2010 totals used for disaggregation)
- Elections 2018: Wards 2018
- Elections 2020: Wards 2020

Blocks 2011 -> centroid geometry, spatially joined with Wards [all versions]:
- Each block has an assignment to each of the ward versions outlined above.
- If, because of annexations, a ward exists that contains no block (this occurred in spring 2020), a block centroid was created with a population of 0 and encoded with the proper Census IDs.

Wards [all versions] disaggregate -> Blocks 2011:
- This yields a block centroid layer that contains all elections from 1990 to 2018.

Blocks 2011 [with all election data] -> Wards 2020 (then MCD 2020 and County 2020):
- All election data (including later elections) is aggregated to the Wards 2020 assignment of the blocks.

Notes:
- The populations of municipal wards 1991, 2001 and 2011 used for disaggregation were determined by the respective Census.
- Population and election data are contained within county boundaries. Even though MCD and ward boundaries vary greatly between ward versions, county boundaries have stayed the same, so county totals should be the same between Wards 2011 and Wards 2020.
- Election data for the same legislative district and the same election may differ between the 2011 and 2020 ward versions. This is due to boundary corrections made between 2011 and 2020, and to annexations, where a block may have been reassigned.

*WEC replaced the previous Government Accountability Board (GAB) in 2016, which replaced the previous State Elections Board in 2008.
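The block-based pipeline outlined above (split each ward's votes to its 2011 blocks by population share, then sum the blocks by their Wards 2020 assignment) can be illustrated with a small sketch. The dictionaries, block IDs and ward names are hypothetical, and the even split for a zero-population ward is an assumption, since the text only says such wards received a population-0 centroid.

```python
from collections import defaultdict


def disaggregate_to_blocks(ward_votes, block_ward, block_pop):
    """Split each ward's votes across its blocks in proportion to block population."""
    ward_pop = defaultdict(int)
    ward_blocks = defaultdict(list)
    for block, ward in block_ward.items():
        ward_pop[ward] += block_pop[block]
        ward_blocks[ward].append(block)

    block_votes = {}
    for ward, blocks in ward_blocks.items():
        votes = ward_votes.get(ward, 0)
        for block in blocks:
            if ward_pop[ward] > 0:
                block_votes[block] = votes * block_pop[block] / ward_pop[ward]
            else:
                # Assumption: a zero-population ward (e.g. a population-0 centroid
                # created for an annexation) splits its votes evenly over its blocks.
                block_votes[block] = votes / len(blocks)
    return block_votes


def aggregate_to_wards(block_votes, block_new_ward):
    """Sum block-level votes up to each block's ward in a newer ward layer."""
    totals = defaultdict(float)
    for block, votes in block_votes.items():
        totals[block_new_ward[block]] += votes
    return dict(totals)


# Hypothetical example: three 2011 blocks, two old wards, two Wards 2020 units.
block_pop = {"A": 60, "B": 40, "C": 100}
old_ward = {"A": "W1", "B": "W1", "C": "W2"}
new_ward = {"A": "N1", "B": "N2", "C": "N2"}
block_votes = disaggregate_to_blocks({"W1": 50, "W2": 30}, old_ward, block_pop)
print(aggregate_to_wards(block_votes, new_ward))  # {'N1': 30.0, 'N2': 50.0}
```

Because every block keeps the votes it receives, totals are preserved at any level the blocks nest in, which is consistent with the note that county totals should be the same between Wards 2011 and Wards 2020 even when individual ward totals differ.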
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The following gender-disaggregated training data is organized by year, with each annual period running from 1 July to 30 June. The data covers military, police and civilian training.
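As a small illustration of the 1 July to 30 June reporting period, the helper below maps a calendar date to the period that contains it. The function name `training_period` and the `"2022-2023"` label format are assumptions for the example; the dataset's own period labels may differ.

```python
from datetime import date


def training_period(d: date) -> str:
    """Return the 1 July - 30 June period containing d, labeled 'YYYY-YYYY'."""
    # Dates from July onward start a new period; January-June belong to the
    # period that began the previous July.
    start_year = d.year if d.month >= 7 else d.year - 1
    return f"{start_year}-{start_year + 1}"


print(training_period(date(2023, 6, 30)))  # 2022-2023
print(training_period(date(2023, 7, 1)))   # 2023-2024
```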
Member States are responsible for delivering pre-deployment training (PDT) to all units and personnel provided to UN peacekeeping operations. ITS delivers training-of-trainers courses for Member State trainers to build national capacity to deliver training to UN standards. Civilian Pre-Deployment Training (CPT) improves the preparedness and effectiveness of civilian peacekeepers. ITS has a dedicated team that delivers CPT at the UN Regional Service Centre in Entebbe, Uganda. Senior Leadership Training targets the highest levels of field mission leadership (SRSG, DSRSG, Force Commander or Head of Military Component, Police Commissioner and Director of Mission Support) to provide them with the knowledge needed to lead and manage field missions.
This dataset is managed by the Integrated Training Service of the UN Department of Peace Operations.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information. If you have questions about the underlying data stored here, please contact the Environmental Protection Agency using their RSEI Contact Form https://www.epa.gov/rsei/forms/contact-us-about-rsei-model or email Mitchell Sumner (sumner.mitchell@epa.gov). If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at: climatecafe@bu.edu.

The Geographic Microdata from EPA's Risk-Screening Environmental Indicators (RSEI) model are unique datasets that provide detailed air and water modeling results at various levels of aggregation, spatial geographies, and time periods for data user needs. RSEI Geographic Microdata allow for a flexible ability to compare and analyze RSEI model outputs and results from a receptor-based perspective of potentially impacted geographic areas. These data include values related to each modeled chemical release to air and water and each potentially impacted geographic area located around the facilities that ultimately release the chemical into the environment. Users can examine the potential impacts that environmental releases of toxic chemicals from multiple facilities may have on a particular area, regardless of where the releases originate, and get a more realistic picture of the degree to which the area is potentially affected by TRI chemical releases. Underlying these results is the ability to locate facilities, environmental releases, and people geographically and attribute characteristics of the physical environment such as meteorology, hydrography, and topography on surrounding areas once they are located to estimate potential exposure and relative health impacts. The RSEI model describes the U.S. and its territories using a grid-based system and a surface water network.
Facility- and chemical-specific data retrieved from Agency-reported informational data sources (such as site addresses and lat/long coordinates) are then geographically indexed to their corresponding grid cell in the grid system or stream segment (flowline) in the surface water network for modeling purposes. The RSEI air modeling and RSEI water modeling pages contain more information on how RSEI models these types of releases. [Quote from: https://www.epa.gov/rsei/rsei-geographic-microdata-rsei-gm]

This deposit contains data from the United States Environmental Protection Agency (EPA) Risk-Screening Environmental Indicators (RSEI) model Disaggregated Microdata, 2017 version, covering 1988-2017. The original data were downloaded on 2 August 2025 from http://abt-rsei.s3-website-us-east-1.amazonaws.com/?prefix=microdata2017/. The documentation PDF was downloaded from https://www.epa.gov/sites/default/files/2017-01/documents/rsei-documentation-geographic-microdata-v235.pdf. RSEI data are distributed in aggregated and disaggregated forms. Disaggregated data have separate results, concentrations, and toxicity-weighted concentrations for each modeled chemical release for each unit of analysis (810 m grid cell, census block group, census tract, etc.). Aggregated data include scores summed over all chemical releases. The data in this deposit are disaggregated and distributed at the 810 m grid cell level.
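To make the aggregated/disaggregated distinction concrete, the sketch below sums per-release scores within each grid cell, which is the relationship the description states between the two forms. The `ReleaseResult` fields and sample values are hypothetical and do not reflect the column names or layout of the actual RSEI microdata files.

```python
from collections import defaultdict
from typing import NamedTuple


class ReleaseResult(NamedTuple):
    cell_id: str     # hypothetical 810 m grid-cell identifier
    release_id: str  # hypothetical identifier for one modeled chemical release
    score: float     # hypothetical per-release score in that cell


def aggregate_scores(rows):
    """Sum disaggregated per-release scores into one aggregated score per cell."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.cell_id] += row.score
    return dict(totals)


rows = [
    ReleaseResult("cell_a", "release_1", 1.5),
    ReleaseResult("cell_a", "release_2", 2.5),
    ReleaseResult("cell_b", "release_1", 4.0),
]
print(aggregate_scores(rows))  # {'cell_a': 4.0, 'cell_b': 4.0}
```

The reverse step is not possible: once scores are summed, the contribution of each individual release to a cell can no longer be recovered from the aggregated values.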
For RSEI Aggregated Microdata 2017, see: https://sciop.net/uploads/bf258852c37fa1cae8512c49cda65b2a83403c9f, and for other components of the Disaggregated 2017 Microdata, see Zenodo repositories: https://zenodo.org/records/17065165 (Census Agg); https://doi.org/10.5281/zenodo.17065220 (Census Full pt 1); https://zenodo.org/records/17088034 (Census Full pt 2); https://zenodo.org/records/17109745 (Census Full pt 3); https://zenodo.org/records/17102039 (Census Full pt 4); https://zenodo.org/records/17109396 (Census Full pt 5); https://zenodo.org/records/17109593 (Shapefiles pt 1); https://zenodo.org/records/17127799 (Shapefiles pt 2) RSEI Disagg. Microdata 2017 (9) 1988-2017 data
https://spdx.org/licenses/CC0-1.0.html
Heat causes protein misfolding and aggregation and, in eukaryotic cells, triggers aggregation of proteins and RNA into stress granules. We have carried out extensive proteomic studies to quantify heat-triggered aggregation and subsequent disaggregation in budding yeast, identifying >170 endogenous proteins aggregating within minutes of heat shock in multiple subcellular compartments. We demonstrate that these aggregated proteins are not misfolded and destined for degradation. Stable-isotope labeling reveals that even severely aggregated endogenous proteins are disaggregated without degradation during recovery from shock, contrasting with the rapid degradation observed for exogenous thermolabile proteins. Although aggregation likely inactivates many cellular proteins, in the case of a heterotrimeric aminoacyl-tRNA synthetase complex, the aggregated proteins remain active with unaltered fidelity. We propose that most heat-induced aggregation of mature proteins reflects the operation of an adaptive, autoregulatory process of functionally significant aggregate assembly and disassembly that aids cellular adaptation to thermal stress.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The new Scorecard tracks progress toward the World Bank Group's vision to create a world free of poverty on a livable planet. The Scorecard includes three types of indicators:
- Vision indicators reflect the new vision for the WBG, showing the WBG's ambition and providing high-level measures to gauge the direction and pace of progress in tackling global challenges. Vision indicators contain aggregated and disaggregated development context data for all countries in the world, where data is available. The Scorecard reports the latest available global updates for each of these indicators.
- Client Context indicators reflect the circumstances in client countries, including multidimensional aspects of poverty, and are aligned with the Sustainable Development Goals (SDGs). They serve to frame the challenges clients face, and the context in which the WBG operates. Client Context indicators contain aggregated and disaggregated development context data for World Bank client countries, based on country eligibility for financing and where data is available. The Scorecard also reports the latest available update for each of these indicators.
- WBG Results indicators monitor WBG progress on some of the most critical global challenges. Results data include:
  - Active Portfolio Results: Contain achieved and expected results of WBG operations based on its active portfolio as of the end of June 2024. Includes aggregated and disaggregated data.
  - Results achieved since July 1st, 2023: Contain cumulative results achieved between July 1st, 2023 and June 30, 2024 from active and closed projects. Results achieved before July 1st, 2023 are excluded from this calculation. Includes aggregated data for the World Bank, IBRD and IDA only. IFC and MIGA do not currently report this data.
  - Operations Details: Operation-level detail is provided for World Bank projects. However, in alignment with IFC and MIGA Access to Information Policies, project-level data is available in an aggregated format on the WBG Scorecard, provided the minimum threshold to secure individual clients' data is satisfied.

This collection includes only a subset of indicators from the source dataset.
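The "Results achieved since July 1st, 2023" definition amounts to a date-window filter followed by a sum. A minimal sketch, assuming each result is a hypothetical `(achieved_on, value)` pair; the actual Scorecard data layout is not described here and will differ.

```python
from datetime import date

# Window stated in the description: July 1, 2023 through June 30, 2024 (inclusive).
WINDOW_START = date(2023, 7, 1)
WINDOW_END = date(2024, 6, 30)


def results_since_july_2023(results):
    """Sum hypothetical (achieved_on, value) results that fall inside the window.

    Results from both active and closed projects count; anything achieved
    before July 1, 2023 (or after June 30, 2024) is excluded.
    """
    return sum(value for achieved_on, value in results
               if WINDOW_START <= achieved_on <= WINDOW_END)


results = [
    (date(2023, 6, 30), 10),  # before the window: excluded
    (date(2023, 7, 1), 5),    # first day of the window
    (date(2024, 6, 30), 7),   # last day of the window
]
print(results_since_july_2023(results))  # 12
```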
https://creativecommons.org/share-your-work/public-domain/pdm
Since 1968, the Civil Rights Data Collection (CRDC) has collected data on key education and civil rights issues in our nation's public schools for use by the Department of Education’s Office for Civil Rights (OCR), other Department offices, other federal agencies, and by policymakers and researchers outside of the Department. The CRDC has generally been collected biennially from school districts in each of the 50 states and the District of Columbia. The CRDC collects information about school characteristics and about programs, services, and outcomes for students. Most student data are disaggregated by race/ethnicity, gender, limited English proficiency, and disability. The 2011-12 CRDC included all public schools and public school districts in the nation that serve students for at least 50% of the school day. The CRDC also includes long-term secure juvenile justice agencies, schools for the blind and deaf, and alternative schools.
***
Microdata: Yes
Level of Analysis: Local - school district/schools
Variables Present: Just variable names and reserved code
File Layout: .xlsx
Codebook: No (variable definitions are present)
Methods: Yes
Weights (with appropriate documentation): Yes
Publications: No
Aggregate Data: No
This dataset was created to pilot techniques for creating synthetic data from datasets containing sensitive and protected information in the local government context. Synthetic data generation replaces actual data with representative data generated from statistical models; this preserves the key data properties that allow insights to be drawn from the data while protecting the privacy of the people included in the data. We invite you to read the Understanding Synthetic Data white paper for a concise introduction to synthetic data.
This effort was a collaboration of the Urban Institute, Allegheny County’s Department of Human Services (DHS) and CountyStat, and the University of Pittsburgh’s Western Pennsylvania Regional Data Center.
The source data for this project consisted of 1) month-by-month records of services included in Allegheny County's data warehouse and 2) demographic data about the individuals who received the services. As the County’s data warehouse combines this service and client data, this data is referred to as “Integrated Services data”. Read more about the data warehouse and the kinds of services it includes here.
Synthetic data are typically generated from probability distributions or models identified as being representative of the confidential data. For this dataset, a model of the Integrated Services data was used to generate multiple versions of the synthetic dataset. These different candidate datasets were evaluated to select for publication the dataset version that best balances utility and privacy. For high-level information about this evaluation, see the Synthetic Data User Guide.
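The generate-candidates-then-select workflow described above can be illustrated with a deliberately simple toy. This is not the model or evaluation used for the Integrated Services data: here the "model" is just independent per-column frequencies, utility loss is the average total variation distance between real and synthetic column distributions, disclosure risk is the share of synthetic records that exactly copy a real record, and the equal weighting of the two is an arbitrary assumption. Column names and records are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy records; the real Integrated Services data has different fields.
REAL = [
    {"service": "housing", "age_group": "18-24"},
    {"service": "housing", "age_group": "25-64"},
    {"service": "mental_health", "age_group": "25-64"},
    {"service": "mental_health", "age_group": "65+"},
    {"service": "housing", "age_group": "65+"},
]


def fit_marginals(records):
    """Toy 'model': observed value frequencies for each column, independently."""
    return {col: Counter(r[col] for r in records) for col in records[0]}


def sample(marginals, n, rng):
    """Draw n synthetic records by sampling each column from its frequencies."""
    synth = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts)
            row[col] = rng.choices(values, weights=[counts[v] for v in values])[0]
        synth.append(row)
    return synth


def utility_loss(real, synth, col):
    """Total variation distance between real and synthetic distributions of col."""
    p = Counter(r[col] for r in real)
    q = Counter(s[col] for s in synth)
    return 0.5 * sum(abs(p[k] / len(real) - q[k] / len(synth)) for k in set(p) | set(q))


def disclosure_risk(real, synth):
    """Share of synthetic records that exactly reproduce some real record."""
    real_rows = {tuple(sorted(r.items())) for r in real}
    return sum(tuple(sorted(s.items())) in real_rows for s in synth) / len(synth)


def select_candidate(real, n_candidates=5, weight=0.5, seed=0):
    """Generate several candidates and keep the one with the lowest combined score."""
    marginals = fit_marginals(real)
    best = None
    for i in range(n_candidates):
        synth = sample(marginals, len(real), random.Random(seed + i))
        loss = sum(utility_loss(real, synth, c) for c in marginals) / len(marginals)
        score = weight * loss + (1 - weight) * disclosure_risk(real, synth)
        if best is None or score < best[0]:
            best = (score, i, synth)
    return best


score, index, chosen = select_candidate(REAL)
print(f"picked candidate {index} with combined score {score:.3f}")
```

Collapsing utility and privacy into one weighted score is only one simple way to "balance" them; the Synthetic Data User Guide and the technical report describe the evaluation actually used for this dataset.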
For more information about the creation of the synthetic version of this data, see the technical brief for this project, which discusses the technical decision making and modeling process in more detail.
This disaggregated synthetic data allows for many analyses that are not possible with aggregate data (summary statistics). Broadly, this synthetic data could be analyzed to better understand how people in Allegheny County use human services, including the interplay between the use of multiple services and clients' demographic characteristics.
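A short example of why record-level data supports analyses that summary counts do not: with one row per client and service, it is straightforward to count clients who used two particular services, whereas per-service totals alone do not determine that overlap. The `(client_id, service)` row shape, client IDs and service names below are hypothetical.

```python
from collections import defaultdict


def clients_using_both(records, service_a, service_b):
    """Count clients with at least one record for each of two services."""
    services_by_client = defaultdict(set)
    for client_id, service in records:
        services_by_client[client_id].add(service)
    wanted = {service_a, service_b}
    return sum(1 for services in services_by_client.values() if wanted <= services)


# Hypothetical (client_id, service) rows, one per service record.
records = [
    ("c1", "housing"), ("c1", "mental_health"),
    ("c2", "housing"),
    ("c3", "mental_health"), ("c3", "housing"), ("c3", "housing"),
    ("c4", "mental_health"),
]
print(clients_using_both(records, "housing", "mental_health"))  # 2
```

In this toy data, 3 clients used housing and 3 used mental health services; with 4 clients in total, those counts alone allow an overlap of either 2 or 3, while the record-level rows pin it at 2.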
Some amount of deviation from the original data is inherent to the synthetic data generation process. Specific examples of limitations (including undercounts and overcounts for the usage of different services) are given in the Synthetic Data User Guide and the technical report describing this dataset's creation.
Please reach out to this dataset's data steward (listed below) to let us know how you are using this data and if you found it to be helpful. Please also provide any feedback on how to make this dataset more applicable to your work, any suggestions of future synthetic datasets, or any additional information that would make this more useful. Also, please copy wprdc@pitt.edu on any such feedback (as the WPRDC always loves to hear about how people use the data that they publish and how the data could be improved).
1) A high-level overview of synthetic data generation as a method for protecting privacy can be found in the Understanding Synthetic Data white paper.
2) The Synthetic Data User Guide provides high-level information to help users understand the motivation, evaluation process, and limitations of the synthetic version of Allegheny County DHS's Human Services data published here.
3) Generating a Fully Synthetic Human Services Dataset: A Technical Report on Synthesis and Evaluation Methodologies describes the full technical methodology used for generating the synthetic data, evaluating the various options, and selecting the final candidate for publication.
4) The WPRDC also hosts the Allegheny County Human Services Community Profiles dataset, which provides annual updates on human-services usage, aggregated by neighborhood/municipality. That data can be explored using the County's Human Services Community Profile web site.