100+ datasets found
  1. Data from: Graph-based deep learning models for thermodynamic property...

    • figshare.com
    csv
    Updated Oct 30, 2024
    Cite
    Bowen Deng; Thijs Stuyver (2024). Graph-based deep learning models for thermodynamic property prediction: The interplay between target definition, data distribution, featurization, and model architecture [Dataset]. http://doi.org/10.6084/m9.figshare.27262947.v3
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 30, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Bowen Deng; Thijs Stuyver
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains the filtered formation energies for the BDE-db, QM9, PC9, QMugs, and QMugs1.1 datasets. The training, test, and validation sets were randomly split in a ratio of 0.8, 0.1, and 0.1, respectively. The filtering procedure is described in the article "Graph-based deep learning models for thermodynamic property prediction: The interplay between target definition, data distribution, featurization, and model architecture", and the code can be found at https://github.com/chimie-paristech-CTM/thermo_GNN. After applying the filter procedure described in the article, final versions of QM9 (127,007 data points), BDE-db (289,639 data points), PC9 (96,634 data points), QMugs (636,821 data points) and QMugs1.1 (70,546 data points) were obtained and used throughout this study.
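
    The random 0.8 / 0.1 / 0.1 split mentioned above could look roughly like the sketch below. This is only an illustration, not the authors' code: the function name `random_split`, the seed, and the toy input are assumptions, and the published splits may have been produced differently.

```python
import random


def random_split(items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle items and cut them into train / test / validation subsets.

    The fractions follow the order given in the description
    (training, test, validation). The seed is a hypothetical choice.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(fractions[0] * n)
    n_test = int(fractions[1] * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]  # whatever remains
    return train, test, valid


# Toy example with 1000 placeholder record IDs.
train, test, valid = random_split(range(1000))
print(len(train), len(test), len(valid))
```

    Giving the validation set "whatever remains" avoids losing records to rounding when the dataset size is not a multiple of ten.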

  2. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Updated May 1, 2001
    + more versions
    Cite
    United States Census Bureau (2001). undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSSE2017.K200104
    Dataset updated
    May 1, 2001
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.

    Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

    Explanation of Symbols:

    • An '**' entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    • An '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    • An '-' following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    • An '+' following a median estimate means the median falls in the upper interval of an open-ended distribution.
    • An '***' entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    • An '*****' entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    • An 'N' entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    • An '(X)' means that the estimate is not applicable or not available.

    Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    While the 2017 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas, in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Source: U.S. Census Bureau, 2017 American Community Survey 1-Year Estimates
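
    The margin-of-error interpretation described above can be illustrated with a short sketch. The function names and the example numbers are hypothetical. The factor 1.645 is the z-score the Census Bureau documents for its 90 percent margins of error, and it is what lets a published margin be converted back into a standard error.

```python
Z_90 = 1.645  # z-score corresponding to a 90 percent confidence level


def confidence_bounds(estimate, margin_of_error):
    """Lower and upper confidence bounds: estimate minus / plus the margin."""
    return estimate - margin_of_error, estimate + margin_of_error


def standard_error(margin_of_error, z=Z_90):
    """Recover the standard error implied by a 90 percent margin of error."""
    return margin_of_error / z


# Hypothetical table cell: an estimate of 50,000 with a margin of 1,645.
lower, upper = confidence_bounds(50_000, 1_645)
print(lower, upper, standard_error(1_645))
```

    Cells flagged with symbols such as '**' or '***' carry no usable margin, so they would need to be filtered out before a calculation like this.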

  3. Data and code associated with "Evaluating the definition and distribution of...

    • data.niaid.nih.gov
    Updated Feb 7, 2024
    Cite
    Lee, Benjamin (2024). Data and code associated with "Evaluating the definition and distribution of spring ephemeral wildflowers in eastern North America" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10015914
    Dataset updated
    Feb 7, 2024
    Dataset provided by
    University of Michigan–Ann Arbor
    Authors
    Lee, Benjamin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    North America
    Description

    Data and code associated with a paper by Yancy et al. titled "Evaluating the definition and distribution of spring ephemeral wildflowers in eastern North America". Metadata are included in the files where possible.

  4. Definition of terms presented in calculating the index of distributional...

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Joel P. Heath; William A. Montevecchi; Daniel Esler (2023). Definition of terms presented in calculating the index of distributional consistency. [Dataset]. http://doi.org/10.1371/journal.pone.0044353.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Joel P. Heath; William A. Montevecchi; Daniel Esler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Definition of terms presented in calculating the index of distributional consistency.

  5. Data from: Climate-limited vegetation change in the conterminous United...

    • data.niaid.nih.gov
    • zenodo.org
    • +1 more
    zip
    Updated Mar 5, 2024
    Cite
    Adriana Parra; Jonathan Greenberg (2024). Climate-limited vegetation change in the conterminous United States of America [Dataset]. http://doi.org/10.5061/dryad.j0zpc86nm
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 5, 2024
    Dataset provided by
    University of Nevada, Reno
    Authors
    Adriana Parra; Jonathan Greenberg
    License

    CC0-1.0: https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Contiguous United States, United States
    Description

    In the study “CLIMATE-LIMITED VEGETATION CHANGE IN THE CONTERMINOUS UNITED STATES OF AMERICA”, published in the Global Change Biology journal, we evaluated the effects of climate conditions on vegetation composition and distribution in the conterminous United States (CONUS). To disentangle the direct effects of climate change from different non-climate factors, we applied "Liebig's law of the minimum" in a geospatial context, and determined the climate-limited potential for tree, shrub, herbaceous, and non-vegetation fractional cover change. We then compared these potential rates against observed change rates for the period 1986 to 2018 to identify areas of the CONUS where vegetation change is likely being limited by climatic conditions. This dataset contains the input and the resulting rasters for the study which include a) the observed rates of vegetation change, b) the climate derived potential vegetation rates of change, c) the difference between potential and observed values and d) the identified climatic limiting factor.

    Methods

    Input data

    We use the available data from the “Vegetative Lifeform Cover from Landsat SR for CONUS” product (https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1809) to evaluate the changes in vegetation fractional cover.

    The information for the climate factors was derived from the TerraClimate data catalog (https://www.climatologylab.org/terraclimate.html). We downloaded data from this catalog for the period 1971 to 2018 for the following variables: minimum temperature (TMIN), precipitation (PPT), actual evapotranspiration (AET), potential evapotranspiration (PET), and climatic water deficit (DEF).

    Preprocessing of vegetation fractional cover data

    We resampled and aligned the maps of fractional cover using pixel averaging to the extent and resolution of the TerraClimate dataset (~ 4 km). Then, we calculated rates of lifeform cover change per pixel using the Theil-Sen slope analysis (Sen, 1968; Theil, 1992).
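
    For readers unfamiliar with the Theil-Sen slope, the sketch below shows the estimator for a single pixel: the median of the slopes between all pairs of observations. It is an illustration only, not the authors' code; the function name and the toy time series are assumptions.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(years, values):
    """Median of the pairwise slopes (value change per year) for one pixel."""
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(zip(years, values), 2)
        if t2 != t1  # skip pairs that share a time stamp
    ]
    return median(slopes)


# Hypothetical fractional cover (percent) for one pixel over five years.
years = [1986, 1987, 1988, 1989, 1990]
cover = [10, 12, 11, 15, 16]
print(theil_sen_slope(years, cover))
```

    Because it takes a median instead of fitting least squares, the estimate is much less sensitive to a single anomalous year.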

    Preprocessing of climate variables data

    To process the climate data, we defined a year time step as the months from July of one year to July of the next. Following this definition, we constructed annual maps of each climate variable for the years 1971 to 2018.

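    The annual maps of each climate variable were further summarized per pixel into mean and slope (calculated as the Theil-Sen slope) across one-, two-, three-, four-, five-, ten-, and 15-year lags. With five climate variables, seven lags, and two summaries each, this yields the 70 climate predictor variables used below.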

    Estimation of climate potential

    We constructed a final multilayer dataset of response and predictor variables for the CONUS including the resulting maps of fractional cover rate of change (four response variables), the mean and slope maps for the climate variables for all the time-lags (70 predictor variables), and the initial percent cover for each lifeform in the year 1986 (four predictor variables).

    We evaluated for each pixel in the CONUS which of the predictor variables produced the minimum potential rate of change in fractional cover for each lifeform class. To do that, we first calculated the 100% quantile hull of the distribution of each predictor variable against each response variable.

    To calculate the 100% quantile of the predictor variables’ distribution we divided the total range of each predictor variable into equal-sized bins. The size and number of bins were set specifically per variable due to differences in their data distribution. For each of the bins, we calculated the maximum value of the vegetation rate of change, which resulted in a lookup table with the lower and upper boundaries of each bin, and the associated maximum rate of change. We constructed a total of 296 lookup tables, one per lifeform class and predictor variable combination. The resulting lookup tables were used to construct spatially explicit maps of maximum vegetation rate of change from each of the predictor variable input rasters, and the final climate potential maps were constructed by stacking all the resulting maps per lifeform class and selecting for each pixel the minimum predicted rate of change and the predictor variable that produced that rate.
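
    The lookup-table step described above can be sketched as follows for one lifeform class. This is a simplified illustration under stated assumptions, not the study's implementation: the function names, the flat lists standing in for rasters, the equal bin count for every predictor, and the toy values are all hypothetical (the study set bin sizes per variable).

```python
def build_lookup(predictor, response, n_bins):
    """Rows of (lower bound, upper bound, maximum response seen in the bin)."""
    lo, hi = min(predictor), max(predictor)
    width = (hi - lo) / n_bins
    rows = []
    for b in range(n_bins):
        lower, upper = lo + b * width, lo + (b + 1) * width
        last = b == n_bins - 1  # the last bin also takes the maximum value
        in_bin = [y for x, y in zip(predictor, response)
                  if lower <= x < upper or (last and x >= lower)]
        rows.append((lower, upper, max(in_bin) if in_bin else None))
    return rows


def apply_lookup(x, rows):
    """Maximum rate of change associated with the bin that contains x."""
    for i, (lower, upper, max_rate) in enumerate(rows):
        if lower <= x < upper or (i == len(rows) - 1 and x >= lower):
            return max_rate
    return None  # x lies below the range seen when the table was built


def climate_potential(predictors, response, n_bins):
    """Per pixel: the minimum predicted rate and the predictor that gave it."""
    lookups = {name: build_lookup(values, response, n_bins)
               for name, values in predictors.items()}
    potential, limiting = [], []
    for i in range(len(response)):
        candidates = [(apply_lookup(values[i], lookups[name]), name)
                      for name, values in predictors.items()]
        rate, name = min((c for c in candidates if c[0] is not None),
                         key=lambda c: c[0])
        potential.append(rate)
        limiting.append(name)
    return potential, limiting


# Four hypothetical pixels, two hypothetical predictors.
predictors = {"tmin": [0.0, 1.0, 2.0, 3.0], "ppt": [0.0, 3.0, 1.0, 2.0]}
observed = [0.1, 0.4, 0.2, 0.3]
potential, limiting = climate_potential(predictors, observed, n_bins=2)
print(potential, limiting)
```

    Taking the minimum across predictors is where Liebig's law of the minimum enters: the most restrictive variable sets the potential for each pixel.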

    Identifying climate-limited areas

    We defined climate-limited areas as the parts of the CONUS with little or no differences between the estimated climate potential and the observed rates of change in fractional cover. To identify these areas, we subtracted the raster of observed rates of change from the raster of climate potential for each lifeform class.
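
    The subtraction described above could be sketched like this. The `tolerance` parameter is an assumption introduced for illustration: the description says only "little or no differences" and gives no numeric threshold.

```python
def climate_limited(potential, observed, tolerance):
    """Difference (potential minus observed) and a flag for near-zero gaps."""
    differences = [p - o for p, o in zip(potential, observed)]
    flags = [abs(d) <= tolerance for d in differences]
    return differences, flags


# Hypothetical rates of change for three pixels.
differences, flags = climate_limited([0.2, 0.4, 0.3], [0.2, 0.1, 0.25], tolerance=0.06)
print(flags)
```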

  6. VERSION SUPERSEDED - Nephrops Underwater TV Survey FU22 The "Smalls"

    • data.marine.ie
    • data.europa.eu
    Updated Jun 1, 2023
    + more versions
    Cite
    Marine Institute (2023). VERSION SUPERSEDED - Nephrops Underwater TV Survey FU22 The "Smalls" [Dataset]. https://data.marine.ie/geonetwork/srv/api/records/ie.marine.data:dataset.4012
    Explore at:
    Available download formats: www:link-1.0-http--link, www:download-1.0-http--download
    Dataset updated
    Jun 1, 2023
    Dataset authored and provided by
    Marine Institute
    Time period covered
    Jun 27, 2006 - Present
    Description

    SUPERSEDED - The dataset was originally published with a DOI in February 2020 but was superseded in May 2021, due to corrections applied to the dataset, by an updated version: https://doi.org/10/gc9t

    Nephrops norvegicus are common around the Irish coast, occurring in geographically distinct sandy or muddy areas where the sediment is suitable for construction of their burrow dwellings. The Marine Institute carries out underwater television (UWTV) surveys annually of commercially important Nephrops stocks. This dataset provides quality-assured estimates of Nephrops burrow densities over the known spatial and bathymetric distribution of FU (functional unit) 22: the “Smalls” Nephrops ground. From 2006 to 2011, UWTV stations were set at 3.0 nautical mile spacing over the known distribution. From 2012 onwards, a randomised isometric grid of UWTV stations was set at 4.5 nautical mile spacing.

    Underwater TV Survey reports for this Nephrops stock are available at: http://hdl.handle.net/10793/1428

    Also available are the ICES Cooperative Research Reports, which detail the use of UWTV surveys to assess Nephrops stocks: https://tinyurl.com/ices-nephrops

    GIS shapefiles of FU22 and the “Smalls” Nephrops grounds are provided. This dataset covers the period from 2006 onwards and is ongoing. One hundred percent of the survey grid was covered in all years except 2015, when 83 percent of the grid was covered. Seven stations in 2015 could not be completed due to very poor or nil visibility conditions encountered at the seabed. For these stations, density estimates were filled in using an average of historic values within 2 nmi (buffer2estimated).

    Dataset fields are: Nephrops Functional Unit Number; Survey Code; Year; UWTV station number; Date-Start of UWTV track; Time_Start of UWTV track; Date-End of UWTV track; Time_End of UWTV track; Decimalised longitude and latitude midpoint of the UWTV station track; Adjusted density (Nephrops burrows/m²); Length in metres of the UWTV station track; Field of View of camera system in metres; Total Nephrops burrow count; Nephrops Fishing Ground Name; Source of positional data to calculate UWTV station track (USBL sled GPS, SHIP GPS, Layback, estimated GPS, buffer2estimated); Camera system used (SD = standard analogue system, HD = high definition system); Data Extraction method (SQL, MSAccess); Data Status (Final for analysis); Research Vessel Name; Correction Factor (Density / Correction Factor = Adjusted Density); and Depth (metres). An entry of NA means the data are not available.
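
    The one relationship the field list states explicitly, Density / Correction Factor = Adjusted Density, can be written as a small helper. This is a sketch only: the function name, the use of `None` to stand in for NA entries, and the example values are assumptions.

```python
def adjusted_density(density, correction_factor):
    """Adjusted burrow density (burrows/m²); None stands in for NA entries."""
    if density is None or correction_factor is None:
        return None  # propagate missing data
    if correction_factor <= 0:
        raise ValueError("correction factor must be positive")
    return density / correction_factor


# Hypothetical station: raw density 0.52 burrows/m², correction factor 1.3.
print(adjusted_density(0.52, 1.3))
print(adjusted_density(None, 1.3))
```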

    Suggested Citation: Doyle, Jennifer. (2020) VERSION SUPERSEDED - Nephrops Underwater TV Survey FU22 The "Smalls". Marine Institute, Ireland. doi:10/dk22.

  7. Distribution of households in the U.S. 1970-2024, by household size

    • statista.com
    Cite
    Statista, Distribution of households in the U.S. 1970-2024, by household size [Dataset]. https://www.statista.com/statistics/242189/disitribution-of-households-in-the-us-by-household-size/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In 2024, 34.59 percent of all households in the United States were two-person households. In 1970, this figure was at 28.92 percent.

    Single households

    Single mother households are usually the most common households with children under 18 years old found in the United States. As of 2021, the District of Columbia and North Dakota had the highest share of single-person households in the United States. Household size in the United States has decreased over the past century, due to customs and traditions changing. Families are typically more nuclear, whereas in the past, multigenerational households were more common. Furthermore, fertility rates have also decreased, meaning that women do not have as many children as they used to.

    Average households in Utah

    Out of all states in the U.S., Utah was reported to have the largest average household size. This predominately Mormon state has about three million inhabitants. The Church of the Latter-Day Saints, or Mormonism, plays a large role in Utah, and can contribute to the high birth rate and household size in Utah. The Church of Latter-Day Saints promotes having many children and tight-knit families. Furthermore, Utah has a relatively young population, due to Mormons typically marrying and starting large families younger than those in other states.

  8. Terrestrial Ecosystem Information (TEI) Data Distribution Packages

    • catalogue.arctic-sdi.org
    • open.canada.ca
    Updated Jan 22, 2016
    Cite
    (2016). Terrestrial Ecosystem Information (TEI) Data Distribution Packages [Dataset]. https://catalogue.arctic-sdi.org/geonetwork/srv/resources/datasets/8fd15e4e-e7b1-4566-81d1-1ff8947bfd46
    Dataset updated
    Jan 22, 2016
    Description

    The TEI Data Distribution packages in this folder contain the full Terrestrial Ecosystem Information (TEI) dataset split into Predictive Ecosystem Mapping (PEM) data and non-PEM data, which includes Terrestrial Ecosystem Mapping (TEM), Terrain Mapping (TER), Bioterrain Mapping (TBT), Terrain Stability Mapping (TSM), Sensitive Ecosystems Inventory (SEI), Soil Mapping (SOIL project boundaries only), and Wildlife Habitat Ratings (WHR project boundaries only) by Natural Resource Sector Region (see Index map .pdf). Data includes the Project Boundaries (with project metadata and links to related data such as reports), Long Table (detailed mapping polygons with the full RISC standard attribute table), Short Table (detailed mapping polygons with key and amalgamated (concatenated) attributes derived from Long Table), On-site Symbol features (point, line or polygon terrain features such as landslide tracks, scarps), Sample Sites (field sampling locations), and any user-defined tables. The data dictionary is also available. This data is in file geodatabase format.

    Current version: v11 (published on 2024-10-03)
    Previous versions: v10 (published on 2023-11-14), v9 (published on 2023-03-01), v8 (published on 2016-09-01)

    Note that the Soil Mapping dataset is available from: http://www.env.gov.bc.ca/esd/distdata/ecosystems/Soil_Data/SOIL_DATA_FGDB/

  9. 🩺📊 Cancer Prediction Dataset 🌟🔬

    • kaggle.com
    Updated Jun 10, 2024
    Cite
    Rabie El Kharoua (2024). 🩺📊 Cancer Prediction Dataset 🌟🔬 [Dataset]. http://doi.org/10.34740/kaggle/dsv/8651738
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 10, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rabie El Kharoua
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains medical and lifestyle information for 1500 patients, designed to predict the presence of cancer based on various features. The dataset is structured to provide a realistic challenge for predictive modeling in the medical domain.

    Dataset Structure

    Features

    1. Age: Integer values representing the patient's age, ranging from 20 to 80.

    2. Gender: Binary values representing gender, where 0 indicates Male and 1 indicates Female.

    3. BMI: Continuous values representing Body Mass Index, ranging from 15 to 40.

    4. Smoking: Binary values indicating smoking status, where 0 means No and 1 means Yes.

    5. GeneticRisk: Categorical values representing genetic risk levels for cancer, with 0 indicating Low, 1 indicating Medium, and 2 indicating High.

    6. PhysicalActivity: Continuous values representing the number of hours per week spent on physical activities, ranging from 0 to 10.

    7. AlcoholIntake: Continuous values representing the number of alcohol units consumed per week, ranging from 0 to 5.

    8. CancerHistory: Binary values indicating whether the patient has a personal history of cancer, where 0 means No and 1 means Yes.

    9. Diagnosis: Binary values indicating the cancer diagnosis status, where 0 indicates No Cancer and 1 indicates Cancer.

    Target Variable

    • Diagnosis: The main variable to predict, indicating if a patient has cancer.
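
    The value ranges listed under Features lend themselves to a quick sanity check before modeling. The sketch below is illustrative only: it assumes the file's column names match the feature names exactly as written above, and the helper name `validate_row` and the sample rows are hypothetical.

```python
# Allowed (minimum, maximum) per column, taken from the feature list above.
RANGES = {
    "Age": (20, 80),
    "Gender": (0, 1),
    "BMI": (15, 40),
    "Smoking": (0, 1),
    "GeneticRisk": (0, 2),
    "PhysicalActivity": (0, 10),
    "AlcoholIntake": (0, 5),
    "CancerHistory": (0, 1),
    "Diagnosis": (0, 1),
}
# Columns described as integer, binary, or categorical codes.
INTEGER_COLUMNS = {"Age", "Gender", "Smoking", "GeneticRisk",
                   "CancerHistory", "Diagnosis"}


def validate_row(row):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in row:
            problems.append(f"{name}: missing")
            continue
        value = row[name]
        if name in INTEGER_COLUMNS and value != int(value):
            problems.append(f"{name}: expected a whole number, got {value}")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems


# Hypothetical record that satisfies every documented range.
sample = {"Age": 45, "Gender": 1, "BMI": 27.3, "Smoking": 0, "GeneticRisk": 2,
          "PhysicalActivity": 4.5, "AlcoholIntake": 1.2, "CancerHistory": 0,
          "Diagnosis": 0}
print(validate_row(sample))
```

    Note that Diagnosis appears in the feature list but is the target, so it should be separated from the inputs before training.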

    Data Distribution

    • The dataset is balanced with respect to feature distributions and includes realistic variability in patient data.

    Usage

    Intended Use

    This dataset is intended for training and testing machine learning models for cancer prediction. It can be used for:

    • Model training and evaluation.
    • Feature importance analysis.
    • Algorithm benchmarking.

    Disclaimer

    This dataset has been preprocessed and cleaned to ensure that users can focus on the most critical aspects of their analysis. The preprocessing steps were designed to eliminate noise and irrelevant information, allowing you to concentrate on developing and fine-tuning your predictive models.

    Considerations

    • The dataset includes a variety of features known to be associated with cancer risk, making it suitable for exploring different modeling approaches and feature engineering techniques.

    Dataset Usage and Attribution Notice

    This dataset, shared by Rabie El Kharoua, is original and has never been shared before. It is made available under the CC BY 4.0 license, allowing anyone to use the dataset in any form as long as proper citation is given to the author. A DOI is provided for proper referencing. Please note that duplication of this work within Kaggle is not permitted.

    Exclusive Synthetic Dataset

    This dataset is synthetic and was generated for educational purposes, making it ideal for data science and machine learning projects. It is an original dataset, owned by Mr. Rabie El Kharoua, and has not been previously shared. You are free to use it under the license outlined on the data card. The dataset is offered without any guarantees. Details about the data provider will be shared soon.

  10. A graduated nativeness definition applied to the vascular flora of Denmark

    • datadryad.org
    • search.dataone.org
    zip
    Updated Nov 3, 2025
    Cite
    Camilla Tvede Colding-Jørgensen; Rasmus Ejrnæs; Jens-Christian Svenning; Hans Henrik Kehlet Bruun (2025). A graduated nativeness definition applied to the vascular flora of Denmark [Dataset]. http://doi.org/10.5061/dryad.n02v6wx9q
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 3, 2025
    Dataset provided by
    Dryad
    Authors
    Camilla Tvede Colding-Jørgensen; Rasmus Ejrnæs; Jens-Christian Svenning; Hans Henrik Kehlet Bruun
    Time period covered
    Sep 15, 2025
    Area covered
    Denmark
    Description

    Data from: A graduated nativeness definition applied to the vascular flora of Denmark

    Dataset DOI: 10.5061/dryad.n02v6wx9q

    Description of the data and file structure

    The dataset contains a new graduated nativeness status for the Danish vascular flora. In addition, we list our version of species’ status from a binary definition of nativeness from the three sources: the Euro+Med Plantbase (Euro+Med 2006), the Danish Redlist (Moeslund 2023) and Atlas Flora Danica (Hartvig and Vestergaard 2015).

    Data from Euro+Med (2006) were used to create the graduated definition of nativeness, with Denmark as the focal territory. Species given in that source as non-native in Denmark, but strictly native to one or more of the following neighbouring countries (or Euro+Med territories) were re-classified as ‘near-native’ : Sweden, Norway, Germany, the Netherlands, Poland, Latvia, Lithuania, Belgium with Luxembourg, the Czech Republic, Estonia, “Baltic states with Kalini...

  11. Training algorithm flow.

    • plos.figshare.com
    xls
    Updated Nov 22, 2024
    Cite
    Xing Chen; Na Zhang; Xiaohui Yang; Chunyan Wang; Qi Na; Tianyun Luan; Wendi Zhu; Chenjie Zhang; Chao Yang (2024). Training algorithm flow. [Dataset]. http://doi.org/10.1371/journal.pone.0292480.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 22, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xing Chen; Na Zhang; Xiaohui Yang; Chunyan Wang; Qi Na; Tianyun Luan; Wendi Zhu; Chenjie Zhang; Chao Yang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In daily life, two common algorithms are used for collecting medical disease data: data integration of medical institutions and questionnaires. However, these statistical methods require collecting data from the entire research area, which consumes a significant amount of manpower and material resources. Additionally, data integration is difficult and poses privacy protection challenges, resulting in a large number of missing data in the dataset. The presence of incomplete data significantly reduces the quality of the published data, hindering the timely analysis of data and the generation of reliable knowledge by epidemiologists, public health authorities, and researchers. Consequently, this affects the downstream tasks that rely on this data. To address the issue of discrete missing data in cardiac disease, this paper proposes the AGAN (Attribute Generative Adversarial Nets) architecture for missing data filling, based on generative adversarial networks. This algorithm takes advantage of the strong learning ability of generative adversarial networks. Given the ambiguous meaning of filling data in other network structures, the attribute matrix is designed to directly convert it into the corresponding data type, making the actual meaning of the filling data more evident. Furthermore, the distribution deviation between the generated data and the real data is integrated into the loss function of the generative adversarial networks, improving their training stability and ensuring consistency between the generated data and the real data distribution. This approach establishes the missing data filling mechanism based on the generative adversarial networks, which ensures the rationality of the data distribution while filling the missing data samples. 
    The experimental results demonstrate that compared to other filling algorithms, the data matrix filled by the proposed algorithm in this paper has more evident practical significance, fewer errors, and higher accuracy in downstream classification prediction.

  12. Data from: Set definition.

    • plos.figshare.com
    xls
    Updated Jun 18, 2023
    Cite
    Changxi Ma; Wei Hao; Fuquan Pan; Wang Xiang (2023). Set definition. [Dataset]. http://doi.org/10.1371/journal.pone.0198931.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 18, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Changxi Ma; Wei Hao; Fuquan Pan; Wang Xiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Set definition.

  13. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSST1Y2016.S0801
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.

    Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Tell us what you think. Provide feedback to help make American Community Survey data more useful for you.

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

    Explanation of Symbols:
    An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    An "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    An "(X)" means that the estimate is not applicable or not available.

    Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    While the 2016 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

    When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

    Workers include members of the Armed Forces and civilians who were at work last week.

    The 12 selected states are Connecticut, Maine, Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Vermont, and Wisconsin.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Source: U.S. Census Bureau, 2016 American Community Survey 1-Year Estimates
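The margin-of-error note above amounts to simple arithmetic, which can be sketched as follows. This is a minimal illustration rather than Census Bureau code: the function names are hypothetical, and the 1.645 factor is the standard normal multiplier for a 90 percent confidence level, used here to recover an approximate standard error.

```python
# Multiplier for a 90 percent confidence level (standard normal z-score).
Z_90 = 1.645


def confidence_bounds(estimate, margin_of_error):
    """Return the (lower, upper) 90 percent confidence bounds of an estimate."""
    return estimate - margin_of_error, estimate + margin_of_error


def standard_error(margin_of_error):
    """Approximate the standard error from a 90 percent margin of error."""
    return margin_of_error / Z_90


if __name__ == "__main__":
    # Hypothetical values, not taken from any table in this listing.
    lower, upper = confidence_bounds(1000, 50)
    print(lower, upper)
```

Entries flagged with symbols such as "**" or "N" have no usable margin of error, so they would need to be filtered out before applying either function.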

  14. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSST5Y2014.S2401
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.

    Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

    Explanation of Symbols:
    An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    An "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    An "(X)" means that the estimate is not applicable or not available.

    Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    While the 2010-2014 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

    Occupation codes are 4-digit codes and are based on Standard Occupational Classification 2010.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Source: U.S. Census Bureau, 2010-2014 American Community Survey 5-Year Estimates

  15. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSDT1Y2012.C02015
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.

    Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

    Explanation of Symbols:
    An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    An "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    An "(X)" means that the estimate is not applicable or not available.

    Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2000 data. Boundaries for urban areas have not been updated since Census 2000. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    While the 2012 American Community Survey (ACS) data generally reflect the December 2009 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

    This table has been updated to include additional categories for detailed Asian groups. Multi-year estimates for these additional detailed groups will be produced after three single years of data is tabulated (beginning with the first 1-year release in 2011).

    Total includes people who reported Asian only, regardless of whether they reported one or more detailed Asian groups.

    Other Asian, specified. Includes respondents who provide a response of another Asian group not shown separately, such as Iwo Jiman, Maldivian, or Singaporean.

    Other Asian, not specified. Includes respondents who checked the "Other Asian" response category on the ACS questionnaire and did not write in a specific group or wrote in a generic term such as "Asian," or "Asiatic."

    Two or more Asian. Includes respondents who provided multiple Asian responses such as Asian Indian and Japanese; or Vietnamese, Chinese and Hmong.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Source: U.S. Census Bureau, 2012 American Community Survey

  16. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/cedsci/table?text=S1501&g=0100000US_0400000US55_0500000US55025&tid=ACSST5Y2015.S1501
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.

    Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Tell us what you think. Provide feedback to help make American Community Survey data more useful for you.

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

    Explanation of Symbols:
    An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    An "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    An "(X)" means that the estimate is not applicable or not available.

    Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    While the 2011-2015 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

    Questions for "wage and salary" and "tips, bonuses and commissions" were asked separately for the first time during non-response follow-up via Computer Assisted Telephone Interview (CATI) and Computer Assisted Personal Interview (CAPI). Prior to 2013 these questions were asked in combination, "wages, salary, tips, bonuses and commissions."

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Source: U.S. Census Bureau, 2011-2015 American Community Survey 5-Year Estimates

  17. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSDT1Y2012.C25050?q=population+per+state
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.

    Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

    Explanation of Symbols:
    An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    An "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    An "(X)" means that the estimate is not applicable or not available.

    Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2000 data. Boundaries for urban areas have not been updated since Census 2000. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    While the 2012 American Community Survey (ACS) data generally reflect the December 2009 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

    The 2009, 2010, 2011, and 2012 plumbing data for Puerto Rico will not be shown. Research indicates that the questions on plumbing facilities that were introduced in 2008 in the stateside American Community Survey and the 2008 Puerto Rico Community Survey may not have been appropriate for Puerto Rico.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Source: U.S. Census Bureau, 2012 American Community Survey

  18. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSDT5Y2013.B99132
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Data and Documentation section.

    Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities and towns and estimates of housing units for states and counties.

    Explanation of Symbols:
    An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
    An "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
    An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
    An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
    An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
    An "(X)" means that the estimate is not applicable or not available.

    Estimates of urban and rural population, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    While the 2009-2013 American Community Survey (ACS) data generally reflect the February 2013 Office of Management and Budget (OMB) definitions of metropolitan and micropolitan statistical areas; in certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB definitions due to differences in the effective dates of the geographic entities.

    Fertility data are not available for certain geographic areas due to problems with data collection. See Errata Note #92 for details.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

    Source: U.S. Census Bureau, 2009-2013 5-Year American Community Survey

  19. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/DECENNIALDHCMP2020.H10?g=160XX00US6900300
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Note: For information on data collection, confidentiality protection, nonsampling error, and definitions, see the 2020 Island Areas Censuses Technical Documentation.

    Note: For information on the codes used when processing the data in this table, see the 2020 Island Areas Censuses Technical Documentation.

    Explanation of Symbols:
    1. An "-" means the statistic could not be computed because there were an insufficient number of observations.
    2. An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    3. An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    4. An "N" means data are not displayed for the selected geographic area due to concerns with statistical reliability or an insufficient number of cases.
    5. An "(X)" means not applicable.

    Source: U.S. Census Bureau, 2020 Census, Commonwealth of the Northern Mariana Islands.

  20. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/DECENNIALCROSSTABAS2020.CT92?q=MEDIAN%20EARNINGS&g=010XX00US$0400000,
    Explore at:
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Note: For information on data collection, confidentiality protection, nonsampling error, and definitions, see the 2020 Island Areas Censuses Technical Documentation.

    Due to COVID-19 restrictions impacting data collection for the 2020 Census of American Samoa, data tables reporting social and economic characteristics do not include the group quarters population in the table universe. As a result, impacted 2020 data tables should not be compared to 2010 and other past census data tables reporting the same characteristics. The Census Bureau advises data users to verify table universes are the same before comparing data across census years. For more information about data collection limitations and the impacts on American Samoa's data products, see the 2020 Island Areas Censuses Technical Documentation.

    Note: Occupation categories are based on 4-digit codes from the Standard Occupational Classification 2018.

    Explanation of Symbols:
    1. An "-" means the statistic could not be computed because there were an insufficient number of observations.
    2. An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
    3. An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
    4. An "N" means data are not displayed for the selected geographic area due to concerns with statistical reliability or an insufficient number of cases.
    5. An "(X)" means not applicable.

    Source: U.S. Census Bureau, 2020 Census, American Samoa.

Cite
Bowen Deng; Thijs Stuyver (2024). Graph-based deep learning models for thermodynamic property prediction: The interplay between target definition, data distribution, featurization, and model architecture [Dataset]. http://doi.org/10.6084/m9.figshare.27262947.v3
Data from: Graph-based deep learning models for thermodynamic property prediction: The interplay between target definition, data distribution, featurization, and model architecture

Related Article
Explore at:
Available download formats: csv
Dataset updated
Oct 30, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Bowen Deng; Thijs Stuyver
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

This folder contains the formation energies of the BDE-db, QM9, PC9, QMugs, and QMugs1.1 datasets after filtering. The training, test, and validation sets were randomly split in a ratio of 0.8, 0.1, and 0.1, respectively. The filtering procedure is described in the article "Graph-based deep learning models for thermodynamic property prediction: The interplay between target definition, data distribution, featurization, and model architecture", and the code can be found at https://github.com/chimie-paristech-CTM/thermo_GNN. After applying this filtering procedure, the final versions of QM9 (127,007 data points), BDE-db (289,639 data points), PC9 (96,634 data points), QMugs (636,821 data points), and QMugs1.1 (70,546 data points) were obtained and used throughout the study.
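A random 0.8/0.1/0.1 split of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' code from the linked repository: the function name, the seed, and the rule that the validation set absorbs any rounding remainder are all assumptions.

```python
import random


def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly split range(n) into training, test, and validation index lists."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split reproducible
    indices = list(range(n))
    rng.shuffle(indices)

    n_train = int(fractions[0] * n)
    n_test = int(fractions[1] * n)

    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # takes whatever remains
    return train, test, validation


if __name__ == "__main__":
    train, test, validation = split_indices(1000)
    print(len(train), len(test), len(validation))
```

The returned index lists can then be used to select rows from any of the csv files, so the same split logic applies regardless of dataset size.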
