43 datasets found
  1. Data from: Hcropland30: A hybrid 30-m global cropland map by leveraging...

    • zenodo.org
    • data.niaid.nih.gov
    bin, jpeg, zip
    Updated Aug 3, 2024
    Cite
    Qiong Hu; Zhiwen Cai; Liangzhi You; Steffen Fritz; Xinyu Zhang; He Yin; Haodong Wei; Jingya Yang; Zexuan Li; Hao Wu; Baodong Xu; Wenbin Wu (2024). Hcropland30: A hybrid 30-m global cropland map by leveraging global land cover products and Landsat data based on a deep learning model [Dataset]. http://doi.org/10.5281/zenodo.13169748
    Explore at:
    Available download formats: zip, bin, jpeg
    Dataset updated
    Aug 3, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Qiong Hu; Zhiwen Cai; Liangzhi You; Steffen Fritz; Xinyu Zhang; He Yin; Haodong Wei; Jingya Yang; Zexuan Li; Hao Wu; Baodong Xu; Wenbin Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hcropland30: A 30-m global cropland map by leveraging global land cover products and Landsat data based on a deep learning model

    ***Please note this dataset is undergoing peer review***

    Version: 1.0

    Authors: Qiong Hu a, 1, Zhiwen Cai b, 1, Liangzhi You c, d, Steffen Fritz e, Xinyu Zhang c, He Yin f, Haodong Wei c, Jingya Yang g, Zexuan Li a, Qiangyi Yu g, Hao Wu a, Baodong Xu b, *, Wenbin Wu g, *

    a Key Laboratory for Geographical Process Analysis & Simulation of Hubei Province/College of Urban and Environmental Sciences, Central China Normal University, Wuhan 430079, China

    b College of Resources and Environment, Huazhong Agricultural University, Wuhan 430070, China

    c Macro Agriculture Research Institute, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China

    d International Food Policy Research Institute, 1201 I Street, NW, Washington, DC 20005, USA

    e Novel Data Ecosystems for sustainability Research Group, International Institute for Applied Systems Analysis (IIASA), Schlossplatz 1, Laxenburg A-2361, Austria

    f Department of Geography, Kent State University, 325 S. Lincoln Street, Kent, OH 44242, USA

    g State Key Laboratory of Efficient Utilization of Arid and Semi-arid Arable Land in Northern China, the Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China

    Introduction

    We are pleased to introduce Hcropland30, a comprehensive global cropland mapping dataset for 2020, curated to support a wide range of research and analysis applications related to agricultural land and environmental assessment. The dataset covers the entire globe, divided into 16,284 grids of 1°×1° each. Hcropland30 was produced by leveraging global land cover products and Landsat data based on a deep learning model. First, we established a hierarchical sampling strategy that used the simulated annealing method to identify representative 1°×1° grids globally and sparse point-level samples within these selected grids. Next, we employed an ensemble learning technique to expand these sparse point-level samples into dense pixel-wise labels, creating area-level 1°×1° cropland labels. These area-level labels were then used to train a U-Net model for predicting global cropland distribution, followed by a comprehensive evaluation of the mapping accuracy.

    Dataset

    1. Hcropland30: A hybrid 30-m global cropland map in 2020

    Data format: GeoTiff

    Spatial resolution: 30 m

    Projection: EPSG:4326 (WGS84)

    Values: 1 denotes cropland and 0 denotes non-cropland

    The dataset has been uploaded as 16,284 tiles. The extent of each tile can be found in the file “Grids.shp”. Each file is named according to the grid’s Id number. For example, “000015.tif” contains the cropland mapping result for the 15th 1°×1° grid. This naming convention makes it easy to identify and retrieve the data for a specific grid.
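
    The naming convention above can be sketched in a few lines of Python. This is only an illustration: the six-digit zero padding is inferred from the single example “000015.tif”, and the function names are hypothetical, not part of the dataset.

```python
def tile_filename(grid_id: int, width: int = 6) -> str:
    """Build the tile name for a grid Id, e.g. 15 -> '000015.tif'.

    The width of 6 is an assumption inferred from the example in the
    dataset description.
    """
    if grid_id < 0:
        raise ValueError("grid Id must be non-negative")
    return f"{grid_id:0{width}d}.tif"


def grid_id_from_filename(name: str) -> int:
    """Recover the grid Id from a tile name such as '000015.tif'."""
    stem = name.rsplit(".", 1)[0]  # drop the file extension
    return int(stem)               # int() ignores the leading zeros


print(tile_filename(15))
print(grid_id_from_filename("000015.tif"))
```

    The Id obtained this way could then be matched against the Id attribute in “Grids.shp” to look up the tile’s extent.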

    2. 1°×1° Grids: This file contains all 16,284 1°×1° grids used in the dataset. The vector file includes 18 attribute fields, providing comprehensive metadata for each grid. These attributes are essential for users who need detailed information about each grid’s characteristics.

    Data format: ESRI shapefile

    Projection: EPSG:4326 (WGS84)

    Attribute Fields:

    Id: The grid’s ID number.

    area: The area of the grid.

    mode: Indicates the representative sample grid.

    climate: The climate type the grid belongs to.

    dem: Average DEM value of the grid.

    ndvi_s1 to ndvi_s4: Average NDVI values for four seasons within the grid.

    esa, esri, fcs30, fromglc, glad, globeland30: Proportion of cropland pixels of different publicly available cropland products.

    inconsistent: Proportion of inconsistent pixels within the grid according to different public cropland products.

    hcropland30: Proportion of cropland pixels of our Hcropland30 dataset.
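
    As a rough illustration of the proportion-style attributes above, the sketch below computes a cropland share and an inconsistency share from binary label sequences. It rests on assumptions: the description does not define exactly how “inconsistent” was computed, so a pixel is treated here as inconsistent whenever the products do not all agree, and the tiny input lists are invented.

```python
from typing import Sequence


def cropland_fraction(pixels: Sequence[int]) -> float:
    """Share of pixels labelled 1 (cropland) in a flat sequence of 0/1 values."""
    if not pixels:
        raise ValueError("no pixels given")
    return sum(pixels) / len(pixels)


def inconsistent_fraction(products: Sequence[Sequence[int]]) -> float:
    """Share of pixels on which the products do not all give the same label.

    Assumption: 'inconsistent' means any disagreement among the products.
    """
    n = len(products[0])
    if n == 0 or any(len(p) != n for p in products):
        raise ValueError("products must be non-empty and equally sized")
    disagree = sum(1 for labels in zip(*products) if len(set(labels)) > 1)
    return disagree / n


# Invented 4-pixel examples standing in for three cropland products.
product_a = [1, 1, 0, 0]
product_b = [1, 0, 0, 0]
product_c = [1, 1, 0, 1]

print(cropland_fraction(product_a))
print(inconsistent_fraction([product_a, product_b, product_c]))
```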

    3. Samples: The selected representative pixel-level samples, including 32,343 cropland and 67,657 non-cropland samples. The category of each sample was determined by visual interpretation of Google Earth imagery and three-year NDVI time series curves from 2019 to 2021.

    Data format: ESRI shapefile

    Projection: EPSG:4326 (WGS84)

    Attribute Fields:

    type: 1 denotes cropland sample and 0 denotes non-cropland sample.

    Citation

    If you use this dataset, please cite the following paper:

    Hu, Q., Cai, Z., You, L., Fritz, S., Zhang, X., Yin, H., Wei, H., Yang, J., Li, Z., Yu, Q., Wu, H., Xu, B., Wu, W. (2024). Hcropland30: A 30-m global cropland map by leveraging global land cover products and Landsat data based on a deep learning model, Remote Sensing of Environment, submitted.

    License

    The data is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

    Disclaimer

    This dataset is provided as-is, without any warranty, express or implied. The dataset author is not responsible for any errors or omissions in the data, or for any consequences arising from the use of the data.

    Contact

    If you have any questions or feedback regarding the dataset, please contact the dataset author, Qiong Hu (huqiong@ccnu.edu.cn).

  2. Performance of ORFcor run on simulated inconsistency-containing data in...

    • figshare.com
    xls
    Updated Oct 31, 2016
    Cite
    Jonathan L. Klassen; Cameron R. Currie (2016). Performance of ORFcor run on simulated inconsistency-containing data in comparison to known values using the parameters: a = 5; b = 10; d = 0.75 or 0.90; f = 10; g = 30; l = k = 1000. [Dataset]. http://doi.org/10.1371/journal.pone.0058387.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 31, 2016
    Dataset provided by
    PLOS ONE
    Authors
    Jonathan L. Klassen; Cameron R. Currie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance of ORFcor run on simulated inconsistency-containing data in comparison to known values using the parameters: a = 5; b = 10; d = 0.75 or 0.90; f = 10; g = 30; l = k = 1000.

  3. Estimating microcredit impact with low take-up, contamination and...

    • journaldata.zbw.eu
    .rmd, csv
    Updated Mar 3, 2025
    Cite
    Florent Bédécarrats; Isabelle Guérin; Solène Morvant-Roux; François Roubaud (2025). Estimating microcredit impact with low take-up, contamination and inconsistent data: Replication study code and data [Dataset]. http://doi.org/10.15456/iree.2019071.090421
    Explore at:
    Available download formats: csv, .rmd
    Dataset updated
    Mar 3, 2025
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Florent Bédécarrats; Isabelle Guérin; Solène Morvant-Roux; François Roubaud
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We replicate a flagship randomised controlled trial carried out in rural Morocco that showed substantial and significant impacts of microcredit on the assets, the outputs, the expenses and the profits of self-employment activities. The original results rely primarily on trimming, which is the exclusion of the observations with the highest values on some variables. However, the applied trimming procedures are inconsistent between the baseline and the endline. Using identical specifications as the original paper reveals large and significant imbalances at the baseline and, at the endline, impacts on implausible outcomes, like household head gender, language or education. This calls into question the reliability of the data and the integrity of the experiment protocol. We find a series of coding, measurement and sampling errors. Correcting the identified errors leads to different results. After rectifying identified errors, we still find substantial imbalances at baseline and implausible impacts at the endline. Our re-analysis focused on the lack of internal validity of this experiment, but several of the identified issues also raise concerns about its external validity.
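
    To make the trimming issue concrete, here is a minimal Python sketch of why applying different trimming rules to the baseline and the endline can distort a comparison. It is not the replication code: the 10% threshold, the function name and all values are invented for illustration.

```python
from statistics import mean


def trim_top(values, share):
    """Return the values with the highest `share` fraction of observations removed."""
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    n_drop = int(len(values) * share)  # number of top observations to exclude
    return sorted(values)[: len(values) - n_drop]


# Invented outcome values for two survey waves (not from the study).
baseline = [10, 12, 11, 13, 9, 14, 12, 11, 10, 500]
endline = [11, 13, 12, 14, 10, 15, 13, 12, 11, 600]

# Consistent rule: trim the top 10% in both waves.
consistent_gap = mean(trim_top(endline, 0.10)) - mean(trim_top(baseline, 0.10))

# Inconsistent rule: trim the baseline but leave the endline untrimmed.
inconsistent_gap = mean(endline) - mean(trim_top(baseline, 0.10))

print(consistent_gap, inconsistent_gap)
```

    With the same rule in both waves the gap reflects the bulk of the distribution; with mismatched rules a single extreme observation dominates the difference.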

  4. Employee Sample Data

    • kaggle.com
    Updated May 29, 2023
    Cite
    William Lucas (2023). Employee Sample Data [Dataset]. https://www.kaggle.com/datasets/williamlucas0/employee-sample-data/code
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    William Lucas
    Description

    An unclean employee dataset can contain various types of errors, inconsistencies, and missing values that affect the accuracy and reliability of the data. Some common issues in unclean datasets include duplicate records, incomplete data, incorrect data types, spelling mistakes, inconsistent formatting, and outliers.

    For example, there might be multiple entries for the same employee with slightly different spellings of their name or job title. Additionally, some rows may have missing data for certain columns such as bonus or exit date, which can make it difficult to analyze trends or make accurate predictions. Inconsistent formatting of data, such as using different date formats or capitalization conventions, can also cause confusion and errors when processing the data.

    Furthermore, there may be outliers in the data, such as employees with extremely high or low salaries or ages, which can distort statistical analyses and lead to inaccurate conclusions.

    Overall, an unclean employee dataset can pose significant challenges for data analysis and decision-making, highlighting the importance of cleaning and preparing data before analyzing it.
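
    A few of the issues described above (inconsistent capitalization, mixed date formats, missing values, duplicate rows) can be handled with a short standard-library sketch. The column names (name, exit_date) and the list of date formats are assumptions made for illustration, not the actual schema of this dataset, and the sketch does not attempt to reconcile genuine spelling variants.

```python
from datetime import date, datetime

# Hypothetical set of date formats that might appear in an unclean file.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%Y/%m/%d")


def parse_date(text):
    """Parse a date written in any of DATE_FORMATS; return None if missing or unparseable."""
    text = (text or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def normalize_name(name):
    """Collapse repeated whitespace and unify capitalization."""
    return " ".join(name.split()).title()


def deduplicate(records):
    """Normalize each record and keep the first of any exact duplicates."""
    seen, cleaned = set(), []
    for rec in records:
        key = (normalize_name(rec["name"]), parse_date(rec.get("exit_date")))
        if key not in seen:
            seen.add(key)
            cleaned.append({"name": key[0], "exit_date": key[1]})
    return cleaned


rows = [
    {"name": "  william   LUCAS ", "exit_date": "2023-05-29"},
    {"name": "William Lucas", "exit_date": "29.05.2023"},
    {"name": "Ada Smith", "exit_date": ""},
]
print(deduplicate(rows))
```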

  5. The inconsistency of common scale estimators when output prices are...

    • journaldata.zbw.eu
    • jda-test.zbw.eu
    .kg, txt
    Updated Dec 8, 2022
    Cite
    Tor Jakob Klette; Zvi Griliches (2022). The inconsistency of common scale estimators when output prices are unobserved and endogenous (replication data) [Dataset]. http://doi.org/10.15456/jae.2022313.1255489069
    Explore at:
    Available download formats: txt(4848), .kg(2385), .kg(4104)
    Dataset updated
    Dec 8, 2022
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Tor Jakob Klette; Zvi Griliches
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This paper explores the inconsistency of common scale estimators when output is proxied by deflated sales, based on a common output deflator across firms. The problem arises when firms operate in an imperfectly competitive environment and prices differ between them. In particular, we show that this problem reveals itself as a downward bias in the scale estimates obtained from production function regressions, under a variety of assumptions about the pattern of technology, demand and factor price shocks. The result also holds for scale estimates obtained from cost functions. The analysis is carried one step further by adding a model of product demand. Within this augmented model we examine the probability limit of the scale estimate obtained from an ordinary production function regression. This analysis reveals that the OLS estimate will be biased towards a value below one, and how this bias is affected by the magnitude of the parameters and the amount of variation in the various shocks. We have included an empirical section which illustrates the issues. The empirical analysis presents a tentative approach to solve the problem discussed in the theoretical part of this paper.

  6. Factors associated with inconsistent condom use among HIV-infected patients...

    • figshare.com
    xls
    Updated May 30, 2023
    Cite
    Gilbert Ndziessi; Sylvie Boyer; Charles Kouanfack; Julien Cohen; Fabienne Marcellin; Jean-Paul Moatti; Eric Delaporte; Bruno Spire; Christian Laurent; Maria Patrizia Carrieri (2023). Factors associated with inconsistent condom use among HIV-infected patients reporting sex with a main or casual partner(s) - either HIV negative or of unknown status during the first year of antiretroviral therapy in Cameroon: univariate and multivariate analyses using mixed-effect logistic models (212 patients, 344 visits). [Dataset]. http://doi.org/10.1371/journal.pone.0036118.t002
    Explore at:
    Available download formats: xls
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Gilbert Ndziessi; Sylvie Boyer; Charles Kouanfack; Julien Cohen; Fabienne Marcellin; Jean-Paul Moatti; Eric Delaporte; Bruno Spire; Christian Laurent; Maria Patrizia Carrieri
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OR = crude odds ratio; AOR = adjusted odds ratio; IQR = interquartile range; * included in multivariate analysis; a: during the previous 12 months; b: consumption of three big bottles and/or six glasses of alcoholic beverages or more on any one occasion; c: level 1 or 2 on a ten-point scale [15]; d: score range 0–60, higher values denote more depressive symptoms [17].

  7. Dataset for collaborative prediction of web service quality based on user...

    • search.dataone.org
    • datadryad.org
    Updated May 4, 2025
    Cite
    Yang Song (2025). Dataset for collaborative prediction of web service quality based on user preferences and services [Dataset]. http://doi.org/10.5061/dryad.5dv41ns4s
    Explore at:
    Dataset updated
    May 4, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Yang Song
    Time period covered
    Jan 1, 2020
    Description

    The prediction of web service quality plays an important role in improving user services; it has been one of the most popular topics in the field of Internet services. In traditional collaborative filtering methods, differences in the personalization and preferences of different users have been ignored. In this paper, we propose a prediction method for web service quality based on different types of quality of service (QoS) attributes. Different extraction rules are applied to extract the user preference matrices from the original web data, and the negative value filtering-based top-K method is used to merge the optimization results into the collaborative prediction method. Thus, the individualized differences are fully exploited, and the problem of inconsistent QoS values is resolved. The experimental results demonstrate the validity of the proposed method. Compared with other methods, the proposed method performs better, and the results are closer to the real values.
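
    The description names a "negative value filtering-based top-K" step without giving details. The sketch below shows one common reading of that idea in user-based collaborative filtering: compute user similarities, discard neighbours whose similarity is not positive, keep the K most similar, and predict a QoS value as a similarity-weighted average. The Pearson similarity, the weighting scheme and all data are assumptions for illustration and should not be taken as the paper's actual algorithm.

```python
from math import sqrt


def pearson(u, v):
    """Pearson correlation over the services both users have invoked."""
    common = [s for s in u if s in v]
    if len(common) < 2:
        return 0.0
    mu = sum(u[s] for s in common) / len(common)
    mv = sum(v[s] for s in common) / len(common)
    num = sum((u[s] - mu) * (v[s] - mv) for s in common)
    den = sqrt(sum((u[s] - mu) ** 2 for s in common)
               * sum((v[s] - mv) ** 2 for s in common))
    return num / den if den else 0.0


def top_k_neighbors(target, others, service, k):
    """Keep the k most similar users that observed `service`, dropping non-positive similarities."""
    sims = [(name, pearson(target, qos))
            for name, qos in others.items() if service in qos]
    positive = [(name, s) for name, s in sims if s > 0]  # negative value filtering
    positive.sort(key=lambda pair: pair[1], reverse=True)
    return positive[:k]


def predict(target, others, service, k=2):
    """Similarity-weighted average of the neighbours' QoS values, or None if no neighbour qualifies."""
    neighbors = top_k_neighbors(target, others, service, k)
    if not neighbors:
        return None
    total = sum(s for _, s in neighbors)
    return sum(s * others[name][service] for name, s in neighbors) / total


# Invented response-time observations (seconds) per user and service.
target = {"s1": 1.0, "s2": 2.0, "s3": 3.0}
others = {
    "a": {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0},
    "b": {"s1": 3.0, "s2": 2.0, "s3": 1.0, "s4": 10.0},  # negatively correlated, filtered out
    "c": {"s1": 2.0, "s2": 4.0, "s3": 6.0, "s4": 8.0},
}
print(predict(target, others, "s4"))
```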

  8. Data from: The problem of inconsistency between thermal maturity indicators...

    • data.wu.ac.at
    pdf
    Updated Jun 27, 2018
    Cite
    Corp (2018). The problem of inconsistency between thermal maturity indicators used for petroleum exploration in Australian basins [Dataset]. https://data.wu.ac.at/schema/data_gov_au/ZjU0MzM4ZDYtMGFmOS00MzUwLWFjODItNGU5MjY1ZGRhODVh
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 27, 2018
    Dataset provided by
    Corp
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    A major frustration in thermal maturation modelling for petroleum exploration in Australian sedimentary basins is the inconsistency between the values of different thermal maturity indicators. Vitrinite reflectance (VR), Rock-Eval Tmax, spore colouration index (SCI) and fluorescence alteration of multiple macerals (FAMM) for wells from three Australian basins show inconsistencies due to technical, methodological and conceptual problems inherent in each technique. When the differences between the concepts of rank and thermal maturity are considered, it can be shown that some inconsistencies are more apparent than real. It is important to consider this distinction when selecting data against which to model burial and thermal histories.

  9. Data from: High-Level Quantum Chemistry Reference Heats of Formation for a...

    • acs.figshare.com
    zip
    Updated Jun 3, 2023
    Cite
    Bun Chan (2023). High-Level Quantum Chemistry Reference Heats of Formation for a Large Set of C, H, N, and O Species in the NIST Chemistry Webbook and the Identification and Validation of Reliable Protocols for Their Rapid Computation [Dataset]. http://doi.org/10.1021/acs.jpca.2c03846.s003
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    ACS Publications
    Authors
    Bun Chan
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    A recent study has examined the accuracy of NIST heats of formation for a set of C, H, and O-containing species with a proposed low-cost quantum chemistry approach. In the present study, we have used high-level methods such as W1X-2 to obtain these data more rigorously, which we have then used to assess the NIST and the previously computed values. We find that many of the NIST data that were suggested to be unreliable by the previous study are indeed inconsistent with our high-level reference values. However, we also find substantial deviations for the previously computed values from our benchmark. Thus, we have assessed the performance of alternative low-cost methods. In our assessment, we have additionally examined C, H, N, and O-containing species for which heats of formation are available from the NIST database. We find the ωB97M-V/ma-def2-TZVP, DSD-PBEP86/ma-def2-TZVP, and CCSD(T)-F12b/aug′-cc-pVDZ methods to be adequate for obtaining heats of formation with the atomization approach, once their atomic energies are optimized with our benchmark. Notably, the low-cost ωB97M-V method yields values that agree to within 10 kJ mol⁻¹ for more than 90% of the (∼1500) species. A higher 20 kJ mol⁻¹ threshold captures 98% of the data. The outlier species typically contain many electron-withdrawing (nitro) groups. In these cases, the use of isodesmic-type reactions rather than the atomization approach is more reliable. Our assessment has also identified significant outliers from the NIST database, for which experimental re-determination of the heats of formation would be desirable.

  10. Data from: Weak and inconsistent associations between melanic darkness and...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Oct 23, 2018
    Cite
    Siiri-Lii Sandre; Tanel Kaart; Nathan Morehouse; Toomas Tammaru (2018). Weak and inconsistent associations between melanic darkness and fitness related traits in an insect [Dataset]. http://doi.org/10.5061/dryad.kr8vc17
    Explore at:
    Available download formats: zip
    Dataset updated
    Oct 23, 2018
    Dataset provided by
    Dryad
    Authors
    Siiri-Lii Sandre; Tanel Kaart; Nathan Morehouse; Toomas Tammaru
    Time period covered
    2018
    Area covered
    Estonia
    Description

    Ematurga data for quantitative genetic analyses

    An Excel file with three sheets

    Sheet 1: Pedigree data presenting the relatedness structure

    id: Individual identification number (including also individuals without phenotype data)
    sire: Sire identification number (zero, if unknown)
    dam: Dam identification number (zero, if unknown)

    Sheets 2 and 3: Heather.data & Bilberry.data: individual-based values of the traits being analysed

    gen: Generation number (1 - F1, 2 - F2)
    plant: Plant (1 - heather, 2 - bilberry)
    sex: Sex (1 - male, 2 - female)
    h_rgr & b_rgr: Growth ratio in 5th instar on heather and on bilberry, respectively
    h_pupw & b_pupw: Pupal weight (mg) on heather and on bilberry, respectively
    h_fifth & b_fifth: Duration of the 5th instar (days) on heather and on bilberry, respectively
    h_dscore & b_dscore: Melanic darkness MCA score on heather and on bilberry, respectively

    File: dryaddata.xlsx

  11. RADAR

    • huggingface.co
    Updated Jun 13, 2025
    Cite
    Ken Gu (2025). RADAR [Dataset]. https://huggingface.co/datasets/kenqgu/RADAR
    Explore at:
    Dataset updated
    Jun 13, 2025
    Authors
    Ken Gu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    RADAR: Benchmarking Language Models on Imperfect Tabular Data

      Link: Paper | Code
    

    The Robust And Data Aware Reasoning (RADAR) benchmark is designed to evaluate the ability of language models to demonstrate data-awareness—that is, to recognize, reason over, and appropriately handle complex data artifacts such as:

    Missing data
    Bad values
    Outliers
    Inconsistent formatting
    Inconsistent multi-column logic

    The full dataset includes 53 tasks grounded in real-world… See the full description on the dataset page: https://huggingface.co/datasets/kenqgu/RADAR.

  12. Base SIRENE Toulouse Métropole | gimi9.com

    • gimi9.com
    Updated Apr 14, 2024
    Cite
    (2024). Base SIRENE Toulouse Métropole | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-data-toulouse-metropole-fr-explore-dataset-base-sirene-v3-/
    Explore at:
    Dataset updated
    Apr 14, 2024
    Description

    Find all the companies and their establishments. The Sirene® database is updated daily, with around 30 million establishments operating or not.

    ### IMPORTANT

    As the Sirene database contains personal data, INSEE draws your attention to the legal obligations arising from it:

    * The processing of these data falls under the declaration obligations of Law 78-17 of 6 January 1978 as amended, known as the CNIL Law.
    * Depending on your use of the dataset, it is your responsibility to take into account the most recent dissemination status of each individual. Article A123-96 of the Commercial Code provides that: Any natural person may request either directly during his creation or modification formalities, or by letter addressed to the Director-General of the National Institute of Statistics and Economic Studies, that the information in the register concerning him or her may not be used by third parties other than the bodies authorised under Article R. 123-224 or administrations, for the purposes of prospecting, in particular commercial.

    ### SIRENE BY ODS

    ODS presents a consolidated institution base with data from its associated legal unit.

    #### enrichment

    * addition of the wording of the NAF codes and legal categories;
    * addition of legal tranches and track types;
    * addition of administrative hierarchies (reg/arr/dep/epci);
    * addition of the geolocation of establishments via a BAN geocoding;
    * change of certain abbreviations (F/M, O/N, A/C) by the corresponding wording (sexeunitelegale, etatadministratifunitelegale, characterreemployerunitelegale, administrative establishment, establishment, employer character establishment)
    * addition of a field “first line of address” with the civility + first name person of the legal unit
    * addition of an address field establishment (concatenation num + type + channel)

    #### notes

    * the “start date of establishment” values were rebuilt from an old version for a hundred records whose values were inconsistent.

  13. Data integration for inference about spatial processes: A model-based...

    • plos.figshare.com
    pdf
    Updated Jun 2, 2023
    Cite
    Simone Tenan; Paolo Pedrini; Natalia Bragalanti; Claudio Groff; Chris Sutherland (2023). Data integration for inference about spatial processes: A model-based approach to test and account for data inconsistency [Dataset]. http://doi.org/10.1371/journal.pone.0185588
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Simone Tenan; Paolo Pedrini; Natalia Bragalanti; Claudio Groff; Chris Sutherland
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recently developed methods that integrate multiple data sources arising from the same ecological processes have typically utilized structured data from well-defined sampling protocols (e.g., capture-recapture and telemetry). Despite this new methodological focus, the value of opportunistic data for improving inference about spatial ecological processes is unclear and, perhaps more importantly, no procedures are available to formally test whether parameter estimates are consistent across data sources and whether they are suitable for integration. Using data collected on the reintroduced brown bear population in the Italian Alps, a population of conservation importance, we combined data from three sources: traditional spatial capture-recapture data, telemetry data, and opportunistic data. We developed a fully integrated spatial capture-recapture (SCR) model that included a model-based test for data consistency to first compare model estimates using different combinations of data, and then, by acknowledging data-type differences, evaluate parameter consistency. We demonstrate that opportunistic data lend themselves naturally to integration within the SCR framework and highlight the value of opportunistic data for improving inference about space use and population size. This is particularly relevant in studies of rare or elusive species, where the number of spatial encounters is usually small and where additional observations are of high value. In addition, our results highlight the importance of testing and accounting for inconsistencies in spatial information from structured and unstructured data so as to avoid the risk of spurious or averaged estimates of space use and, consequently, of population size. Our work supports the use of a single modeling framework to combine spatially referenced data while also accounting for parameter consistency.

  14. Comparison of missing values, ‘don’t know’ values and inconsistent values...

    • plos.figshare.com
    xls
    Updated Jun 4, 2023
    Cite
    Elise Braekman; Finaba Berete; Rana Charafeddine; Stefaan Demarest; Sabine Drieskens; Lydia Gisle; Geert Molenberghs; Jean Tafforeau; Johan Van der Heyden; Guido Van Hal (2023). Comparison of missing values, ‘don’t know’ values and inconsistent values between the paper-and-pencil and web-based mode and number of data entry mistakes in the paper-and-pencil mode (n = 149). [Dataset]. http://doi.org/10.1371/journal.pone.0197434.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Elise Braekman; Finaba Berete; Rana Charafeddine; Stefaan Demarest; Sabine Drieskens; Lydia Gisle; Geert Molenberghs; Jean Tafforeau; Johan Van der Heyden; Guido Van Hal
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Comparison of missing values, ‘don’t know’ values and inconsistent values between the paper-and-pencil and web-based mode and number of data entry mistakes in the paper-and-pencil mode (n = 149).

  15. undefined undefined: undefined | undefined (undefined)

    • data.census.gov
    Cite
    United States Census Bureau, undefined undefined: undefined | undefined (undefined) [Dataset]. https://data.census.gov/table/ACSDT5YAIAN2021.B99052?q=United%20States%20Diegueno%20(Kumeyaay)&t=Native%20and%20Foreign%20Born&g=050XX00US53077
    Explore at:
    Dataset provided by
    United States Census Bureau http://census.gov/
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, it is the Census Bureau's Population Estimates Program that produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

    Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

    Source: U.S. Census Bureau, 2017-2021 American Community Survey 5-Year Estimates.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

    Methodological changes to citizenship edits may have affected citizenship data for those born in American Samoa. Users should be aware of these changes when using 2018 data or multi-year data containing data from 2018. For more information, see: American Samoa Citizenship User Note.

    When information is missing or inconsistent, the Census Bureau logically assigns an acceptable value using the response to a related question or questions. If a logical assignment is not possible, data are filled using a statistical process called allocation, which uses a similar individual or household to provide a donor value. The "Allocated" section is the number of respondents who received an allocated value for a particular subject.

    The 2017-2021 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances, the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineation lists due to differences in the effective dates of the geographic entities.

    Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

    Explanation of Symbols:
    • -: The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
    • N: The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
    • (X): The estimate or margin of error is not applicable or not available.
    • median-: The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
    • median+: The median falls in the highest interval of an open-ended distribution (for example "250,000+").
    • **: The margin of error could not be computed because there were an insufficient number of sample observations.
    • ***: The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
    • *****: A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
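
    The description above defines the lower and upper confidence bounds as the estimate minus and plus the 90 percent margin of error. A minimal Python sketch of that arithmetic follows; the function name and the sample numbers are hypothetical and are not taken from the table.

```python
# Sketch: confidence bounds from an ACS estimate and its 90 percent margin of error.
# The function name and the example values are illustrative assumptions.

def confidence_bounds(estimate: float, margin_of_error: float) -> tuple[float, float]:
    """Return (lower, upper) as (estimate - MOE, estimate + MOE)."""
    if margin_of_error < 0:
        raise ValueError("margin of error must be non-negative")
    return estimate - margin_of_error, estimate + margin_of_error


# Hypothetical estimate of 1200 with a margin of error of 150.
lower, upper = confidence_bounds(1200, 150)
print(lower, upper)
```

    For an estimate flagged with *****, the description says the margin of error may be treated as zero, in which case both bounds equal the estimate.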

  16. Data from: Does introspection increase humanitarian concerns in judgment and...

    • openicpsr.org
    delimited, spss
    Updated Jan 16, 2023
    Cite
    Paul Slovic; Leaf Van Boven; Tehila Kogut; Daniel Vastfjall (2023). Does introspection increase humanitarian concerns in judgment and decision making? [Dataset]. http://doi.org/10.3886/E184065V1
    Explore at:
    Available download formats: spss, delimited
    Dataset updated
    Jan 16, 2023
    Dataset provided by
    University of Oregon
    Authors
    Paul Slovic; Leaf Van Boven; Tehila Kogut; Daniel Vastfjall
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There is ample evidence that people are inconsistent in the way that they value humanitarian objectives when making decisions about helping persons in distress. Most people believe they should give substantial weight to other people's welfare when making decisions about charitable donations, allocating scarce medical resources, or providing support for refugees. Yet great compassion extended towards individual victims often fades or disappears as the numbers of people in need increase. In some circumstances, emotionally appealing but normatively weak attributes may take precedence over needs. The resulting failures to help others often appear to contradict one's considered beliefs in the importance of giving them assistance. This proposal tests the hypothesis that introspection about personal beliefs regarding how humanitarian concerns should influence behavior will reduce underweighting of these concerns. We hypothesize that introspection will help people make judgments and decisions that better reflect their considered values. We propose a series of studies to test predictions derived from three basic ideas: (a) that people unknowingly weight concerns about others' welfare less than they believe they should weight those concerns, a bias in humanitarian judgment and decision making; (b) that introspection can increase awareness of the discrepancy between people's personal beliefs and their exhibited behavior; and (c) becoming aware of this discrepancy will lead people to reduce the inconsistency between their personal beliefs and behavior by increasing the weighting of others' welfare in their judgments and decisions. Understanding the role of humanitarian values in important personal and policy decisions has broad significance. Millions of lives and national and global security depend on these decisions. 
Do the political, social, economic, cultural, security, and humanitarian values that we assume should guide our decisions actually exist in some coherent and consistent form? If so, what are these considered values and how do we ensure that our decisions are in accord with these values? This research project aims to make a contribution toward answering these vital questions by examining the degree to which a simple introspection procedure can improve the coherence between one's own values and one's actions in a variety of humanitarian decision contexts. The research yields several intellectual contributions. First, the studies advance understanding of how introspection can improve decision making, in contrast with claims in the research literature that introspection and deliberation can harm decision quality. Second, by examining people's response to introspection about normative decision processes, the studies advance understanding of how people monitor and revise attribute weighting when making decisions. This enables the researchers to differentiate between introspection and deliberation, concepts that are treated similarly in the literature. Third, given that people often have clear beliefs regarding how decisions should be made in specific contexts, the research advances understanding of the extent to which people fail to behave according to those beliefs and how becoming aware of personal humanitarian values can increase the correspondence between those values and actual behavior.

  17. Loudoun Parcels

    • community-loudoungis.opendata.arcgis.com
    • data.virginia.gov
    Updated Apr 13, 2016
    Cite
    Loudoun County GIS (2016). Loudoun Parcels [Dataset]. https://community-loudoungis.opendata.arcgis.com/datasets/loudoun-parcels
    Explore at:
    Dataset updated
    Apr 13, 2016
    Dataset authored and provided by
    Loudoun County GIS
    Description

    Data updated daily. A parcel is a tract or plot of land surveyed and defined by legal ownership. Data were compiled from plats and deeds recorded at the Clerk of the Court and from historic tax maps. Source material was digitized or the coordinates were entered into the database via ARC/INFO Coordinate Geometry (COGO). Digital data from engineering companies has also been incorporated for newer subdivisions. A MCPI number is used to identify each parcel, which is a unique ID number further explained below.

    Purpose: Parcels are used to support a variety of services including assessment, permitting, subdivision review, planning, zoning, and economic development. Parcel data were initially developed to replace existing tax maps. As a result, there are parcel polygons digitized from tax maps that do not represent land parcels but are taxable entities such as leaseholds or easements.

    Supplemental Information: Data are stored in the corporate ArcSDE Geodatabase as a feature class. The coordinate system is Virginia State Plane (North), Zone 4501, datum NAD83 HARN.

    Maintenance and Update Frequency: Parcels are updated on an hourly basis from recorded deeds and plats. Depending on volume and date of receipt of recordation information, data may be updated 2-3 weeks following recordation.

    Completeness Report: Features may have been eliminated or generalized due to scale and intended use. To assist Loudoun County, Virginia in the maintenance of the data, please provide any information concerning discovered errors, omissions, or other discrepancies found in the data.

    MCPI: 9 digit unique parcel ID that is a combination of: MAP, CELL, and PARCEL.
    MAP: 3 digit map number (001-701) corresponding with map tile index.
    CELL: 2 digit map grid location of parcel center; the grid is comprised of 1000 by 1000 ft grid cells numbered as rows and columns (Columns numbered > 5 6 7 8 9 0; Rows numbered > 1 2 3 4).
    PARCEL: 4 digit location of polygon center based on the 1927 Virginia State Plane coordinate grid where an easting and northing measurement is taken. Example: 6654 from E 2229668, N 475545.

    The MAP, CELL, and PARCEL values of a parcel do not change when a parcel is altered by a boundary line adjustment or becomes residue from a subdivision. The MAP, CELL, and PARCEL values may therefore be inconsistent with the location of polygon center. MAP, CELL, and PARCEL values have been manually altered for some parcels to agree with other databases; as a result, not all parcels can be located by the MAP, CELL, and PARCEL values.

    Data Owner: Office of Mapping and Geographic Information
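
    The MCPI layout described above (3 digit MAP, 2 digit CELL, 4 digit PARCEL) can be sketched as a simple string split. This assumes the three parts are concatenated in the order listed, which the description implies but does not state outright; the function name and the sample ID are hypothetical.

```python
# Sketch: split a 9 digit MCPI into MAP, CELL and PARCEL.
# Assumption: the parts are concatenated in the order MAP + CELL + PARCEL.

def split_mcpi(mcpi: str) -> dict:
    """Return the MAP, CELL and PARCEL parts of a 9 digit MCPI string."""
    digits = mcpi.strip()
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"expected 9 digits, got {mcpi!r}")
    # Parts are kept as strings so that leading zeros (e.g. MAP "001") survive.
    return {"MAP": digits[0:3], "CELL": digits[3:5], "PARCEL": digits[5:9]}


# Hypothetical ID, not a real parcel.
print(split_mcpi("123456654"))
```

    As the description notes, these values may be inconsistent with the actual polygon center, so the split identifies a parcel but should not be relied on to locate it.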

  18. Data from: Misinterpretation of Dubinin–Radushkevich isotherm and its...

    • tandf.figshare.com
    docx
    Updated Jun 1, 2023
    Cite
    Biswanath Mahanty; Shishir Kumar Behera; Naresh Kumar Sahoo (2023). Misinterpretation of Dubinin–Radushkevich isotherm and its implications on adsorption parameter estimates [Dataset]. http://doi.org/10.6084/m9.figshare.22274661.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Biswanath Mahanty; Shishir Kumar Behera; Naresh Kumar Sahoo
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Dubinin–Radushkevich (D–R) isotherm remains one of the widely used adsorption models for the solid/liquid interface. However, the adsorption potential (ε), as a function of the equilibrium concentration (Ce) and solubility (Cs) of the adsorbate, is frequently expressed through a dimensionally inconsistent form, i.e. ε = RT ln(1 + 1/Ce), instead of the correct form, i.e. ε = RT ln(Cs/Ce). Although Hu and Zhang (2019) have pointed out the misinterpretation, incorrect use of the isotherm continues. All of the 10 all-time highly cited articles on the D–R isotherm and 80 out of 117 articles citing the work of Hu and Zhang (2019) adopted the incorrect or ambiguous form. Only six research articles have stated the source of the required Cs data, i.e. a literature-reported value, an approximation to the initial adsorbate concentration, or an additional model parameter. Modeling of the D–R isotherm using datasets extracted from three selected references suggests that using the incorrect expression would have a variable impact on maximum adsorption capacity estimates. However, sorption energy would invariably be overestimated (at least 200 times) in all data sets. In the absence of any unified approach, and given the practical difficulty of ascertaining the correct Cs value, researchers are inclined to use the inconsistent model.
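
    To make the dimensional problem concrete, the sketch below evaluates both expressions for the adsorption potential: the correct ε = RT ln(Cs/Ce) and the commonly misused ε = RT ln(1 + 1/Ce). It is an illustration only; the function names, the temperature and the concentrations are hypothetical, and R is taken as the molar gas constant in J/(mol·K).

```python
# Sketch: compare the two expressions for the D-R adsorption potential.
# All numeric inputs below are made-up illustration values.
import math

R = 8.314  # molar gas constant, J/(mol*K)


def epsilon_correct(ce: float, cs: float, temperature: float) -> float:
    """Adsorption potential from the dimensionless ratio Cs/Ce."""
    return R * temperature * math.log(cs / ce)


def epsilon_misused(ce: float, temperature: float) -> float:
    """Dimensionally inconsistent form: the result depends on the unit of Ce."""
    return R * temperature * math.log(1 + 1 / ce)


T = 298.15  # kelvin (assumed)

# The same hypothetical state expressed in mg/L and in g/L.
print(epsilon_correct(10.0, 1000.0, T), epsilon_correct(0.010, 1.0, T))
print(epsilon_misused(10.0, T), epsilon_misused(0.010, T))
```

    The correct form gives the same value in either unit because only the ratio Cs/Ce enters, whereas the misused form changes when Ce is merely re-expressed in a different unit.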

  19. Data from: LBA-ECO CD-17 Secondary Forest Survey, Para and Rondonia, Brazil:...

    • data.nasa.gov
    • data.globalchange.gov
    Updated Apr 1, 2025
    Cite
    data.nasa.gov (2025). LBA-ECO CD-17 Secondary Forest Survey, Para and Rondonia, Brazil: 2002-2003 [Dataset]. https://data.nasa.gov/dataset/lba-eco-cd-17-secondary-forest-survey-para-and-rondonia-brazil-2002-2003-cf66a
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA http://nasa.gov/
    Area covered
    Brazil, State of Rondônia
    Description

    This data set provides measurements for diameter at breast height (DBH), tree height, distance from tree stems to the furthest canopy element, and a species survey of secondary forests in Para and Rondonia, Brazil, from 2002-2003. The forest areas were defined as Type A and Type B stands. Measurements were made in the overstory, understory, and midstory of each stand. Type A stands were sampled intensively, with the goal of providing high-fidelity spatial information about the 3-dimensional structure of the stand. These stands were 60 x 60-m (0.36-ha) areas divided into 10 x 10-m grids of uniform clearing and abandonment history and were identifiable from Landsat images. Type B stands were sampled extensively, with the goal of providing unbiased estimates of biomass, along with some information about the vertical structure of the stand and of spatial variability. These stands were polygons of uniform clearing and afforestation history based on multitemporal Landsat imagery, and varied in size and shape. The Landsat files provide classified land cover for each scene and can be used as a time series to evaluate land cover change over time. Each file is a geolocated land cover map based on 30-m Landsat data. NOTE: There were additional files which could not be archived due to file problems. Data Quality Statement: The Data Center has determined that this data set has missing or incomplete data, metadata, or other documentation resulting in diminished usability of this product. Known Problems: Some unresolved issues remain where data values are inconsistent with the variable descriptions provided with the data set. The site identification and plot identification values are not consistently used in all three data files. The variables are not adequately described.

  20. Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics

    • plos.figshare.com
    pdf
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bryan Kolaczkowski; Joseph W. Thornton (2023). Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics [Dataset]. http://doi.org/10.1371/journal.pone.0007891
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Bryan Kolaczkowski; Joseph W. Thornton
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias—which is apparent under both controlled simulation conditions and in analyses of empirical sequence data—also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages—that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.

Cite
Qiong Hu; Zhiwen Cai; Liangzhi You; Steffen Fritz; Xinyu Zhang; He Yin; Haodong Wei; Jingya Yang; Zexuan Li; Hao Wu; Baodong Xu; Wenbin Wu (2024). Hcropland30: A hybrid 30-m global cropland map by leveraging global land cover products and Landsat data based on a deep learning model [Dataset]. http://doi.org/10.5281/zenodo.13169748

Data from: Hcropland30: A hybrid 30-m global cropland map by leveraging global land cover products and Landsat data based on a deep learning model

Explore at:
Available download formats: zip, bin, jpeg
Dataset updated
Aug 3, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Qiong Hu; Zhiwen Cai; Liangzhi You; Steffen Fritz; Xinyu Zhang; He Yin; Haodong Wei; Jingya Yang; Zexuan Li; Hao Wu; Baodong Xu; Wenbin Wu
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Hcropland30: A 30-m global cropland map by leveraging global land cover products and Landsat data based on a deep learning model

***Please note this dataset is undergoing peer review***

Version: 1.0

Authors: Qiong Hu a, 1, Zhiwen Cai b, 1, Liangzhi You c, d, Steffen Fritz e, Xinyu Zhang c, He Yin f, Haodong Wei c, Jingya Yang g, Zexuan Li a, Qiangyi Yu g, Hao Wu a, Baodong Xu b, *, Wenbin Wu g, *

a Key Laboratory for Geographical Process Analysis & Simulation of Hubei Province/College of Urban and Environmental Sciences, Central China Normal University, Wuhan 430079, China

b College of Resources and Environment, Huazhong Agricultural University, Wuhan 430070, China

c Macro Agriculture Research Institute, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China

d International Food Policy Research Institute, 1201 I Street, NW, Washington, DC 20005, USA

e Novel Data Ecosystems for sustainability Research Group, International Institute for Applied Systems Analysis (IIASA), Schlossplatz 1, Laxenburg A-2361, Austria

f Department of Geography, Kent State University, 325 S. Lincoln Street, Kent, OH 44242, USA

g State Key Laboratory of Efficient Utilization of Arid and Semi-arid Arable Land in Northern China, the Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China

Introduction

We are pleased to introduce a comprehensive global cropland mapping dataset for 2020 (named Hcropland30), curated to support a wide range of research and analysis applications related to agricultural land and environmental assessment. This dataset covers the entire globe, divided into 16,284 grids of 1°×1° each. Hcropland30 was produced by leveraging global land cover products and Landsat data based on a deep learning model. Initially, we established a hierarchical sampling strategy that used the simulated annealing method to identify the representative 1°×1° grids globally and the sparse point-level samples within these selected 1°×1° grids. Subsequently, we employed an ensemble learning technique to expand these sparse point-level samples into dense pixel-wise labels, creating the area-level 1°×1° cropland labels. These area-level labels were then used to train a U-Net model for predicting global cropland distribution, followed by a comprehensive evaluation of the mapping accuracy.

Dataset

1. Hcropland30: A hybrid 30-m global cropland map in 2020

****Data format: GeoTiff

****Spatial resolution: 30 m

****Projection: EPSG: 4326 (WGS84)

****Values: 1 denotes cropland and 0 denotes non-cropland

The dataset has been uploaded in 16,284 tiles. The extent of each tile can be found in the file “Grids.shp”. Each file is named according to the grid’s Id number. For example, “000015.tif” corresponds to the cropland mapping result for the 15th 1°×1° grid. This systematic naming convention ensures easy identification and retrieval of the specific grid data.
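
The naming convention can be sketched in Python as follows. The six-digit zero padding is an assumption generalized from the single example “000015.tif”, and the function name is hypothetical.

```python
# Sketch: build a tile file name from a grid Id.
# Assumption: every Id is zero-padded to six digits, as in "000015.tif".

def tile_filename(grid_id: int) -> str:
    """Return the GeoTiff file name for the given grid Id."""
    if grid_id < 0:
        raise ValueError("grid Id must be non-negative")
    return f"{grid_id:06d}.tif"


print(tile_filename(15))
```

The valid Id range is not stated in the description, so the sketch only rejects negative values; the Id field in “Grids.shp” remains the authoritative source.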

2. 1°×1° Grids: This file contains all 16,284 1°×1° grids used in the dataset. The vector file includes 18 attribute fields, providing comprehensive metadata for each grid. These attributes are essential for users who need detailed information about each grid’s characteristics.

****Data format: ESRI shapefile

****Projection: EPSG: 4326 (WGS84)

****Attribute Fields:

Id: The grid’s ID number.

area: The area of the grid.

mode: Indicates the representative sample grid.

climate: The climate type the grid belongs to.

dem: Average DEM value of the grid.

ndvi_s1 to ndvi_s4: Average NDVI values for four seasons within the grid.

esa, esri, fcs30, fromglc, glad, globeland30: Proportion of cropland pixels of different publicly available cropland products.

inconsistent: Proportion of inconsistent pixels within the grid according to different public cropland products.

hcropland30: Proportion of cropland pixels of our Hcropland30 dataset.

3. Samples: The selected representative pixel-level samples, including 32,343 cropland and 67,657 non-cropland samples. The category of each sample was determined by visual interpretation of Google Earth images and three-year NDVI time series curves from 2019-2021.

****Data format: ESRI shapefile

****Projection: EPSG: 4326 (WGS84)

****Attribute Fields:

type: 1 denotes cropland sample and 0 denotes non-cropland sample.

Citation

If you use this dataset, please cite the following paper:

Hu, Q., Cai, Z., You, L., Fritz, S., Zhang, X., Yin, H., Wei, H., Yang, J., Li, Z., Yu, Q., Wu, H., Xu, B., Wu, W. (2024). Hcropland30: A 30-m global cropland map by leveraging global land cover products and Landsat data based on a deep learning model, Remote Sensing of Environment, submitted.

License

The data is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

Disclaimer

This dataset is provided as-is, without any warranty, express or implied. The dataset author is not responsible for any errors or omissions in the data, or for any consequences arising from the use of the data.

Contact

If you have any questions or feedback regarding the dataset, please contact the dataset author, Qiong Hu (huqiong@ccnu.edu.cn).
