100+ datasets found
  1. FAVOR Essential Database

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 12, 2022
    Cite
    Hufeng Zhou; Theodore Arapoglou; Xihao Li; Zilin Li; Xihong Lin (2022). FAVOR Essential Database [Dataset]. http://doi.org/10.7910/DVN/1VGTJI
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 12, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Hufeng Zhou; Theodore Arapoglou; Xihao Li; Zilin Li; Xihong Lin
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Functional Annotation of Variants - Online Resource (FAVOR, https://favor.genohub.org) is a comprehensive whole-genome variant annotation database and a variant browser, providing hundreds of functional annotation scores from a variety of aspects of variant biological function. This FAVOR Essential Database is comprised of a collection of essential annotation scores for all possible SNVs (8,812,917,339) and observed indels (79,997,898) in Build GRCh38/hg38, including variant info, chromosome, position, reference allele, alternative allele, aPC-Conservation, aPC-Epigenetics, aPC-Epigenetics-Active, aPC-Epigenetics-Repressed, aPC-Epigenetics-Transcription, aPC-Local-Nucleotide-Diversity, aPC-Mappability, aPC-Mutation-Density, aPC-Protein-Function, aPC-Proximity-To-TSSTES, aPC-Transcription-Factor, CAGE promoter, CAGE, MetaSVM, rsID, FATHMM-XF, Gencode Comprehensive Category, Gencode Comprehensive Info, Gencode Comprehensive Exonic Category, Gencode Comprehensive Exonic Info, GeneHancer, LINSIGHT, CADD, rDHS. These annotation scores can be integrated into FAVORannotator (https://github.com/zhouhufeng/FAVORannotator) to create an annotated GDS (aGDS) file by storing the genotype data and their functional annotation data in an all-in-one file. The aGDS file can then facilitate a wide range of functionally-informed downstream analyses.

  2. GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Grennan, Mark; Schibel, Martin; Collins, Andrew; Beel, Joeran (2023). GIANT: The 1-Billion Annotated Synthetic Bibliographic-Reference-String Dataset for Deep Citation Parsing [Data] [Dataset]. http://doi.org/10.7910/DVN/LXQXAO
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Grennan, Mark; Schibel, Martin; Collins, Andrew; Beel, Joeran
    Description

    Extracting and parsing reference strings from research articles is a challenging task. State-of-the-art tools like GROBID apply rather simple machine learning models such as conditional random fields (CRF). Recent research has shown a high potential of deep learning for reference string parsing. The challenge with deep learning is, however, that the training step requires enormous amounts of labeled data – which do not exist for reference string parsing. Creating such a large dataset manually, through human labor, seems hardly feasible. Therefore, we created GIANT. GIANT is a large dataset with 991,411,100 XML-labeled reference strings. The strings were automatically created based on 677,000 entries from CrossRef, 1,500 citation styles in the citation-style language, and the citation processor citeproc-js. GIANT can be used to train machine learning models, particularly deep learning models, for citation parsing. While we have not yet tested GIANT for training such models, we hypothesise that the dataset will be able to significantly improve the accuracy of citation parsing. The dataset and the code to create it are freely available at https://github.com/BeelGroup/.

  3. Extracted Data From: TRI Basic Data Plus Files

    • dataverse.harvard.edu
    Updated Feb 18, 2025
    Cite
    US EPA (2025). Extracted Data From: TRI Basic Data Plus Files [Dataset]. http://doi.org/10.7910/DVN/PFMTZR
    Explore at:
    Croissant
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    US EPA
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2016 - Dec 31, 2023
    Area covered
    United States
    Description

    This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information: TRI basic plus data files guides. (2024, September 18). US EPA. https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-guides

    If you have questions about the underlying data stored here, please contact tri.help@epa.gov. If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at: climatecafe@bu.edu.

    "EPA has been collecting Toxics Release Inventory (TRI) data since 1987. The "Basic Plus" data files include ten file types that collectively contain all of the data fields from the TRI Reporting Form R and Form A. The files themselves are in tab-delimited .txt format and then compressed into a .zip file.

    • 1a: Facility, chemical, releases and other waste management summary information
    • 1b: Chemical activities and uses
    • 2a: On- and off-site disposal, treatment, energy recovery, and recycling information; non-production-related waste managed quantities; production/activity ratio information; and source reduction activities
    • 2b: Detailed on-site waste treatment methods and efficiency
    • 3a: Transfers off site for disposal and further waste management
    • 3b: Transfers to Publicly Owned Treatment Works (POTWs) (RY1987 - RY2010)
    • 3c: Transfers to Publicly Owned Treatment Works (POTWs) (RY2011 - Present)
    • 4: Facility information
    • 5: Optional information on source reduction, recycling and pollution control (RY2005 - Present)
    • 6: Additional miscellaneous and optional information (RY2010 - Present)

    Quantities of dioxin and dioxin-like compounds are reported in grams, while all other chemicals are reported in pounds. This webpage contains the most recent versions of all TRI data files; facilities may revise previous years' TRI submissions if necessary, and any such changes will be reflected in these files. For this reason, data contained in these files may differ from data used to construct the TRI National Analysis." [Quote from https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-calendar-years-1987-present]
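The description above says the files are tab-delimited .txt compressed into a .zip, and that dioxin quantities are in grams while everything else is in pounds. A minimal Python sketch of reading such a file and putting quantities on one unit might look like the following. The member name, the column names (CHEMICAL, UNIT, QUANTITY), the unit labels, and the UTF-8 encoding are all assumptions made for illustration; the real field names should be taken from the EPA file guides.

```python
import csv
import io
import zipfile

GRAMS_PER_POUND = 453.59237  # exact definition of the avoirdupois pound


def read_tab_delimited(zip_source, member_name):
    """Return the rows of one tab-delimited .txt member of a .zip as dicts."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # Encoding is an assumption; adjust if the real files differ.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))


def quantity_in_pounds(quantity, unit):
    """Convert a quantity to pounds; unit labels here are hypothetical."""
    if unit == "Grams":
        return quantity / GRAMS_PER_POUND
    if unit == "Pounds":
        return quantity
    raise ValueError(f"unexpected unit: {unit!r}")


def build_demo_zip():
    """Build a tiny in-memory archive with made-up rows, for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "demo_1a.txt",
            "CHEMICAL\tUNIT\tQUANTITY\n"
            "Example chemical\tPounds\t120\n"
            "Example dioxin\tGrams\t453.59237\n",
        )
    buffer.seek(0)
    return buffer


rows = read_tab_delimited(build_demo_zip(), "demo_1a.txt")
pounds = [quantity_in_pounds(float(r["QUANTITY"]), r["UNIT"]) for r in rows]
```

Converting at read time keeps later sums from silently mixing grams and pounds; whether pounds is the right common unit depends on the analysis.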

  4. Reference data: Amherst Massachusetts

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Sep 12, 2020
    Cite
    Sally E Goldin (2020). Reference data: Amherst Massachusetts [Dataset]. http://doi.org/10.7910/DVN/T0ACGB
    Explore at:
    Croissant
    Dataset updated
    Sep 12, 2020
    Dataset provided by
    Harvard Dataverse
    Authors
    Sally E Goldin
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Amherst, Massachusetts
    Description

    Shapefiles of roads, schools, and places of worship for Amherst, MA, extracted from the MassGIS database (https://www.mass.gov/get-massgis-data). Coordinate reference system: EPSG 26986. Used as reference data for testing MapEval.

  5. Citation Graph

    • kaggle.com
    zip
    Updated Jun 30, 2020
    Cite
    Caselaw Access Project (2020). Citation Graph [Dataset]. https://www.kaggle.com/datasets/harvardlil/citation-graph
    Explore at:
    Available download formats: zip (306688738 bytes)
    Dataset updated
    Jun 30, 2020
    Authors
    Caselaw Access Project
    Description

    Context

    The Caselaw Access Project makes 40 million pages of U.S. caselaw freely available online from the collections of Harvard Law School Library.

    The CAP citation graph shows the connections between cases in the Caselaw Access Project dataset. You can use the citation graph to answer questions like "what is the most influential case?" and "what jurisdictions cite most often to this jurisdiction?".

    Learn More: https://case.law/download/citation_graph/

    Access Limits: https://case.law/api/#limits

    Content

    This dataset includes citations and metadata for the CAP citation graph in CSV format.
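A question like "what is the most influential case?" can be approximated by counting incoming citations. The Python sketch below assumes a CSV layout in which each row holds a citing case id followed by the ids it cites; that layout, the ids, and the function name are assumptions for illustration, not a description of the actual CAP files. Incoming-citation count is only one rough proxy for influence.

```python
import csv
import io
from collections import Counter


def count_incoming_citations(lines):
    """Count how many distinct cases cite each case id.

    Assumes each CSV row is: citing_id, cited_id, cited_id, ...
    """
    incoming = Counter()
    for row in csv.reader(lines):
        if not row:
            continue
        citing, cited = row[0], row[1:]
        # Count each cited case once per citing case and skip self-citations.
        incoming.update(c for c in set(cited) if c and c != citing)
    return incoming


# Made-up rows: case 1 cites 2 and 3, case 2 cites 3, case 4 cites 3 and 2.
demo = io.StringIO("1,2,3\n2,3\n4,3,2\n")
counts = count_incoming_citations(demo)
most_cited = counts.most_common(1)[0]
```

For a file of real size the same function can be fed an open file handle instead of the in-memory demo, since it reads row by row.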

    Acknowledgements

    The Caselaw Access Project is by the Library Innovation Lab at Harvard Law School Library.

    Inspiration

    People are using CAP data to create research, applications, and more. We're sharing examples in our gallery.

    Cite Grid is the first visualization we've created based on data from our citation graph.

    Have something to share? We're excited to hear about it.

  6. AReNA’s DHS-GIS Database

    • dataverse.harvard.edu
    Updated Feb 23, 2021
    Cite
    International Food Policy Research Institute (IFPRI) (2021). AReNA’s DHS-GIS Database [Dataset]. http://doi.org/10.7910/DVN/OQIPRW
    Explore at:
    Croissant
    Dataset updated
    Feb 23, 2021
    Dataset provided by
    Harvard Dataverse
    Authors
    International Food Policy Research Institute (IFPRI)
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/OQIPRW

    Time period covered
    1980 - 2019
    Area covered
    Bangladesh, Benin, Mali, Nepal, Kenya, Nigeria, Rwanda, Myanmar, Burundi, Lesotho
    Dataset funded by
    The Bill & Melinda Gates Foundation
    Description

    Advancing Research on Nutrition and Agriculture (AReNA) is a 6-year, multi-country project in South Asia and sub-Saharan Africa funded by the Bill and Melinda Gates Foundation, being implemented from 2015 through 2020. The objective of AReNA is to close important knowledge gaps on the links between nutrition and agriculture, with a particular focus on conducting policy-relevant research at scale and crowding in more research on this issue by creating data sets and analytical tools that can benefit the broader research community. Much of the research on agriculture and nutrition is hindered by a lack of data, and many of the datasets that do contain both agriculture and nutrition information are small in size and geographic scope. The AReNA team constructed a large multi-level, multi-country dataset combining nutrition and nutrition-relevant information at the individual and household level from the Demographic and Health Surveys (DHS) with a wide variety of geo-referenced data on agricultural production, agroecology, climate, demography, and infrastructure (GIS data). This dataset includes 60 countries, 184 DHS, and 122,473 clusters. Over one thousand geospatial variables are linked with DHS. The entire dataset is organized into 13 individual files: DHS_distance, DHS_livestock, DHS_main, DHS_malaria, DHS NDVI, DHS_nightlight, DHS_pasture and climate (mean), DHS_rainfall, DHS_soil, DHS_SPAM, DHS_suit, DHS_temperature, and DHS_traveltime.

  7. Replication data for: Citations

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Lasda Bergman, Elaine (2023). Replication data for: Citations [Dataset]. http://doi.org/10.7910/DVN/27655
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Lasda Bergman, Elaine
    Description

    Microsoft Access Database for bibliometric analysis found in the article: Elaine M. Lasda Bergman, Finding Citations to Social Work Literature: The Relative Benefits of Using Web of Science, Scopus, or Google Scholar, The Journal of Academic Librarianship, Volume 38, Issue 6, November 2012, Pages 370-379, ISSN 0099-1333, http://dx.doi.org/10.1016/j.acalib.2012.08.002. (http://www.sciencedirect.com/science/article/pii/S009913331200119X)

    Abstract: Past studies of citation coverage of Web of Science, Scopus, and Google Scholar do not demonstrate a consistent pattern that can be applied to the interdisciplinary mix of resources used in social work research. To determine the utility of these tools to social work researchers, an analysis of citing references to well-known social work journals was conducted. Web of Science had the fewest citing references and almost no variety in source format. Scopus provided higher citation counts, but the pattern of coverage was similar to Web of Science. Google Scholar provided substantially more citing references, but only a relatively small percentage of them were unique scholarly journal articles. The patterns of database coverage were replicated when the citations were broken out for each journal separately. The results of this analysis demonstrate the need to determine what resources constitute scholarly research and reflect the need for future researchers to consider the merits of each database before undertaking their research. This study will be of interest to scholars in library and information science as well as social work, as it facilitates a greater understanding of the strengths and limitations of each database and brings to light important considerations for conducting future research.

    Keywords: Citation analysis; Social work; Scopus; Web of Science; Google Scholar

  8. Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER...

    • portal.edirepository.org
    • search.dataone.org
    Updated Jun 4, 2015
    Cite
    Environmental Data Initiative (2015). Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER collected on 1989-04-05 [Dataset]. http://doi.org/10.6073/pasta/d7d89ef98151702961c6844acd654210
    Explore at:
    Dataset updated
    Jun 4, 2015
    Dataset provided by
    Environmental Data Initiative
    Area covered
    Description

    This LTER Remote Sensing spatial raster dataset consists of Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER, on 1989-04-05 (15:02:01.2150380Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 36.56 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130301989095PAC03, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-18T05:41:15Z.
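The scene identifier quoted at the end of the description (LT50130301989095PAC03) appears to carry the same path, row, and acquisition date given in the prose. The Python sketch below decodes it under an assumed field layout: the letter L, a sensor letter, a satellite digit, three digits each for path and row, a four-digit year, a three-digit day of year, a three-letter station code, and a two-digit version. That layout is an assumption that happens to agree with the values in this listing; it is not taken from the listing itself and should be checked against USGS documentation before being relied on.

```python
import re
from datetime import date, timedelta

# Assumed layout: L + sensor + satellite + path + row + year + day-of-year
# + ground station + version.
SCENE_ID = re.compile(
    r"^L(?P<sensor>[A-Z])(?P<satellite>\d)"
    r"(?P<path>\d{3})(?P<row>\d{3})"
    r"(?P<year>\d{4})(?P<doy>\d{3})"
    r"(?P<station>[A-Z]{3})(?P<version>\d{2})$"
)


def parse_scene_id(scene_id):
    """Split a scene identifier into its assumed fields."""
    match = SCENE_ID.match(scene_id)
    if match is None:
        raise ValueError(f"unrecognised scene id: {scene_id!r}")
    fields = match.groupdict()
    # Day of year 1 is January 1, so add (doy - 1) days to that date.
    acquired = date(int(fields["year"]), 1, 1) + timedelta(
        days=int(fields["doy"]) - 1
    )
    return {
        "sensor": fields["sensor"],
        "satellite": int(fields["satellite"]),
        "path": int(fields["path"]),
        "row": int(fields["row"]),
        "acquired": acquired,
        "station": fields["station"],
        "version": fields["version"],
    }


info = parse_scene_id("LT50130301989095PAC03")
```

Here `info["path"]`, `info["row"]`, and `info["acquired"]` can be compared against the path 13, row 30, and 1989-04-05 stated in the description as a consistency check.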

  9. Climate Equity Reference Calculator Database, version 7.2.0 (Sep 2018)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Holz, Christian; Kartha, Sivan; Athanasiou, Tom (2023). Climate Equity Reference Calculator Database, version 7.2.0 (Sep 2018) [Dataset]. http://doi.org/10.7910/DVN/O3H22Z
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Holz, Christian; Kartha, Sivan; Athanasiou, Tom
    Time period covered
    Jan 1, 1850 - Jan 1, 2030
    Description

    This is version 7.2.0 of the core database for the Climate Equity Reference Calculator (calculator.climateequityreference.org).

  10. Examiner Citation Data

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Bhaven Sampat (2023). Examiner Citation Data [Dataset]. http://doi.org/10.7910/DVN/J0EBWZ
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Bhaven Sampat
    Time period covered
    Jan 1, 2001 - Jan 1, 2010
    Description

    Examiner and other patent citations in U.S. patents issued between 2001 and 2010

  11. Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER...

    • portal.edirepository.org
    bin, jpeg, tiff, txt
    Updated 2013
    Cite
    EDI (2013). Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER collected on 2003-06-15 [Dataset]. http://doi.org/10.6073/pasta/1776390fcff320725f46e5feae40d464
    Explore at:
    Available download formats: txt (65535 bytes), jpeg (746167 bytes), jpeg (7102392 bytes), tiff (55571338 bytes), txt (97191 bytes), txt (5873 bytes), bin (8776 bytes)
    Dataset updated
    2013
    Dataset provided by
    EDI
    Time period covered
    Jun 15, 2003
    Area covered
    Variables measured
    pixel
    Description

    This LTER Remote Sensing spatial raster dataset consists of Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER, on 2003-06-15 (15:08:42.1370190Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 40 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130302003166LGS01, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-18T06:04:03Z.

  12. Political Party Database Round 2 v4 (first public version)

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Mar 7, 2022
    Cite
    Susan Scarrow; Paul D. Webb; Thomas Poguntke (2022). Political Party Database Round 2 v4 (first public version) [Dataset]. http://doi.org/10.7910/DVN/0JVUM8
    Explore at:
    Croissant
    Dataset updated
    Mar 7, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Susan Scarrow; Paul D. Webb; Thomas Poguntke
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The Political Party Database (PPDB) is an online public database that is a central source for key information about political party organization, party resources, leadership selection, and partisan political participation in many representative democracies. The files contain the data in SPSS, STATA, and CSV formats. The dataset also includes a PDF with the text responses for the appropriate variables. The PPDB Round 2 dataset complements the Round 1a_1b Dataset. Round 2 data covers 51 countries, reflecting the state of 288 parties in the years 2017-2020.

  13. Replication Data for: Strategic Citation: A Reassessment

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Mar 23, 2023
    Cite
    Jeffrey Kuhn; Kenneth Younge; Alan Marco (2023). Replication Data for: Strategic Citation: A Reassessment [Dataset]. http://doi.org/10.7910/DVN/EPZOIM
    Explore at:
    Croissant
    Dataset updated
    Mar 23, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Jeffrey Kuhn; Kenneth Younge; Alan Marco
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The United States patent system is unique in that it requires applicants to cite documents they know to be relevant to the examination of their patent applications. Lampe (2012) presents evidence that applicants strategically withhold 21-33% of relevant citations from patent examiners, suggesting that many patents are fraudulently obtained. We challenge this view. We first show that Lampe's empirical design is inconsistent with both legal standards and standard operating procedures, including how courts identify strategic withholding. We then compile comprehensive data to reassess the empirical basis for Lampe's main claim. We find no evidence that applicants withhold citations.

  14. Gene ontology mapping to DH pahang reference genome v2

    • search.datacite.org
    • dataverse.harvard.edu
    • +1more
    Updated 2018
    Cite
    Mathieu Rouard (2018). Gene ontology mapping to DH pahang reference genome v2 [Dataset]. http://doi.org/10.7910/dvn/qbrned
    Explore at:
    Dataset updated
    2018
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Harvard Dataverse
    Authors
    Mathieu Rouard
    Description

    Gene ontology mapping to DH pahang reference genome v2 using TrEMBL database

  15. Data from: Reference Rot: An Emerging Threat to Transparency in Political...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Bullock, John (2023). Reference Rot: An Emerging Threat to Transparency in Political Science [Dataset]. http://doi.org/10.7910/DVN/Q8VDN0
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Bullock, John
    Description

    Transparency of research is a large concern in political science, and the practice of publishing links to datasets and other online resources is one of the main methods by which political scientists promote transparency. But the method cannot work if the links don’t, and very often, they don’t. We show that most of the URLs ever published in the American Political Science Review no longer work as intended. The problem is severe in recent as well as in older articles; for example, more than one-fourth of links published in the APSR in 2013 were broken by the end of 2014. We conclude that “reference rot” limits the transparency and reproducibility of political science research. We also describe practices that scholars can adopt to combat the problem: when possible, they should archive data in trustworthy repositories, use links that incorporate persistent digital identifiers, and create archival versions of the webpages to which they link.

  16. Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER...

    • portal.edirepository.org
    • search.dataone.org
    • +1more
    bin, jpeg, tiff, txt
    Updated 2013
    Cite
    EDI (2013). Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER collected on 1998-04-14 [Dataset]. http://doi.org/10.6073/pasta/82ca071d73e4fd1c8939bf25980e5d5c
    Explore at:
    Available download formats: tiff (55537028 bytes), jpeg (5823634 bytes), jpeg (777503 bytes), txt (18053 bytes), txt (65535 bytes), txt (243853 bytes), bin (8776 bytes)
    Dataset updated
    2013
    Dataset provided by
    EDI
    Time period covered
    Apr 14, 1998
    Area covered
    Variables measured
    pixel
    Description

    This LTER Remote Sensing spatial raster dataset consists of Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER, on 1998-04-14 (15:09:51.8510190Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 0.02 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130301998104PAC03, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-21T07:52:54Z.

  17. Global High-Resolution Soil Profile Database for Crop Modeling Applications

    • dataverse.harvard.edu
    • dataone.org
    • +2more
    Updated Jun 18, 2025
    Cite
    Harvard Dataverse (2025). Global High-Resolution Soil Profile Database for Crop Modeling Applications [Dataset]. http://doi.org/10.7910/DVN/1PEEY0
    Explore at:
    Croissant
    Dataset updated
    Jun 18, 2025
    Dataset provided by
    Harvard Dataverse
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.7/customlicense?persistentId=doi:10.7910/DVN/1PEEY0

    Dataset funded by
    USAID Bureau of Food Security
    CGIAR Research Program on Policies, Institutions, and Markets (PIM)
    Description

    One of the obstacles in applying advanced crop simulation models such as DSSAT on a grid-based platform is the lack of gridded soil input data at various resolutions. Recently, there have been many efforts in scientific communities to develop spatially continuous soil databases across the globe. The most representative example is SoilGrids 1km, released by ISRIC in 2014. In addition, the recent AfSIS project has put considerable effort into developing a more accurate soil database for Africa at high spatial resolution. Taking advantage of those two available high-resolution soil databases (SoilGrids 1km and ISRIC-AfSIS at 1km resolution), this project aims to develop a set of DSSAT-compatible soil profiles on a 5 arc-minute grid (which is HarvestChoice’s standard grid). Six soil properties (bulk density, organic carbon, percentage of clay and silt, soil pH and cation exchange capacity) available from the original SoilGrids 1km or ISRIC-AfSIS were directly used as DSSAT inputs. We applied a pedo-transfer function to derive some soil hydraulic properties (saturated hydraulic conductivity, soil water content at field capacity, wilting point and saturation) which are critical to simulating crop growth. For other required variables, HarvestChoice’s HC27 database is used as a reference. Final outputs are provided in *.SOL file format (DSSAT soil database) for each country at 5-min resolution. In addition, uncertainty maps for organic carbon and soil water content at wilting point in the top 15 cm soil layers were generated to give a brief idea of the accuracy of the final products. The generated soil properties were evaluated by visualizing their global maps and by comparing them with the IIASA-IFPRI cropland map and AfSIS-GYGA’s available water content maps.

  18. Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER...

    • portal.edirepository.org
    • search.dataone.org
    • +1more
    bin, jpeg, tiff, txt
    Updated 2013
    Cite
    EDI (2013). Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER collected on 1998-08-20 [Dataset]. http://doi.org/10.6073/pasta/ccdd3aa11f6c2c40200a364114529593
    Explore at:
    Available download formats: tiff (55686328 bytes), txt (65535 bytes), jpeg (693293 bytes), txt (225576 bytes), jpeg (4403774 bytes), bin (8776 bytes), txt (17618 bytes)
    Dataset updated
    2013
    Dataset provided by
    EDI
    Time period covered
    Aug 20, 1998
    Area covered
    Variables measured
    pixel
    Description

    This LTER Remote Sensing spatial raster dataset consists of Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER, on 1998-08-20 (15:11:24.5850630Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 12.6 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130301998232PAC03, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-21T07:53:53Z.

  19. Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER...

    • portal.edirepository.org
    bin, jpeg, tiff, txt
    Updated 2013
    Cite
    EDI (2013). Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER collected on 1992-09-20 [Dataset]. http://doi.org/10.6073/pasta/f1c32efc8fcb3d26c9ea73040589b119
    Explore at:
    Available download formats: tiff (55161238 bytes), txt (19706 bytes), jpeg (5639559 bytes), txt (244819 bytes), txt (65535 bytes), bin (8776 bytes), jpeg (732084 bytes)
    Dataset updated
    2013
    Dataset provided by
    EDI
    Time period covered
    Sep 20, 1992
    Area covered
    Variables measured
    pixel
    Description

    This LTER Remote Sensing spatial raster dataset consists of Landsat Enhanced Thematic Mapper image data for Harvard Forest LTER, on 1992-09-20 (14:54:53.5180440Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 0 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130301992264XXX02, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-18T05:49:59Z.

  20. Replication Data for: The Review Process and the Citation Gap: The Role of...

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    Kanthak, Kristin; Chris W. Bonneau; Shane M. Redman; Amanda Liefson (2023). Replication Data for: The Review Process and the Citation Gap: The Role of the Editor’s Nudge [Dataset]. http://doi.org/10.7910/DVN/VVV3AN
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Kanthak, Kristin; Chris W. Bonneau; Shane M. Redman; Amanda Liefson
    Description

    This dataset provides the replication data for The Review Process and the Citation Gap: The Role of the Editor’s Nudge

FAVOR Essential Database

10 scholarly articles cite this dataset.