License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/). License information was derived automatically.
Functional Annotation of Variants - Online Resource (FAVOR, https://favor.genohub.org) is a comprehensive whole-genome variant annotation database and variant browser that provides hundreds of functional annotation scores covering many aspects of variant biological function. This FAVOR Essential Database consists of a collection of essential annotation scores for all possible SNVs (8,812,917,339) and observed indels (79,997,898) in Build GRCh38/hg38, including variant info, chromosome, position, reference allele, alternative allele, aPC-Conservation, aPC-Epigenetics, aPC-Epigenetics-Active, aPC-Epigenetics-Repressed, aPC-Epigenetics-Transcription, aPC-Local-Nucleotide-Diversity, aPC-Mappability, aPC-Mutation-Density, aPC-Protein-Function, aPC-Proximity-To-TSSTES, aPC-Transcription-Factor, CAGE promoter, CAGE, MetaSVM, rsID, FATHMM-XF, Gencode Comprehensive Category, Gencode Comprehensive Info, Gencode Comprehensive Exonic Category, Gencode Comprehensive Exonic Info, GeneHancer, LINSIGHT, CADD, rDHS. These annotation scores can be integrated by FAVORannotator (https://github.com/zhouhufeng/FAVORannotator) to create an annotated GDS (aGDS) file, which stores the genotype data and their functional annotations together in a single file. The aGDS file can then support a wide range of functionally informed downstream analyses.
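The core idea behind building an annotated genotype file is that annotation scores and genotypes share a variant key (chromosome, position, reference allele, alternative allele). The Python sketch below illustrates only that matching step; it is not FAVORannotator's code, and the `Variant` record, the dosage list, the score names, and all values are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

# A variant key: (chromosome, position, reference allele, alternative allele).
VariantKey = Tuple[str, int, str, str]


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    genotypes: List[int]  # hypothetical: alternative-allele dosage per sample


def annotate(
    variants: List[Variant],
    annotations: Dict[VariantKey, Dict[str, float]],
) -> List[Tuple[Variant, Optional[Dict[str, float]]]]:
    """Pair each genotyped variant with its annotation record, or None if no record matches."""
    out = []
    for v in variants:
        key = (v.chrom, v.pos, v.ref, v.alt)
        out.append((v, annotations.get(key)))
    return out


# Made-up annotation rows; the score names only mimic the list above.
annotations = {
    ("1", 10001, "T", "C"): {"CADD_phred": 12.3, "aPC_Conservation": 4.1},
    ("1", 10002, "A", "G"): {"CADD_phred": 3.7, "aPC_Conservation": 0.9},
}
variants = [
    Variant("1", 10001, "T", "C", [0, 1, 2]),
    Variant("1", 10005, "G", "A", [0, 0, 1]),
]
for v, ann in annotate(variants, annotations):
    print(v.chrom, v.pos, v.ref, v.alt, ann)
```

Exact-key matching only works if indel alleles are represented (for example, left-aligned and trimmed) the same way in both the genotype data and the annotation table.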
Extracting and parsing reference strings from research articles is a challenging task. State-of-the-art tools like GROBID apply rather simple machine learning models such as conditional random fields (CRF). Recent research has shown that deep learning holds high potential for reference string parsing. The challenge with deep learning, however, is that the training step requires enormous amounts of labeled data, which do not exist for reference string parsing. Creating such a large dataset manually, through human labor, seems hardly feasible. Therefore, we created GIANT, a large dataset of 991,411,100 XML-labeled reference strings. The strings were generated automatically from 677,000 CrossRef entries, 1,500 citation styles in the Citation Style Language (CSL), and the citation processor citeproc-js. GIANT can be used to train machine learning models, particularly deep learning models, for citation parsing. While we have not yet tested GIANT for training such models, we hypothesise that the dataset will be able to significantly improve the accuracy of citation parsing. The dataset and the code to create it are freely available at https://github.com/BeelGroup/.
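The generation idea (one metadata record rendered in many styles, with every field wrapped in an XML label) can be sketched in a few lines. This is not GIANT's pipeline: instead of CSL styles and citeproc-js, the sketch assumes two hypothetical hand-written templates, and the records are made up.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape

# Hypothetical "styles": ordered (field, prefix, suffix) triples. Real CSL styles are far richer.
STYLES: Dict[str, List[Tuple[str, str, str]]] = {
    "author-year": [("author", "", " "), ("year", "(", "). "), ("title", "", ". "),
                    ("container", "", ", "), ("volume", "", ".")],
    "numeric": [("author", "", ", "), ("title", '"', '," '), ("container", "", " "),
                ("volume", "", " "), ("year", "(", ").")],
}


def render_labeled(record: Dict[str, str], style: str) -> str:
    """Render one record as a reference string with each field wrapped in an XML tag."""
    parts = []
    for field, prefix, suffix in STYLES[style]:
        value = record.get(field)
        if value is None:
            continue  # skip fields this record lacks
        parts.append(f"{escape(prefix)}<{field}>{escape(value)}</{field}>{escape(suffix)}")
    return "<reference>" + "".join(parts).strip() + "</reference>"


records = [
    {"author": "Smith, J.", "year": "2019", "title": "Parsing references",
     "container": "J. Doc.", "volume": "12"},
    {"author": "Lee, K. & Park, S.", "year": "2021", "title": "Deep citation parsing"},
]
# Every (record, style) pair yields one labeled training string.
dataset = [render_labeled(r, s) for r, s in itertools.product(records, STYLES)]
for line in dataset:
    ET.fromstring(line)  # raises ParseError if a labeled string is not well-formed XML
    print(line)
```

Under this scheme, 677,000 records and 1,500 styles give at most 677,000 × 1,500 = 1,015,500,000 strings; the reported 991,411,100 is below that upper bound, and the description does not state why.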
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/). License information was derived automatically.
This submission includes publicly available data extracted in its original form. Please reference the Related Publication listed here for source and citation information: TRI basic plus data files guides. (2024, September 18). US EPA. https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-guides
If you have questions about the underlying data stored here, please contact tri.help@epa.gov. If you have questions or recommendations related to this metadata entry and extracted data, please contact the CAFE Data Management team at climatecafe@bu.edu.
"EPA has been collecting Toxics Release Inventory (TRI) data since 1987. The 'Basic Plus' data files include ten file types that collectively contain all of the data fields from the TRI Reporting Form R and Form A. The files themselves are in tab-delimited .txt format and then compressed into a .zip file.
1a: Facility, chemical, releases and other waste management summary information
1b: Chemical activities and uses
2a: On- and off-site disposal, treatment, energy recovery, and recycling information; non-production-related waste managed quantities; production/activity ratio information; and source reduction activities
2b: Detailed on-site waste treatment methods and efficiency
3a: Transfers off site for disposal and further waste management
3b: Transfers to Publicly Owned Treatment Works (POTWs) (RY1987 - RY2010)
3c: Transfers to Publicly Owned Treatment Works (POTWs) (RY2011 - Present)
4: Facility information
5: Optional information on source reduction, recycling and pollution control (RY2005 - Present)
6: Additional miscellaneous and optional information (RY2010 - Present)
Quantities of dioxin and dioxin-like compounds are reported in grams, while all other chemicals are reported in pounds. This webpage contains the most recent versions of all TRI data files; facilities may revise previous years' TRI submissions if necessary, and any such changes will be reflected in these files. For this reason, data contained in these files may differ from data used to construct the TRI National Analysis." [Quote from https://www.epa.gov/toxics-release-inventory-tri-program/tri-basic-plus-data-files-calendar-years-1987-present]
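Because each file is a tab-delimited .txt inside a .zip, Python's standard library is enough to read one without unpacking it to disk. The sketch below also handles the unit difference noted in the quote by converting gram-reported dioxin quantities to pounds. The member name `US_1a_2023.txt`, the column names `CHEMICAL` and `TOTAL_RELEASES`, the UTF-8 encoding, and the substring rule for spotting dioxin rows are all hypothetical; the real headers are documented in EPA's file guides.

```python
import csv
import io
import zipfile
from typing import Dict, Iterator

GRAMS_PER_POUND = 453.59237  # exact, by definition of the international avoirdupois pound


def read_tri_file(zip_bytes: bytes, member: str, encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Yield rows of one tab-delimited .txt member from a TRI .zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text, delimiter="\t")


def release_in_pounds(row: Dict[str, str]) -> float:
    """Return the reported quantity in pounds; dioxin-type chemicals are reported in grams."""
    qty = float(row["TOTAL_RELEASES"])  # hypothetical column name
    if "dioxin" in row["CHEMICAL"].lower():  # hypothetical rule for gram-reported rows
        return qty / GRAMS_PER_POUND
    return qty


# Build a tiny in-memory archive with made-up rows to exercise the reader.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("US_1a_2023.txt",
                "CHEMICAL\tTOTAL_RELEASES\n"
                "Toluene\t1200\n"
                "Dioxin and dioxin-like compounds\t453.59237\n")

rows = list(read_tri_file(buf.getvalue(), "US_1a_2023.txt"))
for row in rows:
    print(row["CHEMICAL"], round(release_in_pounds(row), 3))
```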
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/). License information was derived automatically.
Shapefiles of roads, schools, and places of worship for Amherst, MA, extracted from the MassGIS database (https://www.mass.gov/get-massgis-data). Coordinate reference system: EPSG:26986 (NAD83 / Massachusetts Mainland). Used as reference data for testing MapEval.
The Caselaw Access Project makes 40 million pages of U.S. caselaw freely available online from the collections of Harvard Law School Library.
The CAP citation graph shows the connections between cases in the Caselaw Access Project dataset. You can use the citation graph to answer questions like "what is the most influential case?" and "what jurisdictions cite most often to this jurisdiction?".
Learn More: https://case.law/download/citation_graph/
Access Limits: https://case.law/api/#limits
This dataset includes citations and metadata for the CAP citation graph in CSV format.
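A first pass at "what is the most influential case?" is to count how often each case is cited (its in-degree in the citation graph). The sketch below assumes each CSV row lists a citing case ID followed by the IDs of the cases it cites; that adjacency-list layout is an assumption, so check the format at the Learn More link before relying on it. The case IDs are made up.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def most_cited(rows: Iterable[List[str]], top: int = 3) -> List[Tuple[str, int]]:
    """Rank cases by in-degree, assuming rows of [citing_case, cited_case, cited_case, ...]."""
    counts: Counter = Counter()
    for row in rows:
        if not row:
            continue  # skip blank lines
        _citing, *cited = row
        # Count each cited case at most once per citing case.
        counts.update({c for c in cited if c})
    return counts.most_common(top)


sample = "101,201,202\n102,201\n103,201,202,203\n104\n"
print(most_cited(csv.reader(io.StringIO(sample))))  # [('201', 3), ('202', 2), ('203', 1)]
```

Raw in-degree treats every citation equally; PageRank-style measures, which weight citations from heavily cited cases more, are a common refinement.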
The Caselaw Access Project is a project of the Library Innovation Lab at Harvard Law School Library.
People are using CAP data to create research, applications, and more. We're sharing examples in our gallery.
Cite Grid is the first visualization we've created based on data from our citation graph.
Have something to share? We're excited to hear about it.
Custom license: https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/OQIPRW
Advancing Research on Nutrition and Agriculture (AReNA) is a 6-year, multi-country project in South Asia and sub-Saharan Africa, funded by the Bill and Melinda Gates Foundation and implemented from 2015 through 2020. The objective of AReNA is to close important knowledge gaps on the links between nutrition and agriculture, with a particular focus on conducting policy-relevant research at scale and crowding in more research on this issue by creating datasets and analytical tools that can benefit the broader research community. Much of the research on agriculture and nutrition is hindered by a lack of data, and the datasets that do contain both agriculture and nutrition information are often small in size and geographic scope. The AReNA team constructed a large multi-level, multi-country dataset that combines nutrition and nutrition-relevant information at the individual and household level from the Demographic and Health Surveys (DHS) with a wide variety of geo-referenced data on agricultural production, agroecology, climate, demography, and infrastructure (GIS data). The dataset covers 60 countries, 184 DHS surveys, and 122,473 clusters, and links over one thousand geospatial variables to the DHS data. The entire dataset is organized into 13 individual files: DHS_distance, DHS_livestock, DHS_main, DHS_malaria, DHS NDVI, DHS_nightlight, DHS_pasture and climate (mean), DHS_rainfall, DHS_soil, DHS_SPAM, DHS_suit, DHS_temperature, and DHS_traveltime.
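Since the geospatial variables are split across 13 files that all describe DHS clusters, a typical first step is to join them back together on a cluster key. The sketch below left-joins a rainfall file onto the main file; the composite key `(survey_id, cluster_id)`, every column name, and all values are hypothetical stand-ins for whatever identifiers the actual DHS_main and DHS_rainfall files use.

```python
import csv
import io
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # hypothetical (survey_id, cluster_id) key


def load(text: str) -> Dict[Key, Dict[str, str]]:
    """Index CSV rows by their (survey_id, cluster_id) key."""
    rows = csv.DictReader(io.StringIO(text))
    return {(r["survey_id"], r["cluster_id"]): r for r in rows}


def merge(base: Dict[Key, Dict[str, str]], *others: Dict[Key, Dict[str, str]]) -> List[Dict[str, str]]:
    """Left-join each extra file onto the base file; base columns are never overwritten."""
    merged = []
    for key, row in base.items():
        combined = dict(row)
        for other in others:
            extra = other.get(key, {})  # clusters missing from a file get no extra columns
            combined.update({k: v for k, v in extra.items() if k not in combined})
        merged.append(combined)
    return merged


# Made-up stand-ins for DHS_main and DHS_rainfall.
main = "survey_id,cluster_id,stunting_rate\nKE2014,1,0.21\nKE2014,2,0.30\n"
rainfall = "survey_id,cluster_id,rain_mm\nKE2014,1,812\n"
rows = merge(load(main), load(rainfall))
for r in rows:
    print(r)
```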
Microsoft Access database for the bibliometric analysis reported in the article: Elaine M. Lasda Bergman, Finding Citations to Social Work Literature: The Relative Benefits of Using Web of Science, Scopus, or Google Scholar, The Journal of Academic Librarianship, Volume 38, Issue 6, November 2012, Pages 370-379, ISSN 0099-1333, http://dx.doi.org/10.1016/j.acalib.2012.08.002. (http://www.sciencedirect.com/science/article/pii/S009913331200119X) Abstract: Past studies of citation coverage of Web of Science, Scopus, and Google Scholar do not demonstrate a consistent pattern that can be applied to the interdisciplinary mix of resources used in social work research. To determine the utility of these tools to social work researchers, an analysis of citing references to well-known social work journals was conducted. Web of Science had the fewest citing references and almost no variety in source format. Scopus provided higher citation counts, but the pattern of coverage was similar to Web of Science. Google Scholar provided substantially more citing references, but only a relatively small percentage of them were unique scholarly journal articles. The patterns of database coverage were replicated when the citations were broken out for each journal separately. The results of this analysis demonstrate the need to determine what resources constitute scholarly research and reflect the need for future researchers to consider the merits of each database before undertaking their research. This study will be of interest to scholars in library and information science as well as social work, as it facilitates a greater understanding of the strengths and limitations of each database and brings to light important considerations for conducting future research. Keywords: Citation analysis; Social work; Scopus; Web of Science; Google Scholar
This LTER Remote Sensing spatial raster dataset consists of Landsat Thematic Mapper image data for Harvard Forest LTER, acquired on 1989-04-05 (15:02:01.2150380Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 36.56 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130301989095PAC03, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-18T05:41:15Z.
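The scene IDs in these Landsat citations (for example `LT50130301989095PAC03`) follow the pre-Collection USGS naming pattern: sensor, satellite number, WRS-2 path and row, acquisition year, day of year, ground station, and archive version. Decoding an ID is a quick cross-check of the date and path/row stated in each entry. The regular expression below is a sketch written for this note, not USGS code.

```python
import datetime as dt
import re
from typing import Dict

SCENE_ID = re.compile(
    r"^L(?P<sensor>[A-Z])(?P<satellite>\d)"  # e.g. T = Thematic Mapper, 5 = Landsat 5
    r"(?P<path>\d{3})(?P<row>\d{3})"         # WRS-2 path and row
    r"(?P<year>\d{4})(?P<doy>\d{3})"         # acquisition year and day of year
    r"(?P<station>[A-Z]{3})(?P<version>\d{2})$"  # ground station and archive version
)


def decode_scene_id(scene_id: str) -> Dict[str, object]:
    """Split a pre-Collection Landsat scene ID into its fields and convert day-of-year to a date."""
    m = SCENE_ID.match(scene_id)
    if m is None:
        raise ValueError(f"not a pre-Collection Landsat scene ID: {scene_id!r}")
    g = m.groupdict()
    date = dt.date(int(g["year"]), 1, 1) + dt.timedelta(days=int(g["doy"]) - 1)
    return {
        "sensor": g["sensor"], "satellite": int(g["satellite"]),
        "path": int(g["path"]), "row": int(g["row"]),
        "date": date, "station": g["station"], "version": int(g["version"]),
    }


info = decode_scene_id("LT50130301989095PAC03")
print(info["path"], info["row"], info["date"])  # 13 30 1989-04-05
```

Applied to the other scene IDs in this listing, the decoded dates agree with the stated acquisition dates (for example, day 264 of leap year 1992 is 1992-09-20).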
This is version 7.2.0 of the core database for the Climate Equity Reference Calculator (calculator.climateequityreference.org).
Examiner and other patent citations in U.S. patents issued between 2001 and 2010
This LTER Remote Sensing spatial raster dataset consists of Landsat Thematic Mapper image data for Harvard Forest LTER, acquired on 2003-06-15 (15:08:42.1370190Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 40 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130302003166LGS01, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-18T06:04:03Z.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/). License information was derived automatically.
The Political Party Database (PPDB) is an online public database that serves as a central source of key information about political party organization, party resources, leadership selection, and partisan political participation in many representative democracies. The files contain the data in SPSS, Stata, and CSV formats. The dataset also includes a PDF with the text responses for the relevant variables. The PPDB Round 2 dataset complements the Round 1a_1b dataset. The Round 2 data cover 51 countries and reflect the state of 288 parties in the years 2017-2020.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/). License information was derived automatically.
The United States patent system is unique in that it requires applicants to cite documents they know to be relevant to the examination of their patent applications. Lampe (2012) presents evidence that applicants strategically withhold 21-33% of relevant citations from patent examiners, suggesting that many patents are fraudulently obtained. We challenge this view. We first show that Lampe's empirical design is inconsistent with both legal standards and standard operating procedures, including how courts identify strategic withholding. We then compile comprehensive data to reassess the empirical basis for Lampe's main claim. We find no evidence that applicants withhold citations.
Gene Ontology mapping to the DH Pahang reference genome v2 using the TrEMBL database
Transparency of research is a large concern in political science, and the practice of publishing links to datasets and other online resources is one of the main methods by which political scientists promote transparency. But the method cannot work if the links don’t, and very often, they don’t. We show that most of the URLs ever published in the American Political Science Review no longer work as intended. The problem is severe in recent as well as in older articles; for example, more than one-fourth of links published in the APSR in 2013 were broken by the end of 2014. We conclude that “reference rot” limits the transparency and reproducibility of political science research. We also describe practices that scholars can adopt to combat the problem: when possible, they should archive data in trustworthy repositories, use links that incorporate persistent digital identifiers, and create archival versions of the webpages to which they link.
This LTER Remote Sensing spatial raster dataset consists of Landsat Thematic Mapper image data for Harvard Forest LTER, acquired on 1998-04-14 (15:09:51.8510190Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 0.02 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130301998104PAC03, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-21T07:52:54Z.
Custom license: https://dataverse.harvard.edu/api/datasets/:persistentId/versions/2.7/customlicense?persistentId=doi:10.7910/DVN/1PEEY0
One of the obstacles to applying advanced crop simulation models such as DSSAT on a grid-based platform is the lack of gridded soil input data at various resolutions. Recently, there have been many efforts in the scientific community to develop spatially continuous soil databases across the globe. The most representative example is SoilGrids 1km, released by ISRIC in 2014. In addition, the recent AfSIS project has invested considerable effort in developing a more accurate, high-resolution soil database for Africa. Taking advantage of these two high-resolution soil databases (SoilGrids 1km and ISRIC-AfSIS at 1 km resolution), this project aims to develop a set of DSSAT-compatible soil profiles on a 5 arc-minute grid (HarvestChoice’s standard grid). Six soil properties (bulk density, organic carbon, percentage of clay and silt, soil pH, and cation exchange capacity) available from the original SoilGrids 1km or ISRIC-AfSIS data were used directly as DSSAT inputs. We applied a pedo-transfer function to derive soil hydraulic properties (saturated hydraulic conductivity and soil water content at field capacity, wilting point, and saturation), which are critical for simulating crop growth. For other required variables, HarvestChoice’s HC27 database is used as a reference. Final outputs are provided in *.SOL file format (DSSAT soil database) for each country at 5 arc-minute resolution. In addition, uncertainty maps for organic carbon and for soil water content at wilting point in the top 15 cm soil layer were generated to give a rough indication of the accuracy of the final products. The generated soil properties were evaluated by visualizing their global maps and by comparing them with the IIASA-IFPRI cropland map and AfSIS-GYGA’s available water content maps.
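The description does not name the pedo-transfer function that was used. Purely as an illustration of what such a function does, the sketch below implements the Saxton and Rawls (2006) moisture regressions, a widely used PTF, to turn sand, clay, and organic matter into wilting point, field capacity, saturation, and saturated hydraulic conductivity. The full Saxton and Rawls method also adjusts for bulk density and gravel, which this sketch omits; sand is taken by difference from the clay and silt fractions, and the organic-carbon-to-organic-matter factor of 1.724 and all input values are assumptions.

```python
import math
from typing import Dict


def saxton_rawls(sand: float, clay: float, om_pct: float) -> Dict[str, float]:
    """Saxton & Rawls (2006) regressions without density or gravel adjustment.

    sand, clay: mass fractions (0-1); om_pct: organic matter in percent.
    Returns volumetric water contents (m3/m3) and Ksat in mm/h.
    """
    S, C, OM = sand, clay, om_pct
    # Water content at 1500 kPa (wilting point)
    t1500 = (-0.024 * S + 0.487 * C + 0.006 * OM + 0.005 * S * OM
             - 0.013 * C * OM + 0.068 * S * C + 0.031)
    wp = t1500 + (0.14 * t1500 - 0.02)
    # Water content at 33 kPa (field capacity)
    t33 = (-0.251 * S + 0.195 * C + 0.011 * OM + 0.006 * S * OM
           - 0.027 * C * OM + 0.452 * S * C + 0.299)
    fc = t33 + (1.283 * t33 ** 2 - 0.374 * t33 - 0.015)
    # Saturation minus field capacity, then saturation
    ts33 = (0.278 * S + 0.034 * C + 0.022 * OM - 0.018 * S * OM
            - 0.027 * C * OM - 0.584 * S * C + 0.078)
    s33 = ts33 + (0.636 * ts33 - 0.107)
    sat = fc + s33 - 0.097 * S + 0.043
    # Ksat from the slope of the log tension/log moisture curve
    B = (math.log(1500) - math.log(33)) / (math.log(fc) - math.log(wp))
    lam = 1.0 / B
    ksat = 1930.0 * (sat - fc) ** (3.0 - lam)
    return {"wilting_point": wp, "field_capacity": fc, "saturation": sat, "ksat_mm_h": ksat}


clay, silt = 0.20, 0.40            # made-up fractions, as a SoilGrids-style input
sand = 1.0 - clay - silt           # sand by difference
organic_carbon_pct = 1.45          # made-up value
om = organic_carbon_pct * 1.724    # van Bemmelen factor (a common assumption)
props = saxton_rawls(sand, clay, om)
for name, value in props.items():
    print(f"{name}: {value:.3f}")
```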
This LTER Remote Sensing spatial raster dataset consists of Landsat Thematic Mapper image data for Harvard Forest LTER, acquired on 1998-08-20 (15:11:24.5850630Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 12.6 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130301998232PAC03, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-21T07:53:53Z.
This LTER Remote Sensing spatial raster dataset consists of Landsat Thematic Mapper image data for Harvard Forest LTER, acquired on 1992-09-20 (14:54:53.5180440Z). Data were collected by Landsat 5, row 30, path 13. Cloud cover was 0 percent. These are reference data from the USGS EROS archive, not data generated by Harvard Forest LTER. This product was created by the U.S. Geological Survey (USGS) and contains Landsat data files in Geographic Tagged Image-File Format (GeoTIFF). NASA Landsat Program, 2009, Landsat TM LT50130301992264XXX02, LPGS_12.0.2, USGS, Sioux Falls, 2012-08-18T05:49:59Z.
This dataset provides the replication data for "The Review Process and the Citation Gap: The Role of the Editor’s Nudge."