Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We analysed the field of expression profiling by high throughput sequencing, or HT-seq, in terms of replicability and reproducibility, using data from the NCBI GEO (Gene Expression Omnibus) repository.
- This release includes GEO series up to Dec-31, 2020;
- Fixed the missing optional xlrd dependency, which affected the import of some .xls files; previously we used only openpyxl (thanks to anonymous reviewer);
- All files in supplementary _RAW.tar archives were checked for p-values; previously, alas, _RAW.tar files were omitted entirely (thanks to anonymous reviewer).
The archived dataset contains the following files:
- output/parsed_suppfiles.csv, p-value histograms, histogram classes, estimated proportion of true null hypotheses (pi0).
- output/document_summaries.csv, document summaries of NCBI GEO series
- output/publications.csv, publication info of NCBI GEO series
- output/scopus_citedbycount.csv, Scopus citation info of NCBI GEO series
- output/single-cell.csv, single cell experiments
- spots.csv, NCBI SRA sequencing run metadata
- suppfilenames.txt, list of all supplementary file names of NCBI GEO submissions. One filename per row.
- suppfilenames_filtered.txt, list of supplementary file names used for downloading files from NCBI GEO. One filename per row.
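The listing does not say which pi0 estimator was used. Purely as an illustration, the sketch below implements Storey's λ-based estimator, which relies on p-values of true nulls being uniform on [0, 1], so the region above a threshold λ is populated almost entirely by nulls. The function name `storey_pi0` and the default `lam=0.5` are choices made for this sketch, not taken from the archive.

```python
from typing import Sequence


def storey_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Estimate pi0, the proportion of true null hypotheses.

    Storey's estimator: pi0 = #{p > lam} / (m * (1 - lam)), capped at 1.
    Null p-values are uniform, so (lam, 1] holds mostly nulls.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


# Ten strong signals plus ten p-values spread over (0, 1).
pvals = [0.001] * 10 + [0.25, 0.75] * 5
print(storey_pi0(pvals))  # 0.5, i.e. 5 / (20 * 0.5)
```

A p-value histogram with a flat right-hand tail gives a stable estimate; a tail that rises toward 1 can push the raw ratio above 1, which is why the result is capped.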
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Putative target genes identified from rheumatoid arthritis (RA)/osteoarthritis (OA) microarray data.
https://earth.esa.int/eogateway/documents/20142/1560778/ESA-Third-Party-Missions-Terms-and-Conditions.pdf
ESA maintains an archive of IKONOS Geo Ortho Kit data previously requested through the TPM scheme and acquired between 2000 and 2008 over Europe, North Africa and the Middle East. IKONOS imagery products are categorised by positional accuracy, that is, how reliably an object shown in the image lies within the specified distance of its actual location on the ground. For each IKONOS-derived product, location error is expressed as circular error at 90% confidence (CE90): objects are represented on the image within the stated accuracy 90% of the time. There are six levels of IKONOS imagery products, determined by positional accuracy: Geo, Standard Ortho, Reference, Pro, Precision and PrecisionPlus. The product provided by ESA to Category-1 users is the Geo Ortho Kit, consisting of IKONOS black-and-white images with radiometric and geometric corrections (1-metre pixels, CE90 = 15 metres) bundled with IKONOS multispectral images with absolute radiometry (4-metre pixels, CE90 = 50 metres). IKONOS collects 1 m and 4 m Geo Ortho Kit imagery (nominally 0.82 m panchromatic and 3.28 m multispectral at nadir) at elevation angles between 60 and 90 degrees. To increase the positional accuracy of the final orthorectified imagery, customers should select imagery acquired at an IKONOS elevation angle between 72 and 90 degrees. The Geo Ortho Kit is tailored for sophisticated users, such as photogrammetrists, who want to control the orthorectification process. Geo Ortho Kit images include the camera geometry obtained at the time of image collection. Using Geo Ortho Kit imagery, customers can produce their own highly accurate orthorectified products with commercial off-the-shelf software, digital elevation models (DEMs) and optional ground control. Spatial coverage: check the spatial coverage of the collection on a map available on the Third Party Missions Dissemination Service.
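CE90 can be checked empirically against surveyed ground control: measure the horizontal error at each check point and find the distance within which 90% of the points fall. The sketch below uses the nearest-rank percentile. `empirical_ce90`, `steep_enough` and the sample offsets are hypothetical illustrations, not part of the ESA product or any vendor tool. The 72-degree default comes from the elevation-angle guidance in the description.

```python
import math
from typing import Iterable, List, Tuple


def empirical_ce90(offsets: List[Tuple[float, float]]) -> float:
    """Nearest-rank 90th percentile of horizontal radial error, in metres.

    offsets: (dx, dy) differences between image and ground-truth positions.
    """
    if not offsets:
        raise ValueError("need at least one check point")
    radial = sorted(math.hypot(dx, dy) for dx, dy in offsets)
    rank = -(-9 * len(radial) // 10)  # ceil(0.9 * n) using integer arithmetic
    return radial[rank - 1]


def steep_enough(elevations_deg: Iterable[float], minimum: float = 72.0) -> List[float]:
    """Keep scenes whose satellite elevation angle lies within [minimum, 90] degrees."""
    return [e for e in elevations_deg if minimum <= e <= 90.0]


points = [(3.0, 4.0)] * 9 + [(30.0, 40.0)]  # nine 5 m errors, one 50 m outlier
print(empirical_ce90(points))           # 5.0: the outlier lies above the 90th percentile
print(empirical_ce90(points) <= 15.0)   # True: within the panchromatic CE90 of 15 m
print(steep_enough([60.5, 71.9, 72.0, 85.0]))  # [72.0, 85.0]
```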
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MicroRNAs (miRNAs) act as epigenetic markers and regulate the expression of their target genes, including those characterized as regulators in autoimmune diseases. Rheumatoid arthritis (RA) is one of the most common autoimmune diseases. The potential roles of miRNA-regulated genes in RA pathogenesis have greatly aroused the interest of clinicians and researchers in recent years. In the current study, records of RA-related miRNAs were obtained from PubMed through conditional literature retrieval. After analyzing the selected records, the genes targeted by these miRNAs were predicted. We identified 14 RA-associated miRNAs and performed a sub-analysis of them in 5 microarray or RNA sequencing (RNA-seq) datasets. The microarray and RNA-seq data of RA were also downloaded from the NCBI Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA), analyzed, and annotated. Using a bioinformatics approach, we identified a series of differentially expressed genes (DEGs) by comparing RA with control samples. The RA-related gene expression profile was thus obtained, and the expression of miRNA-regulated genes was analyzed. Functional annotation analysis showed that GO molecular function (MF) terms were significantly enriched in calcium ion binding (GO:0005509). Moreover, some novel dysregulated target genes were identified in RA through integrated analysis of miRNA/mRNA expression. The results revealed that the expression of a number of genes, including ROR2, ABI3BP and SMOC2, was not only affected by dysregulated miRNAs but also altered in RA. Our findings indicate a close association between negatively correlated mRNA/miRNA pairs and RA. These findings may be applied to identify genetic markers for RA diagnosis and treatment in the future.
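The abstract does not state which correlation measure or cutoff defined a "negatively correlated" mRNA/miRNA pair. A minimal sketch, assuming Pearson correlation over matched samples and a hypothetical threshold of -0.5, could screen predicted miRNA-target pairs like this. The miRNA and gene names in the example are placeholders, not results from the study.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant vector has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def negative_pairs(
    mirna: Dict[str, Sequence[float]],
    mrna: Dict[str, Sequence[float]],
    targets: Dict[str, List[str]],
    threshold: float = -0.5,  # hypothetical cutoff
) -> List[Tuple[str, str, float]]:
    """Predicted miRNA-target pairs whose expression is negatively correlated."""
    hits = []
    for mir, genes in targets.items():
        for gene in genes:
            if mir in mirna and gene in mrna:
                r = pearson(mirna[mir], mrna[gene])
                if r <= threshold:
                    hits.append((mir, gene, r))
    return hits


# Hypothetical miRNA and genes measured in the same four samples.
mirna = {"miR-X": [1.0, 2.0, 3.0, 4.0]}
mrna = {"GENE1": [4.0, 3.0, 2.0, 1.0], "GENE2": [1.0, 2.0, 3.0, 4.0]}
print(negative_pairs(mirna, mrna, {"miR-X": ["GENE1", "GENE2"]}))
# [('miR-X', 'GENE1', -1.0)]
```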
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the geo-location information of towns in Madagascar, but lacks town names and populations. The data are curated from the Southern African Human-development Information Management Network (SAHIMS) static archive server https://web.archive.org/web/20070808004545/http://www.sahims.net:80/gis/... To view metadata, please visit https://web.archive.org/web/20070705025938/http://www.sahims.net:80/gis/...
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.7910/DVN/KTRIJP
Harvard CGA Geotweet IDs Archive is a subset of Harvard CGA Geotweet Archive v2.0. It contains the user and message identification records of individual tweets for approximately 10 billion geo-tagged tweets from January 2010 to July 2023. Unlike Harvard CGA Geotweet Archive v2.0, which is restricted from public sharing under Twitter's redistribution policy, this dataset is available to the academic community at large. It could serve as cross-validation data for publications that used data from Harvard CGA Geotweet Archive v2.0. If you are interested in accessing this archive, please fill out our Geotweet Request Form. Before requesting or receiving Tweet IDs, requestors must agree to Twitter's Terms of Service, Twitter's Privacy Policy, and Twitter's Developer Policy. Geotweet ID data provided by CGA can only be used for not-for-profit research and academic purposes. Recipients may not share CGA-provided Tweet IDs or content derived from them without written permission from the CGA. Citations: If you use the Geotweet Archive in your research, please reference it as "Harvard CGA Geotweet IDs Archive".
========================================================
Schema of Geotweet IDs Archive
Field name ---- TYPE ---- Description
message_id ---- BIGINT ---- Tweet ID
user_id ---- BIGINT ---- User ID number
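Tweet IDs are far larger than 2^53, the limit up to which a 64-bit float represents integers exactly, so loading them through a float type (a common default in spreadsheet and dataframe tools) silently changes them. The sketch below assumes records arrive as strings; the delivery format is not stated in the description, and `GeotweetId` and `parse_record` are hypothetical names.

```python
from dataclasses import dataclass

INT64_MAX = 2**63 - 1  # upper bound of a signed BIGINT


@dataclass(frozen=True)
class GeotweetId:
    message_id: int  # Tweet ID (BIGINT)
    user_id: int     # User ID number (BIGINT)


def parse_record(message_id: str, user_id: str) -> GeotweetId:
    """Parse one record, keeping both IDs as exact integers (never float)."""
    mid, uid = int(message_id), int(user_id)
    for value in (mid, uid):
        if not 0 <= value <= INT64_MAX:
            raise ValueError(f"{value} does not fit a non-negative BIGINT")
    return GeotweetId(mid, uid)


raw = "1234567890123456789"
rec = parse_record(raw, "42")
print(rec.message_id == int(raw))   # True
print(int(float(raw)) == int(raw))  # False: the float round-trip alters the ID
```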
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the geo-location, name and type of health facilities in Zambia. The data were created by the Zambia Central Statistical Office and curated from the Southern African Human-development Information Management Network (SAHIMS) static archive server https://web.archive.org/web/20070322051956/http://www.sahims.net/gis/GIS%20input/GIS_Library_Regional.asp To view metadata, please visit https://web.archive.org/web/20070322051956/http://www.sahims.net/gis/GIS%20input/GIS_Library_Regional.asp
https://earth.esa.int/eogateway/documents/20142/1560778/ESA-Third-Party-Missions-Terms-and-Conditions.pdf
GeoEye-1 high-resolution optical products are available as part of the Maxar Standard Satellite Imagery products from the QuickBird, WorldView-1/-2/-3/-4 and GeoEye-1 satellites. All details about data provision, data access conditions and the quota assignment procedure are described in the Terms of Applicability available in the Resources section. In particular, GeoEye-1 offers archive and tasking panchromatic products at up to 0.41 m GSD resolution and multispectral products at up to 1.65 m GSD resolution.
Band combination: Panchromatic and 4-bands. Data processing levels and resolutions:
- Standard (2A) / View Ready Standard (OR2A): 15 cm HD, 30 cm HD, 30 cm, 40 cm, 50/60 cm
- View Ready Stereo: 30 cm, 40 cm, 50/60 cm
- Map-Ready (Ortho) 1:12,000 Orthorectified: 15 cm HD, 30 cm HD, 30 cm, 40 cm, 50/60 cm
The options for 4-bands are the following:
- 4-Band Multispectral (BLUE, GREEN, RED, NIR1)
- 4-Band Pan-sharpened (BLUE, GREEN, RED, NIR1)
- 4-Band Bundle (PAN, BLUE, GREEN, RED, NIR1)
- 3-Band Natural Colour (pan-sharpened BLUE, GREEN, RED)
- 3-Band Colored Infrared (pan-sharpened GREEN, RED, NIR1)
Native 30 cm and 50/60 cm resolution products are processed with MAXAR HD Technology to generate the 15 cm HD and 30 cm HD products, respectively. The initial spatial resolution (GSD) is unchanged, but the HD technique increases the number of pixels and improves visual clarity, producing aesthetically refined imagery with precise edges and well-reconstructed details. As per ESA policy, very high-resolution imagery of conflict areas cannot be provided.
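The sense in which HD products "increase the number of pixels" without changing the native GSD is simple arithmetic: halving the pixel spacing over the same footprint quadruples the pixel count. The sketch below shows only that arithmetic; it does not model MAXAR's HD processing, which is not described here, and `pixels_per_km2` is a hypothetical helper.

```python
def pixels_per_km2(pixel_cm: int) -> int:
    """Whole pixels covering a 1 km x 1 km square at a given pixel size in cm."""
    per_side = 100_000 // pixel_cm  # 1 km = 100,000 cm
    return per_side * per_side


native = pixels_per_km2(30)  # native 30 cm product
hd = pixels_per_km2(15)      # 15 cm HD product over the same ground
print(native, hd, hd // native)  # 11108889 44435556 4
```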
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data is aggregated to UK post area from the Geoindex of the JISC UK Web Domain Dataset. Counts of postcodes are summed by year of archive.org instance and by sub-domain (e.g. .ac.uk). About the Geoindex (http://dx.doi.org/10.5259/ukwa.ds.2/geo/1): the ~2.5 billion 200 OK responses in the JISC UK Web Domain Dataset (1996-2010) have been scanned for geographic references, specifically postcodes. This set of postcode citations, found at particular URLs crawled at particular times, forms a historical geoindex of the UK web. For more details about how the data was created, its format, and how to use it, see the Geoindex documentation. The geoindex is composed of some 700,641,549 lines of TSV data, each asserting that a given web page, crawled at a given date, contained one or more references to a given postcode.
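The aggregation described above (postcode citations mapped to post areas, then counted by crawl year and sub-domain) can be sketched as follows. The column layout assumed here (crawl timestamp as YYYYMMDDhhmmss, URL, postcode) is hypothetical; the Geoindex documentation defines the real TSV format. The post area is the leading one or two letters of a UK postcode, e.g. CB in CB2 1TN.

```python
import re
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Post area: the leading one or two letters of a postcode, followed by a digit.
POST_AREA = re.compile(r"^([A-Z]{1,2})\d")


def post_area(postcode: str) -> Optional[str]:
    match = POST_AREA.match(postcode.strip().upper())
    return match.group(1) if match else None


def uk_subdomain(url: str) -> Optional[str]:
    """Second-level UK domain of a URL, e.g. '.ac.uk' for www.example.ac.uk."""
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    if len(parts) >= 3 and parts[-1] == "uk":
        return "." + ".".join(parts[-2:])
    return None


def aggregate(lines: Iterable[str]) -> Counter:
    """Count postcode citations by (crawl year, UK sub-domain, post area).

    Hypothetical column layout: crawl timestamp, URL, postcode (tab-separated).
    """
    counts: Counter = Counter()
    for line in lines:
        timestamp, url, postcode = line.rstrip("\n").split("\t")[:3]
        area, sub = post_area(postcode), uk_subdomain(url)
        if area and sub:
            counts[(timestamp[:4], sub, area)] += 1
    return counts


sample = [
    "20080101120000\thttp://www.example.ac.uk/page\tCB2 1TN\n",
    "20080505000000\thttp://shop.example.co.uk/\tCB3 0AA\n",
    "20090101000000\thttp://www.example.ac.uk/contact\tEH1 1YZ\n",
]
for key, n in sorted(aggregate(sample).items()):
    print(key, n)
```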
https://www.verifiedmarketresearch.com/privacy-policy/
Archiving Software Market size was valued at USD 8 Billion in 2024 and is projected to reach USD 16 Billion by 2031, growing at a CAGR of 10% during the forecast period 2024-2031.
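The three figures above are consistent under annual compounding: doubling from USD 8 billion to USD 16 billion over the seven years 2024-2031 implies a CAGR of about 10.4%, and compounding exactly 10% for seven years gives about USD 15.6 billion, which rounds to 16. The check below uses only the numbers in the text; the helper names are illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between a start and end value."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `rate` for `years` years."""
    return start * (1 + rate) ** years


print(f"{cagr(8, 16, 7):.1%}")       # 10.4%
print(f"{project(8, 0.10, 7):.1f}")  # 15.6 (USD billion)
```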
Global Archiving Software Market Drivers
Explosion of Data and Growth in Volume: One of the main factors propelling the archiving software industry is the exponential rise in data generated by enterprises. As digital transformation programs gain momentum, businesses gather enormous volumes of structured and unstructured data. By storing data securely and making it easy to retrieve, archiving software helps organizations manage their data effectively, which is critical to preserving operational efficiency.
Data governance and regulatory compliance: Data management and retention regulations are becoming increasingly demanding for organizations. Regulations such as GDPR, HIPAA, and Sarbanes-Oxley require businesses to retain specific data for predetermined periods. By automating data retention and destruction, archiving software helps firms stay compliant and avoid the heavy fines that come with non-compliance.
Harvard CGA Geotweet Census Archive is a subset of Harvard CGA Geotweet Archive v2.0 enriched with nationwide census data. It contains tweet and user identification records along with census variables for more than 2 billion geo-tagged tweets from January 2012 to July 2023. Unlike Harvard CGA Geotweet Archive v2.0, which is restricted from public sharing under Twitter's redistribution policy, this dataset is available to the academic community at large. It could serve as cross-validation data for publications that used data from Harvard CGA Geotweet Archive v2.0. If you are interested in accessing this archive, please fill out our Geotweet Request Form. Before requesting or receiving Tweet IDs, requestors must agree to Twitter's Terms of Service, Twitter's Privacy Policy, and Twitter's Developer Policy. Geotweet ID data provided by CGA can only be used for not-for-profit research and academic purposes. Recipients may not share CGA-provided Tweet IDs or content derived from them without written permission from the CGA. Citations: If you use the Geotweet Archive in your research, please reference it as "Harvard CGA Geotweet IDs Archive".
========================================================
Schema of Geotweet Census Archive
Field name ---- TYPE ---- Description
message_id ---- TEXT ---- Tweet ID
user_id ---- TEXT ---- User ID number
fips ---- FLOAT ---- County FIPS code
county ---- TEXT ---- County name
state ---- TEXT ---- State abbreviation
GEOID20 ---- FLOAT ---- Census block GEOID
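Because `fips` and `GEOID20` are typed FLOAT, codes for states 01-09 lose their leading zero and will not join against Census tables keyed by text codes. A county FIPS code has 5 digits (state 2 + county 3), and a 2020 block GEOID has 15 (state 2 + county 3 + tract 6 + block 4); 15-digit values are below 2^53, so the float itself stores them exactly. The sketch restores zero-padded strings; the helper names and the block GEOID in the example are hypothetical, while 06037 is the FIPS code of Los Angeles County.

```python
def fips_to_str(fips: float) -> str:
    """County FIPS: 2-digit state + 3-digit county, zero-padded to 5 digits."""
    return f"{int(round(fips)):05d}"


def geoid20_to_str(geoid: float) -> str:
    """2020 block GEOID: state(2) + county(3) + tract(6) + block(4) = 15 digits."""
    return f"{int(round(geoid)):015d}"


def block_in_county(geoid: float, fips: float) -> bool:
    """Consistency check: a block GEOID starts with its county FIPS code."""
    return geoid20_to_str(geoid).startswith(fips_to_str(fips))


print(fips_to_str(6037.0))                        # 06037
print(geoid20_to_str(60371234001001.0))           # 060371234001001
print(block_in_county(60371234001001.0, 6037.0))  # True
```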
PubMed Central reuse of GEO datasets deposited in 2007
- PMC_reuse_of_2007_GEO_datasets.csv: the raw data behind the analysis. It contains one row for every mention of a 2007 GEO dataset in PubMed Central. Each row identifies the mentioned GEO dataset, the PubMed Central article that mentions the dataset's accession number, whether the authors of the dataset and the attributing article overlap, and whether this is considered an instance of third-party data reuse.
- tables.csv (Aggregate Table Data): aggregate table data behind the figures and results in the README associated with the main dataset. It includes baseline metrics used for extrapolating PubMed Central (PMC) results to PubMed, the number of mentions of a 2007 GEO dataset by authors who submitted the dataset, and the number of mentions of a dataset by authors who DID NOT submit the dataset across 2007-2010.
Funding agencies are reluctant to support data archiving, even though large research funders such as the National Science Foundation (NSF) and the National Institutes of Health (NIH) acknowledge its importance for scientific progress. Our quantitative estimates of data reuse indicate that ongoing financial investment in data-archiving infrastructure provides a high scientific return.
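Each row records whether the dataset's authors and the mentioning article's authors overlap, and a mention counts as third-party reuse when they do not. The sketch shows that rule with a deliberately crude, hypothetical name normalization; the description does not say how authors were matched, and real matching would have to handle initials, accents and name collisions.

```python
from typing import Iterable


def normalize(name: str) -> str:
    """Crude author-name key: lower-case, drop periods, collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def is_third_party_reuse(dataset_authors: Iterable[str], article_authors: Iterable[str]) -> bool:
    """True when no author of the dataset also authored the mentioning article."""
    submitters = {normalize(a) for a in dataset_authors}
    citers = {normalize(a) for a in article_authors}
    return submitters.isdisjoint(citers)


print(is_third_party_reuse(["Smith J", "Doe A"], ["Lee K"]))     # True
print(is_third_party_reuse(["Smith J"], ["smith j.", "Lee K"]))  # False
```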
Harvard CGA Geotweet Sentiment Archive is a subset of Harvard CGA Geotweet Archive v2.0 enriched with a sentiment score. It contains tweet identification records along with a sentiment score, based on the tweet text, for about 4.3 billion geo-tagged tweets since 2019. The sentiment score was calculated using Bidirectional Encoder Representations from Transformers (BERT). More information about this methodology can be found in our Nature paper on the Twitter Sentiment Geographical Index. Unlike Harvard CGA Geotweet Archive v2.0, which is restricted from public sharing under Twitter's redistribution policy, this dataset is available to the academic community at large. It could serve as cross-validation data for publications that used data from Harvard CGA Geotweet Archive v2.0. If you are interested in accessing this archive, please fill out our Geotweet Request Form. Before requesting or receiving Tweet IDs, requestors must agree to Twitter's Terms of Service, Twitter's Privacy Policy, and Twitter's Developer Policy. Geotweet ID data provided by CGA can only be used for not-for-profit research and academic purposes. Recipients may not share CGA-provided Tweet IDs or content derived from them without written permission from the CGA. Citations: If you use the Geotweet Archive in your research, please reference it as "Harvard CGA Geotweet IDs Archive".
========================================================
Schema of Geotweet Sentiment Archive
Field name ---- TYPE ---- Description
message_id ---- TEXT ---- Tweet ID
score ---- FLOAT ---- BERT sentiment score
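This archive exposes only `message_id` and `score`, but a posting date can still be recovered if `message_id` is a standard Twitter Snowflake ID (true for tweets since late 2010, and therefore for tweets since 2019). The bits above the lowest 22 hold milliseconds since Twitter's Snowflake epoch, 1288834974657. The sketch averages scores per UTC day under that assumption; the function names are hypothetical, and since the range and meaning of the BERT score are not specified here, the daily mean is only illustrative.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Tuple

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch: 2010-11-04T01:42:54.657Z


def tweet_time(message_id: int) -> datetime:
    """Creation time encoded in the timestamp bits of a Snowflake ID."""
    ms = (message_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def daily_mean_score(rows: Iterable[Tuple[str, float]]) -> Dict[date, float]:
    """Average sentiment score per UTC day; message_id arrives as TEXT."""
    sums: Dict[date, float] = defaultdict(float)
    counts: Dict[date, int] = defaultdict(int)
    for message_id, score in rows:
        day = tweet_time(int(message_id)).date()
        sums[day] += score
        counts[day] += 1
    return {day: sums[day] / counts[day] for day in sums}


# Two synthetic IDs for 2020-01-01 00:00:00 and 00:00:01 UTC.
base = (1577836800000 - TWITTER_EPOCH_MS) << 22
rows = [(str(base), 0.2), (str(base + (1000 << 22)), 0.4)]
print(tweet_time(base))  # 2020-01-01 00:00:00+00:00
print(daily_mean_score(rows))
```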
This dataset tracks the updates made to the dataset "Medicaid Opioid Prescribing Rates - by Geography" and serves as a repository for previous versions of the data and metadata.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY
This dataset contains COVID-19 positive confirmed cases aggregated by several different geographic areas and by day. COVID-19 cases are mapped to the residence of the individual and shown on the date the positive test was collected. In addition, 2016-2020 American Community Survey (ACS) population estimates are included to calculate the cumulative rate per 10,000 residents.
The dataset covers cases going back to 3/2/2020, when testing began. Data may not be immediately available for recently reported cases, and values will change as new information becomes available. Data are updated daily.
Geographic areas summarized are: 1. Analysis Neighborhoods 2. Census Tracts 3. Census Zip Code Tabulation Areas
B. HOW THE DATASET IS CREATED
Addresses from the COVID-19 case data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area for a given date.
The 2016-2020 American Community Survey (ACS) population estimates provided by the Census Bureau are used to create a cumulative rate, equal to ([cumulative count up to that date] / [acs_population]) * 10000, representing the total number of cases per 10,000 residents as of the specified date.
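The cumulative-rate formula can be applied to a series of daily new-case counts as follows. This is a minimal sketch for a single area; the function name and the example numbers are hypothetical, and it ignores the privacy and rate-suppression rules that apply to the published data.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def cumulative_rates(new_cases: Sequence[int], acs_population: int) -> List[Tuple[int, float]]:
    """(cumulative count, cases per 10,000 residents) for each day, in order."""
    return [(cum, cum * 10000 / acs_population) for cum in accumulate(new_cases)]


# Hypothetical area with 4,000 residents and three days of new cases.
print(cumulative_rates([5, 10, 5], 4000))  # [(5, 12.5), (15, 37.5), (20, 50.0)]
```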
COVID-19 case data undergo quality assurance and other data verification processes and are continually updated to maximize completeness and accuracy of information. This means data may change for previous days as information is updated.
C. UPDATE PROCESS
Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 05:00 Pacific Time.
D. HOW TO USE THIS DATASET
San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
This dataset can be used to track the spread of COVID-19 throughout the city, in a variety of geographic areas. Note that the new cases column in the data represents the number of new cases confirmed in a certain area on the specified day, while the cumulative cases column is the cumulative total of cases in a certain area as of the specified date.
Privacy rules in effect
To protect privacy, certain rules are in effect: 1. Any area with a cumulative case count less than 10 is dropped for all days on which the cumulative count was less than 10; these will be null values. 2. Once an area has a cumulative case count of 10 or greater, that area will have a new row of case data every day following. 3. Cases are dropped altogether for areas where acs_population < 1000. 4. Deaths data are not included in this dataset for privacy reasons. The low COVID-19 death rate in San Francisco, along with other publicly available information on deaths, means that deaths data by geography and day are too granular and potentially risky. Read more in our privacy guidelines.
Rate suppression in effect where counts are lower than 20
Rates are not calculated unless the cumulative case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach, as this is best practice in epidemiology.
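For a single area-day, privacy rules 1 and 3 and the rate-suppression threshold can be expressed as one filter. This is a hypothetical sketch for readers who want to apply the same approach to their own derived tables; it is not SFDPH's actual script, and `apply_rules` is an illustrative name.

```python
from typing import Optional, TypedDict


class PublishedRow(TypedDict):
    cumulative_cases: int
    rate_per_10k: Optional[float]


def apply_rules(cumulative_cases: int, acs_population: int) -> Optional[PublishedRow]:
    """Return the publishable values for one area-day, or None if suppressed."""
    if acs_population < 1000:  # rule 3: area too small, drop altogether
        return None
    if cumulative_cases < 10:  # rule 1: count below 10, published as null
        return None
    # Rate suppression: compute a rate only once the cumulative count reaches 20.
    rate = cumulative_cases * 10000 / acs_population if cumulative_cases >= 20 else None
    return {"cumulative_cases": cumulative_cases, "rate_per_10k": rate}


print(apply_rules(5, 5000))   # None
print(apply_rules(15, 500))   # None
print(apply_rules(15, 5000))  # {'cumulative_cases': 15, 'rate_per_10k': None}
print(apply_rules(20, 5000))  # {'cumulative_cases': 20, 'rate_per_10k': 40.0}
```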
A note on Census ZIP Code Tabulation Areas (ZCTAs)
ZIP Code Tabulation Areas are special boundaries created by the U.S. Census Bureau based on ZIP Codes developed by the USPS. They are not, however, the same thing: ZIP Codes are delivery routes, while ZCTAs are areal representations of those routes. Read how the Census Bureau develops ZCTAs on its website.
Rows included for Citywide case counts
Rows are included for the Citywide case counts and incidence rate every day. These Citywide rows can be used for comparisons. Citywide rows capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongoing data quality efforts result in improved mapping on a rolling basis.
Related dataset
See the dataset of the most recent cumulative counts for all geographic areas here: https://data.sfgov.org/COVID-19/COVID-19-Cases-and-Deaths-Summarized-by-Geography/tpyr-dvnc
E. CHANGE LOG
This dataset tracks the updates made to the dataset "Medicare Inpatient Hospitals - by Geography and Service" and serves as a repository for previous versions of the data and metadata.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This is a geo-referenced data collection (database) with the released geoscientific drilling and profile data from the geological layer directories. The data encoding is based on the “symbol key geology (SEP1 of 1991)” of the German State Geological Services. The data will be presented in the Hamburg drilling data portal as far as a release is available. For free machine processing in accordance with the Hamburg Transparency Act, the data are provided in GML format in a RAR archive. It contains the master data of all boreholes. Due to their size, the layer data are divided into 7 files according to the districts of Hamburg, and for data protection reasons they do not contain data from private drilling for which no release is available for publication. The download files are updated as required. For a better understanding of the data, a file with field descriptions and key lists is also provided. The data will also be provided by the WFS services “WFS BoreholeML 3.0 Header” and “WFS BoreholeML 3.0” for the Drilling Point Map Germany, as far as a release is available. The data are not available here in the original SEP1 format but in the undifferentiated BoreholeML3 format. These two services provide complex GML schemas that cannot easily be processed by standard GIS clients such as ArcMap or QGIS.
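Because the BoreholeML services are OGC Web Feature Services, features can be requested with a standard GetFeature query instead of a desktop GIS client. The endpoint URL and feature type name below are placeholders, since the service addresses are not given here; the parameter names follow WFS 2.0.0 key-value encoding. The response is GML in the complex BoreholeML schemas, so a GML-aware parser would still be needed to use it.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint: str, type_names: str, count: int = 10) -> str:
    """Build a WFS 2.0.0 GetFeature request URL (key-value-pair encoding)."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "2.0.0",
        "REQUEST": "GetFeature",
        "TYPENAMES": type_names,  # WFS 2.0 name; WFS 1.x uses TYPENAME
        "COUNT": count,           # WFS 2.0 name; WFS 1.x uses MAXFEATURES
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint and feature type, not the real Hamburg service.
url = wfs_getfeature_url("https://example.org/wfs", "bml:Borehole", count=5)
print(url)
# https://example.org/wfs?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature&TYPENAMES=bml%3ABorehole&COUNT=5
```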
In June of 1990 and July of 1991, the U.S. Geological Survey (USGS) conducted geophysical surveys to investigate the shallow geologic framework of the Mississippi-Alabama-Florida shelf in the northern Gulf of Mexico, from Mississippi Sound to the Florida Panhandle. Work was done onboard the Mississippi Mineral Resources Institute R/V Kit Jones as part of a project to study coastal erosion and offshore sand resources. This report is part of a series to digitally archive the legacy analog data collected from the Mississippi-Alabama SHelf (MASH). The MASH data rescue project is a cooperative effort by the USGS and the Minerals Management Service (MMS). This report serves as an archive of high-resolution scanned Tagged Image File Format (TIFF) and Graphics Interchange Format (GIF) images of the original boomer paper records, navigation files, trackline maps, Geographic Information System (GIS) files, cruise logs, and formal Federal Geographic Data Committee (FGDC) metadata.
https://ega-archive.org/dacs/EGAC00001000000
Geographic variation of mutagenic exposures in kidney cancer genomes – copy number variants (Mutographs)
https://data.gov.tw/license
Provides geographical distribution data for national archives landmarks.