100+ datasets found

Data from: Classification of Mars Terrain Using Multiple Data Sources
data.nasa.gov
datasets.ai
Updated Mar 31, 2025
Cite
nasa.gov (2025). Classification of Mars Terrain Using Multiple Data Sources [Dataset]. https://data.nasa.gov/dataset/classification-of-mars-terrain-using-multiple-data-sources
Dataset updated
Mar 31, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
Classification of Mars Terrain Using Multiple Data Sources. Alan Kraut, David Wettergreen. Abstract: Images of Mars are being collected faster than they can be analyzed by planetary scientists. Automatic analysis of images would enable more rapid and more consistent image interpretation and could draft geologic maps where none yet exist. In this work we develop a method for incorporating images from multiple instruments to classify Martian terrain into multiple types. Each image is segmented into contiguous groups of similar pixels, called superpixels, each with an associated vector of discriminative features. We have developed and tested several classification algorithms that assign the best class to each superpixel. These classifiers are trained using three different manual classifications with between 2 and 6 classes. Automatic classification accuracies of 50 to 80% are achieved in leave-one-out cross-validation across 20 scenes using a multi-class boosting classifier.
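
The evaluation scheme in the abstract can be illustrated with a minimal sketch. It assumes that "leave-one-out across scenes" means holding out one whole scene at a time, and it swaps in a nearest-centroid classifier as a simple stand-in for the multi-class boosting classifier the authors used. The scene names, class labels, and feature values are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples):
    """Average the feature vectors of each class (stand-in classifier, not boosting)."""
    sums, counts = {}, defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the class whose centroid is closest to the superpixel's features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def leave_one_scene_out(scenes):
    """scenes maps a scene id to a list of (feature_vector, label) superpixels."""
    accuracies = {}
    for held_out, test_samples in scenes.items():
        # Train on every superpixel that does not belong to the held-out scene.
        train = [s for sid, samples in scenes.items() if sid != held_out for s in samples]
        centroids = fit_centroids(train)
        correct = sum(predict(centroids, f) == y for f, y in test_samples)
        accuracies[held_out] = correct / len(test_samples)
    return accuracies


# Hypothetical toy data: three scenes, two terrain classes, two features each.
scenes = {
    "scene_a": [((0.1, 0.2), "dune"), ((0.9, 0.8), "crater")],
    "scene_b": [((0.2, 0.1), "dune"), ((0.8, 0.9), "crater")],
    "scene_c": [((0.0, 0.3), "dune"), ((1.0, 0.7), "crater")],
}
accuracies = leave_one_scene_out(scenes)
```

Holding out whole scenes, rather than individual superpixels, keeps superpixels from the same image out of both the training and test sets at once, which would otherwise inflate the reported accuracy.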
ICOADS Input Data Sources
rda.ucar.edu
rda-web-prod.ucar.edu
Updated Feb 15, 2007
Cite
(2007). ICOADS Input Data Sources [Dataset]. https://rda.ucar.edu/cgi-bin/lookfordata?b=var&v=SWELLS
Dataset updated
Feb 15, 2007
Description
This dataset contains auxiliary, preliminary, and other datasets that are in preparation to be included in a future ICOADS release. Data are provided either in IMMA1 or native (non-IMMA1) ... format. It also contains datasets in IMMA1 and non-IMMA1 formats that have transitioned into ICOADS. This dataset was created in 2018 in conjunction with the completion of Release 3.0 and efforts going forward - it is not a complete collection of inputs for ICOADS beginning with Release 1. The purpose of this dataset is to provide a common archive point for data exchange with ICOADS researchers and track the provenance as input data sources are added to official releases. These sources are not recommended for general public use. If source data are archived in a different independent RDA dataset, those data are not duplicated in this dataset, but will be referenced with a "Related RDA Dataset" link, e.g. DS285.0 is the World Ocean Database in a non-IMMA1 format provided by NCEI.
Dataset for "Open access books through open data sources: Assessing...
data.niaid.nih.gov
Updated Nov 9, 2022
Cite
Mikael Laakso (2022). Dataset for "Open access books through open data sources: Assessing prevalence, providers, and preservation" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7305476
Dataset updated
Nov 9, 2022
Dataset authored and provided by
Mikael Laakso
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the raw collected data reported on in the manuscript titled "Open access books through open data sources: Assessing prevalence, providers, and preservation" which is available here: https://doi.org/10.5281/zenodo.7305490

One file contains the results of the digital object identifier queries; the other records which publications were found in which of the studied bibliometric data sources and preservation services.

The author is grateful to Alicia Wise and Ronald Snijder for assisting in the identification of available datasets and valuable feedback throughout the study.

This research was commissioned by CLOCKSS, DOAB, and OAPEN.
Data sources.
figshare.com
xls
Updated Aug 7, 2024
Cite
Mohammad Shahbaz; Jane E. Harding; Barry Milne; Anthony Walters; Martin von Randow; Greg D. Gamble (2024). Data sources. [Dataset]. http://doi.org/10.1371/journal.pone.0308414.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0308414.t001
Dataset updated
Aug 7, 2024
Dataset provided by
PLOS ONE
Authors
Mohammad Shahbaz; Jane E. Harding; Barry Milne; Anthony Walters; Martin von Randow; Greg D. Gamble
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction: A combination of self-reported questionnaire and administrative data could potentially enhance ascertainment of outcomes and alleviate the limitations of both in follow-up studies. However, it is uncertain how access to only one of these data sources to assess outcomes impacts study findings. Therefore, this study aimed to determine whether the study findings would be altered if the outcomes were assessed by different data sources alone or in combination.

Methods: At 50-year follow-up of participants in a randomized trial, we assessed the effect of antenatal betamethasone exposure on the diagnosis of diabetes, pre-diabetes, hyperlipidemia, hypertension, mental health disorders, and asthma using a self-reported questionnaire, administrative data, a combination of both, or any data source, with or without adjudication by an expert panel of five clinicians. Differences between relative risks derived from each data source were calculated using the Bland-Altman approach.

Results: There were 424 participants (46% of those eligible, aged 49 years, SD 1, 50% male). There were no differences in study outcomes between participants exposed to betamethasone and those exposed to placebo when the outcomes were assessed using different data sources. When compared to the study findings determined using adjudicated outcomes, the mean differences (limits of agreement) in relative risks derived from other data sources were: self-reported questionnaires 0.02 (-0.35 to 0.40), administrative data 0.06 (-0.32 to 0.44), both questionnaire and administrative data 0.01 (-0.41 to 0.43), and any data source 0.01 (-0.08 to 0.10).

Conclusion: Utilizing a self-reported questionnaire, administrative data, both questionnaire and administrative data, or any of these sources for assessing study outcomes had no impact on the study findings compared with when study outcomes were assessed using adjudicated outcomes.
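
The Bland-Altman quantities reported above (mean difference and limits of agreement) can be sketched as follows. This is a minimal illustration under the usual convention that the limits are the mean difference plus or minus 1.96 sample standard deviations; the relative-risk values are made up for the example and are not taken from the study.

```python
from statistics import mean, stdev


def bland_altman(reference, other):
    """Return (mean difference, lower limit, upper limit) of other minus reference."""
    diffs = [o - r for r, o in zip(reference, other)]
    bias = mean(diffs)
    # 1.96 sample standard deviations cover about 95% of differences
    # if the differences are roughly normally distributed.
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical relative risks for four outcomes (not the study's data).
adjudicated = [1.00, 0.90, 1.20, 1.10]
questionnaire = [1.10, 0.80, 1.30, 1.00]

bias, lower, upper = bland_altman(adjudicated, questionnaire)
print(f"mean difference {bias:.2f} (limits of agreement {lower:.2f} to {upper:.2f})")
```

A mean difference near zero with narrow limits indicates that the alternative data source yields relative risks close to the adjudicated ones.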
Transportation Projects in Your Neighborhood
catalog.data.gov
datasets.ai
Updated Jul 19, 2025
Cite
State of New York (2025). Transportation Projects in Your Neighborhood [Dataset]. https://catalog.data.gov/dataset/transportation-projects-in-your-neighborhood
Dataset updated
Jul 19, 2025
Dataset provided by
State of New York
Description
This data set contains DOT construction project information. The data is refreshed nightly from multiple data sources, so any downloaded copy becomes stale rather quickly.
State of California - Data
data.wu.ac.at
Updated Oct 11, 2013
Cite
Global (2013). State of California - Data [Dataset]. https://data.wu.ac.at/odso/datahub_io/NDZlMmFjNWEtMGY1ZS00ZWVhLTgzZWEtMmY5ZmFhMGQyMjEx
Dataset updated
Oct 11, 2013
Dataset provided by
Global
Description
About

Data from the State of California. From website:

Access raw State data files, databases, geographic data, and other data sources. Raw State data files can be reused by citizens and organizations for their own web applications and mashups.

Openness

Open. Effectively in the public domain. Terms of use page says:

In general, information presented on this web site, unless otherwise indicated, is considered in the public domain. It may be distributed or copied as permitted by law. However, the State does make use of copyrighted data (e.g., photographs) which may require additional permissions prior to your use. In order to use any information on this web site not owned or created by the State, you must seek permission directly from the owning (or holding) sources. The State shall have the unlimited right to use for any purpose, free of any charge, all information submitted via this site except those submissions made under separate legal contract. The State shall be free to use, for any purpose, any ideas, concepts, or techniques contained in information provided through this site.
Data from: PANACEA dataset - Heterogeneous COVID-19 Claims
data.niaid.nih.gov
zenodo.org
Updated Jul 15, 2022
Cite
Procter, Rob (2022). PANACEA dataset - Heterogeneous COVID-19 Claims [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6493846
Dataset updated
Jul 15, 2022
Dataset provided by
Arana-Catania, Miguel
Zubiaga, Arkaitz
Liakata, Maria
He, Yulan
Kochkina, Elena
Procter, Rob
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The peer-reviewed publication for this dataset was presented at the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite this when using the dataset.

This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.

The claims have been obtained from online fact-checking sources, existing datasets and research challenges. It combines different data sources with different foci, thus enabling a comprehensive approach that combines different media (Twitter, Facebook, general websites, academia), information domains (health, scholar, media), information types (news, claims) and applications (information retrieval, veracity evaluation).

The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).
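
The idea of threshold-based de-duplication can be sketched in a few lines. This is only an illustration: it uses token-set Jaccard similarity as a cheap stand-in for the BM25 plus MonoT5 pipeline described above, and the claims and thresholds are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (stand-in for the MonoT5 similarity score)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def deduplicate(claims, threshold):
    """Greedily keep a claim only if it is below the threshold against all kept claims."""
    kept = []
    for claim in claims:
        if all(jaccard(claim, other) < threshold for other in kept):
            kept.append(claim)
    return kept


# Hypothetical example claims.
claims = [
    "Vitamin C cures COVID-19",
    "vitamin c cures covid-19",
    "Vitamin C cures COVID-19 quickly",
    "Masks reduce transmission",
]
strict = deduplicate(claims, threshold=0.7)   # drops near-duplicates as well
lenient = deduplicate(claims, threshold=0.9)  # drops only almost identical claims
```

Lowering the threshold removes more near-duplicates and therefore yields a smaller set, which is how a single collection can be released in versions of different sizes.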

The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.

The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − People, including fictional; ORGANIZATION − Companies, agencies, institutions, etc.; GPE − Countries, cities, states; FACILITY − Buildings, highways, etc. These entities have been detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013) using spaCy.

The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.

The data sources used are:

The CoronaVirusFacts/DatosCoronaVirus Alliance Database. https://www.poynter.org/ifcn-covid-19-misinformation/

CoAID dataset (Cui and Lee, 2020) https://github.com/cuilimeng/CoAID

MM-COVID (Li et al., 2020) https://github.com/bigheiniu/MM-COVID

CovidLies (Hossain et al., 2020) https://github.com/ucinlp/covid19-data

TREC Health Misinformation track https://trec-health-misinfo.github.io/

TREC COVID challenge (Voorhees et al., 2021; Roberts et al., 2020) https://ir.nist.gov/covidSubmit/data.html

The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).

The entries in the dataset contain the following information:

Claim. Text of the claim.

Claim label. The labels are: False, and True.

Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.

Original information source. Information about which general information source was used to obtain the claim.

Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.

Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by the UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).

References

Arana-Catania M., Kochkina E., Zubiaga A., Liakata M., Procter R., He Y. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022. https://arxiv.org/abs/2205.02596

Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at trec-3. Nist Special Publication Sp,109:109.

Fabio Crestani, Mounia Lalmas, Cornelis J Van Rijsbergen, and Iain Campbell. 1998. “is this document relevant?. . . probably” a survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.

Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.

Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.

Limeng Cui and Dongwon Lee. 2020. Coaid: Covid-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.

Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation.

Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.

Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.
Tableau Data Visualization for OCHCO Data
catalog.data.gov
Updated Jun 7, 2025
Cite
MGMT (2025). Tableau Data Visualization for OCHCO Data [Dataset]. https://catalog.data.gov/dataset/tableau-data-visualization-for-ochco-data
Dataset updated
Jun 7, 2025
Dataset provided by
MGMT
Description
Tableau is a cloud-enabled business intelligence platform that supports HR analysis, ad hoc reporting, custom reports, and dashboards, providing strategic and analytical awareness through OCHCO data. Tableau pulls data from different data sources, and the results are shared with our customers and partners for data-driven decision making.
Addresses (Open Data)
catalog.data.gov
data.tempe.gov
Updated Jul 19, 2025
Cite
City of Tempe (2025). Addresses (Open Data) [Dataset]. https://catalog.data.gov/dataset/addresses-open-data
Dataset updated
Jul 19, 2025
Dataset provided by
City of Tempe
Description
This dataset is a compilation of address point data for the City of Tempe. The dataset contains a point location, the official address (as defined by The Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. There are several additional attributes that may be populated for an address, but they may not be populated for every address.

Contact: Lynn Flaaen-Hanna, Development Services Specialist
Contact E-mail
Link: Map that Lets You Explore and Export Address Data
Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information being created and maintained by The Building Safety Division of Community Development.
Data Source Type: ESRI ArcGIS Enterprise Geodatabase
Preparation Method: N/A
Publish Frequency: Weekly
Publish Method: Automatic
Data Dictionary
2018 Response to Resistance Subjects Data
data.amerigeoss.org
splitgraph.com
csv, json, rdf, xml
Updated Apr 13, 2022
Cite
United States (2022). 2018 Response to Resistance Subjects Data [Dataset]. https://data.amerigeoss.org/dataset/2018-response-to-resistance-subjects-data
Available download formats: rdf, xml, json, csv
Dataset updated
Apr 13, 2022
Dataset provided by
United States
Description
A dataset of APD Response to Resistance subjects which occurred in 2018.

AUSTIN POLICE DEPARTMENT DATA DISCLAIMER
1. The data provided is for informational use only and is not considered official APD crime data as in official Texas DPS or FBI crime reports.
2. APD’s crime database is continuously updated, so reports run at different times may produce different results. Care should be taken when comparing against other reports as different data collection methods and different data sources may have been used.
3. The Austin Police Department does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.

The number of use of force subjects in the city of Austin for 2018. This dataset is used to provide additional insight visualizations on use of force subjects in the city of Austin in 2018.

This dataset supports measure(s) S.D.3 of SD23. https://data.austintexas.gov/stories/s/kx2d-jya7

Data Source: Versadex

Calculation: (S.D.3) N/A

Measure Time Period: Annually (Calendar Year)

Automated: no

Date of last description update: 8/10/2020
Data Preparation Tools and Software Market Report | Global Forecast From...
dataintelo.com
csv, pdf, pptx
Updated Sep 12, 2024
Cite
Dataintelo (2024). Data Preparation Tools and Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-data-preparation-tools-and-software-market
Available download formats: csv, pdf, pptx
Dataset updated
Sep 12, 2024
Authors
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Data Preparation Tools and Software Market Outlook

The global data preparation tools and software market size was valued at USD 3.5 billion in 2023 and is projected to reach USD 11.2 billion by 2032, growing at a compound annual growth rate (CAGR) of 13.6% during the forecast period. This impressive growth can be attributed to the increasing need for data-driven decision-making, the rising adoption of big data analytics, and the growing importance of business intelligence across various industries.
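
For readers who want to check a figure like this, the compound annual growth rate can be computed directly. The sketch below assumes the 2023 and 2032 values are the endpoints, giving nine compounding years. Under that assumption the result is roughly 13.8%, close to the quoted 13.6%; the report's exact base year is not stated here, so a small gap is not surprising.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate linking two values."""
    return (end_value / start_value) ** (1 / years) - 1


# USD billions, taken from the paragraph above; the 9-year span is an assumption.
rate = cagr(3.5, 11.2, 2032 - 2023)
print(f"{rate:.1%}")
```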

One of the key growth factors driving the data preparation tools and software market is the exponential increase in data volume generated by both enterprises and consumers. With the proliferation of IoT devices, social media, and digital transactions, organizations are inundated with vast amounts of data that need to be processed and analyzed efficiently. Data preparation tools help in cleaning, transforming, and structuring this raw data, making it usable for analytics and business intelligence, thereby enabling companies to derive actionable insights and maintain a competitive edge.

Another significant driver for the market is the rising complexity of data sources and types. Organizations today deal with diverse datasets coming from various sources such as relational databases, cloud storage, APIs, and even machine-generated data. Data preparation tools and software provide automated and scalable solutions to handle these complex datasets, ensuring data consistency and accuracy. The tools also facilitate seamless integration with various data sources, enabling organizations to create a unified view of their data landscape, which is crucial for effective decision-making.

The growing adoption of advanced technologies such as AI and machine learning is also boosting the demand for data preparation tools and software. These technologies require high-quality, well-prepared data to function efficiently and generate reliable outcomes. Data preparation tools that incorporate AI capabilities can automate many of the repetitive and time-consuming tasks involved in data cleaning and transformation, thereby improving productivity and reducing human error. This, in turn, accelerates the implementation of AI-driven solutions across different sectors, further propelling market growth.

Regionally, North America currently holds the largest share of the data preparation tools and software market, driven by the presence of leading technology companies and a robust infrastructure for data analytics and business intelligence. However, the Asia Pacific region is expected to witness the highest growth rate during the forecast period, fueled by rapid digitization, increasing adoption of cloud-based solutions, and significant investments in big data and AI technologies. Europe is also a key market, with growing awareness about data governance and privacy regulations driving the adoption of data preparation tools.

Component Analysis

When analyzing the data preparation tools and software market by component, it is broadly categorized into software and services. The software segment is further divided into standalone data preparation tools and integrated solutions that come as part of larger analytics or business intelligence platforms. Standalone data preparation tools offer specialized functionalities such as data cleaning, transformation, and enrichment, catering to specific data preparation needs. These tools are particularly popular among organizations that require high levels of customization and flexibility in their data preparation processes.

On the other hand, integrated solutions are gaining traction due to their ability to provide end-to-end capabilities, from data preparation to visualization and analytics, all within a single platform. These solutions typically offer seamless integration with other business intelligence tools, enabling users to move from data preparation to analysis without switching between different software. This integrated approach is particularly beneficial for enterprises looking to streamline their data workflows and improve operational efficiency.

The services segment includes professional services such as consulting, implementation, and training, as well as managed services. Professional services are crucial for organizations that lack in-house expertise in data preparation and need external assistance to set up and optimize their data preparation processes. These services help organizations effectively leverage data preparation tools, ensuring that they achieve maximum ROI. Managed services, on the other hand, are
BSVerticalOzone database
zenodo.org
data.niaid.nih.gov
nc
Updated Jan 24, 2020
Cite
Birgit Hassler; Stefanie Kremser; Greg Bodeker; Jared Lewis; Kage Nesbit; Sean Davis; Sandip Dhomse; Martin Dameris (2020). BSVerticalOzone database [Dataset]. http://doi.org/10.5281/zenodo.1217184
Available download formats: nc
Unique identifier
https://doi.org/10.5281/zenodo.1217184
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Birgit Hassler; Stefanie Kremser; Greg Bodeker; Jared Lewis; Kage Nesbit; Sean Davis; Sandip Dhomse; Martin Dameris
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An updated and improved version of a global, vertically resolved, monthly mean zonal mean ozone database has been calculated, hereafter referred to as the BSVertOzone database. Like its predecessor, it combines measurements from several satellite-based instruments and ozone profile measurements from the global ozonesonde network. Monthly mean zonal mean ozone concentrations in mixing ratio and number density are provided in 5° latitude zones, spanning 70 altitude levels (1 to 70 km), or 70 pressure levels that are approximately 1 km apart (878.4 hPa to 0.046 hPa). Different data sets or "Tiers" are provided: "Tier 0" is based only on the available measurements and therefore does not completely cover the whole globe or the full vertical range uniformly; the "Tier 0.5" monthly mean zonal means are calculated from a filled version of the Tier 0 database where missing monthly mean zonal mean values are estimated from correlations at level 20 against a total column ozone database and then at levels above and below on correlations with lower and upper levels respectively. The Tier 0.5 database includes the full range of measurement variability and is created as an intermediate step for the calculation of the "Tier 1" data where a least squares regression model is used to attribute variability to various known forcing factors for ozone. Regression model fit coefficients are expanded in Fourier series and Legendre polynomials (to account for seasonality and latitudinal structure, respectively). Four different combinations of contributions from selected regression model basis functions result in four different "Tier 1" data sets that can be used for comparisons with chemistry-climate model simulations that do not exhibit the same unforced variability as reality (unless they are nudged towards reanalyses).
Compared to previous versions of the database, this update includes additional satellite data sources and ozonesonde measurements to extend the database period to 2016. Additional improvements over the previous version of the database include: (i) adjustments of measurements to account for biases and drifts between different data sources (using a chemistry-transport model simulation as a transfer standard), (ii) a more objective way to determine the optimum number of Fourier and Legendre expansions for the basis function fit coefficients, and (iii) the derivation of methodological and measurement uncertainties on each database value, traced through all data modification steps. Comparisons with the ozone database from SWOOSH (Stratospheric Water and OzOne Satellite Homogenized data set) show excellent agreement in many regions of the globe, and minor differences caused by different bias adjustment procedures for the two databases. However, compared to SWOOSH, BSVertOzone additionally covers the troposphere.
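
To make "monthly mean zonal mean" concrete, the following minimal sketch averages individual measurements by calendar month and latitude band. It is not the authors' processing chain: it assumes 5-degree-wide bands, handles a single altitude level, and uses made-up ozone values in arbitrary units.

```python
from collections import defaultdict
from math import floor
from statistics import mean


def zonal_monthly_means(measurements, band_width=5.0):
    """Average (year, month, latitude, ozone) records per month and latitude band.

    Keys of the result are (year, month, southern edge of the band in degrees).
    """
    groups = defaultdict(list)
    for year, month, latitude, ozone in measurements:
        # Clamp so that latitude 90 falls into the last band instead of a new one.
        band_start = min(floor(latitude / band_width) * band_width, 90.0 - band_width)
        groups[(year, month, band_start)].append(ozone)
    return {key: mean(values) for key, values in groups.items()}


# Hypothetical measurements at one altitude level.
measurements = [
    (2016, 1, 42.0, 3.0),
    (2016, 1, 44.9, 5.0),
    (2016, 1, 45.0, 7.0),
    (2016, 2, -2.5, 2.0),
]
means = zonal_monthly_means(measurements)
```

Bands and months without any measurement simply have no entry in the result, which mirrors the incomplete coverage described for "Tier 0".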
Data from: GALLO: An R package for Genomic Annotation and integration of...
portalcientifico.unileon.es
Updated 2020
Cite
Fonseca, Pablo, A.S.; Suárez-Vega, Aroa; Marras, Gabriele; Cánovas, Ángela (2020). GALLO: An R package for Genomic Annotation and integration of multiple data source in livestock for positional candidate LOci [Dataset]. https://portalcientifico.unileon.es/documentos/668fc461b9e7c03b01bdb93f
Dataset updated
2020
Authors
Fonseca, Pablo, A.S.; Suárez-Vega, Aroa; Marras, Gabriele; Cánovas, Ángela
Description
The development of high-throughput sequencing and genotyping methodologies allowed the identification of thousands of genomic regions associated with several complex traits. The integration of multiple sources of biological information is a crucial step required to better understand patterns regulating the development of these traits. Genomic Annotation in Livestock for positional candidate LOci (GALLO) is an R package developed for the accurate annotation of genes and quantitative trait loci (QTLs) located in regions identified in common genomic analyses performed in livestock, such as Genome-Wide Association Studies and transcriptomics using RNA-Sequencing. Moreover, GALLO allows the graphical visualization of gene and QTL annotation results, data comparison among different grouping factors (e.g., methods, breeds, tissues, statistical models, studies, etc.), and QTL enrichment in different livestock species including cattle, pigs, sheep, and chickens, etc. Consequently, GALLO is a useful package for the annotation, identification of hidden patterns across datasets, datamining previously reported associations, as well as the efficient scrutinization of the genetic architecture of complex traits in livestock.
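
Positional annotation of this kind comes down to finding annotated features that overlap a candidate region. The sketch below illustrates that core step in Python; it does not reproduce GALLO's R interface, and the region names, gene names, and coordinates are hypothetical.

```python
def annotate_regions(regions, genes):
    """Map each candidate region to the genes whose coordinates overlap it.

    regions: iterable of (region_name, chromosome, start, end)
    genes:   iterable of (gene_name, chromosome, start, end)
    Coordinates are treated as inclusive on both ends.
    """
    annotations = {}
    for region_name, chrom, start, end in regions:
        annotations[region_name] = [
            gene
            for gene, g_chrom, g_start, g_end in genes
            # Two closed intervals overlap when each starts before the other ends.
            if g_chrom == chrom and g_start <= end and g_end >= start
        ]
    return annotations


# Hypothetical inputs.
regions = [("peak_1", "chr1", 1000, 2000), ("peak_2", "chr2", 500, 900)]
genes = [
    ("GENE_A", "chr1", 1500, 2500),
    ("GENE_B", "chr1", 3000, 4000),
    ("GENE_C", "chr2", 100, 500),
]
annotations = annotate_regions(regions, genes)
```

The same overlap test applies to QTL records, so one function of this shape can serve both annotation types.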
Data from: A consensus compound/bioactivity dataset for data-driven drug...
explore.openaire.eu
data.niaid.nih.gov
Updated Mar 2, 2022
Cite
Laura Isigkeit; Apirat Chaikuad; Daniel Merk (2022). A consensus compound/bioactivity dataset for data-driven drug design and chemogenomics [Dataset]. http://doi.org/10.5281/zenodo.6398019
Unique identifier
https://doi.org/10.5281/zenodo.6398019
Dataset updated
Mar 2, 2022
Authors
Laura Isigkeit; Apirat Chaikuad; Daniel Merk
Description
This is the updated version of the dataset from 10.5281/zenodo.6320761.

Information

The diverse publicly available compound/bioactivity databases constitute a key resource for data-driven applications in chemogenomics and drug design. However, analysis of their coverage of compound entries and biological targets revealed considerable differences, suggesting the benefit of a consensus dataset. Therefore, we have combined and curated information from five esteemed databases (ChEMBL, PubChem, BindingDB, IUPHAR/BPS and Probes&Drugs) to assemble a consensus compound/bioactivity dataset comprising 1144648 compounds with 10915362 bioactivities on 5613 targets (including defined macromolecular targets as well as cell-lines and phenotypic readouts). It also provides simplified information on the assay types underlying the bioactivity data and on bioactivity confidence, obtained by comparing data from different sources. We have unified the source databases, brought them into a common format and combined them, which eases generic use in applications such as chemogenomics and data-driven drug design.

The consensus dataset provides increased target coverage and contains a higher number of molecules than the source databases, which is also evident from a larger number of scaffolds. These features render the consensus dataset a valuable tool for machine learning and other data-driven applications in (de novo) drug design and bioactivity prediction. The increased chemical and bioactivity coverage of the consensus dataset may improve the robustness of such models compared to the single source databases. In addition, semi-automated structure and bioactivity annotation checks, with flags for divergent data from different sources, may help data selection and further accurate curation.

This dataset belongs to the publication: https://doi.org/10.3390/molecules27082513

Structure and content of the dataset

Dataset structure

ChEMBL ID | PubChem ID | IUPHAR ID | Target | Activity type | Assay type | Unit | Mean C (0) | ... | Mean PC (0) | ... | Mean B (0) | ... | Mean I (0) | ... | Mean PD (0) | ... | Activity check annotation | Ligand names | Canonical SMILES C | ... | Structure check (Tanimoto) | Source

The dataset was created using the Konstanz Information Miner (KNIME) (https://www.knime.com/) and was exported as a CSV file and a compressed CSV file. Except for the canonical SMILES columns, all columns are filled with the datatype ‘string’. The datatype for the canonical SMILES columns is the smiles-format. We recommend the File Reader node for using the dataset in KNIME. With the help of this node the data types of the columns can be adjusted exactly. In addition, only this node can read the compressed format.

Column content:

ChEMBL ID, PubChem ID, IUPHAR ID: chemical identifier of the databases
Target: biological target of the molecule expressed as the HGNC gene symbol
Activity type: for example, pIC50
Assay type: simplification/classification of the assay into cell-free, cellular, functional and unspecified
Unit: unit of bioactivity measurement
Mean columns of the databases: mean of bioactivity values or activity comments denoted with the frequency of their occurrence in the database, e.g. Mean C = 7.5 *(15) -> the value for this compound-target pair occurs 15 times in ChEMBL database
Activity check annotation: a bioactivity check was performed by comparing values from the different sources and adding an activity check annotation to provide automated activity validation for additional confidence
    no comment: bioactivity values are within one log unit
    check activity data: bioactivity values are not within one log unit
    only one data point: only one value was available, no comparison and no range calculated
    no activity value: no precise numeric activity value was available
    no log-value could be calculated: no negative decadic logarithm could be calculated, e.g., because the reported unit was not a compound concentration
Ligand names: all unique names contained in the five source databases are listed
Canonical SMILES columns: molecular structure of the compound from each database
Structure check (Tanimoto): denotes matching or differing compound structures in different source databases
    match: molecule structures are the same between different sources
    no match: the structures differ. We calculated the Jaccard-Tanimoto similarity coefficient from Morgan Fingerprints to reveal true differences between sources and reported the minimum value
    1 structure: no structure comparison is possible, because there was only one structure available
    no structure: no structure comparison is possible, because there was no structure available
Source: the databases from which the data come
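The Mean-column format and the activity check annotation described above can be illustrated with a minimal Python sketch. It is not the authors' KNIME workflow. The regular expression is inferred only from the single example "7.5 *(15)", the function names `parse_mean_cell` and `activity_check` are hypothetical, and treating a spread of exactly one log unit as "within one log unit" is an assumption. The "no log-value could be calculated" case is not covered.

```python
import re
from typing import Optional, Sequence, Tuple

# Hypothetical pattern for a Mean cell such as "7.5 *(15)":
# a numeric mean, an asterisk, then the frequency in parentheses.
MEAN_CELL = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\*\s*\((\d+)\)\s*$")


def parse_mean_cell(cell: str) -> Optional[Tuple[float, int]]:
    """Return (mean value, frequency), or None if the cell is not numeric."""
    match = MEAN_CELL.match(cell)
    if match is None:
        return None  # e.g. an activity comment or an empty cell
    return float(match.group(1)), int(match.group(2))


def activity_check(values: Sequence[Optional[float]]) -> str:
    """Annotate one compound-target pair from per-source log-scale values.

    `values` holds one entry per source database; None means that the
    source reported no precise numeric value.
    """
    present = [v for v in values if v is not None]
    if not present:
        return "no activity value"
    if len(present) == 1:
        return "only one data point"
    # Assumption: a spread of exactly 1.0 still counts as within one log unit.
    if max(present) - min(present) <= 1.0:
        return "no comment"
    return "check activity data"
```

For example, `activity_check([7.5, 7.9, None])` yields "no comment", while two sources reporting 5.0 and 7.2 would be flagged with "check activity data".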
Health Data Interactive
dataverse.harvard.edu
Updated Feb 10, 2011
Cite
Harvard Dataverse (2011). Health Data Interactive [Dataset]. http://doi.org/10.7910/DVN/BHUMXV
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/BHUMXV
Dataset updated
Feb 10, 2011
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Health Data Interactive produces a series of tables on national health statistics. Tables are customizable, and users can download tables, charts and reports. Health topics include, but are not limited to, hospital discharges, mental health, disabilities, diabetes and childbirth.

Background
Health Data Interactive is a part of the National Center for Health Statistics at the Centers for Disease Control and Prevention. From this website, users can get information on a variety of health topics and trends in the United States. Topics are organized under the following categories: health and functional status; health care use and expenditures; health conditions; health insurance and access; mortality and life expectancy; pregnancy and birth; risk factors and disease prevention.

User Functionality
Users can choose to download the data files or view the results in chart, table or graph form. Since users can control which variables are included and how they are presented, they have a wide variety of customization options; directions for working with the tables are provided. Users can view data by age, race/ethnicity, gender and/or geographic region.

Data Notes
Fourteen different data sources are used and are clearly identified for each table. The website links to each source. The tables are updated frequently, but the site does not specify when. The most recent data are from 2009.
‘2018 RP Citations’ analyzed by Analyst-2
analyst-2.ai
Updated Dec 15, 2018
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2018). ‘2018 RP Citations’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-2018-rp-citations-ef7e/latest
Explore at:
Dataset updated
Dec 15, 2018
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘2018 RP Citations’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/65527fab-0701-44f9-89a0-7fe9d7161472 on 12 February 2022.

--- Dataset description provided by original source is as follows ---

AUSTIN POLICE DEPARTMENT DATA DISCLAIMER 1. The data provided are for informational use only and may differ from official APD crime data. 2. APD’s crime database is continuously updated, so reports run at different times may produce different results. Care should be taken when comparing against other reports as different data collection methods and different data sources may have been used. 3. The Austin Police Department does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.

--- Original source retains full ownership of the source dataset ---
‘R2R 2010’ analyzed by Analyst-2
analyst-2.ai
Updated Dec 19, 2010
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2010). ‘R2R 2010’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-r2r-2010-332d/bd16e1a1/?iid=024-969&v=presentation
Explore at:
Dataset updated
Dec 19, 2010
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘R2R 2010’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/75c7ed76-c7d7-451c-8ed6-36d485c68c68 on 26 January 2022.

--- Dataset description provided by original source is as follows ---

Response to Resistance dataset for 2010. AUSTIN POLICE DEPARTMENT DATA DISCLAIMER 1. The data provided are for informational use only and may differ from official APD crime data. 2. APD’s crime database is continuously updated, so reports run at different times may produce different results. Care should be taken when comparing against other reports as different data collection methods and different data sources may have been used. 3. The Austin Police Department does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.

--- Original source retains full ownership of the source dataset ---
The Urbanity Global Network Dataset
figshare.com
Updated Dec 22, 2023
Cite
Winston Yap (2023). The Urbanity Global Network Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.22124219.v12
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.22124219.v12
Dataset updated
Dec 22, 2023
Dataset provided by
figshare
Authors
Winston Yap
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The Global Urban Network (GUN) dataset provides pre-computed node and edge attribute features for various cities. Each layer is available in .geojson format and can easily be converted into NetworkX, igraph, PyG, and DGL graph formats.

For node attributes, we adopt a uniform Euclidean approach, as it provides a consistent, straightforward, and extensible basis for integrating heterogeneous data sources across different network locations. Accordingly, we construct a 100-metre Euclidean buffer around each network node and compute its spatial intersection with spatial targets (e.g., street view imagery points, points of interest, and building footprints). To ensure spatial consistency and accurate distance computation, we project spatial entities into local coordinate reference systems (CRS). Users can employ the Urbanity package to generate Euclidean buffers of arbitrary distance.
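The buffer intersection for point targets can be sketched in plain Python as follows. This is an illustrative sketch, not the Urbanity package's implementation: the function name `count_within_buffer` is hypothetical, coordinates are assumed to be already projected to a local CRS in metres, and only point targets are handled (building footprints would need polygon intersection).

```python
from math import hypot
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]  # (x, y) in metres, assumed already projected


def count_within_buffer(
    nodes: Dict[str, Point],
    targets: Iterable[Point],
    radius_m: float = 100.0,
) -> Dict[str, int]:
    """Count the target points inside each node's circular buffer."""
    target_list = list(targets)
    counts: Dict[str, int] = {}
    for node_id, (node_x, node_y) in nodes.items():
        # A point intersects the buffer when its Euclidean distance
        # to the node is at most the buffer radius.
        counts[node_id] = sum(
            1
            for target_x, target_y in target_list
            if hypot(target_x - node_x, target_y - node_y) <= radius_m
        )
    return counts
```

This brute-force loop compares every node with every target. For city-scale networks a spatial index would normally replace it.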

For edge attributes, we adopt a two-step approach: 1) compute the distance between each spatial point of interest and its proximate edges in the network, and 2) assign each entity to the edge at the lowest distance. To account for remote edges (e.g., peripheral routes that are not located close to any amenities), we specify a distance threshold of 50 metres. For buildings, we compute the distance between building centroids and their respective network edge. Accordingly, we compute spatial indicators based on the set of elements assigned to each network edge.
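The two-step edge assignment can be sketched as below. This is a simplified illustration under stated assumptions, not the package's code: edges are treated as single straight segments in projected metre coordinates (real street edges are usually polylines), the names `point_segment_distance` and `assign_to_nearest_edge` are hypothetical, and the 50-metre threshold is read as "leave a point unassigned when its nearest edge is farther than 50 m".

```python
from math import hypot
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]    # (x, y) in metres, assumed already projected
Segment = Tuple[Point, Point]  # straight edge given by its two end points


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Shortest Euclidean distance from point p to a line segment."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return hypot(p[0] - ax, p[1] - ay)  # degenerate zero-length edge
    # Project p onto the segment's line and clamp to the end points.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def assign_to_nearest_edge(
    p: Point,
    edges: Dict[str, Segment],
    max_dist_m: float = 50.0,
) -> Optional[str]:
    """Return the id of the closest edge, or None if it is beyond the threshold."""
    best_id: Optional[str] = None
    best_dist = float("inf")
    for edge_id, seg in edges.items():
        dist = point_segment_distance(p, seg)
        if dist < best_dist:
            best_id, best_dist = edge_id, dist
    return best_id if best_dist <= max_dist_m else None
```

For buildings, the same function would be called with the building centroid as `p`.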

We also release aggregated subzone statistics for each city. Similarly, users can employ the Urbanity package to generate aggregate statistics for any arbitrary geographic boundary.

Urbanity Python package: https://github.com/winstonyym/urbanity.
International Comprehensive Ocean-Atmosphere Data Set (ICOADS)...
catalog.data.gov
ncei.noaa.gov
Updated Sep 16, 2023
Cite
DOC/NOAA/NESDIS/NCEI > National Centers for Environmental Information, NESDIS, NOAA, U.S. Department of Commerce (Point of Contact) (2023). International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Near-Real-Time (NRT) - Daily, Release 3.0.2 [Dataset]. https://catalog.data.gov/dataset/international-comprehensive-ocean-atmosphere-data-set-icoads-near-real-time-nrt-daily-release-3
Explore at:
Dataset updated
Sep 16, 2023
Dataset provided by
National Oceanic and Atmospheric Administration (http://www.noaa.gov/)
United States Department of Commerce (http://www.commerce.gov/)
National Centers for Environmental Information (https://www.ncei.noaa.gov/)
National Environmental Satellite, Data, and Information Service
Description
The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) is the world's most extensive surface marine meteorological data collection. Building on national and international partnerships, ICOADS provides a variety of user communities with easy access to many different data sources in a consistent format. Data sources range from early historical ship observations to more modern, automated measurement systems including moored buoys and surface drifters. Past versions of the ICOADS dataset were published as monthly files, while a daily version of the product was held for internal use only. NCEI has since developed a reformatted daily product that aligns with the monthly product and is ready for public use. The objective of this initiative is to sustain the quality and usability of this high-profile ICOADS product for stakeholders who have requested an expanded product. ICOADS R3.0.2 Daily has now been developed and released.
Hourly dewpoint temperature in degrees Fahrenheit and three-digit...
catalog.data.gov
search.dataone.org
Updated Jul 6, 2024
Cite
U.S. Geological Survey (2024). Hourly dewpoint temperature in degrees Fahrenheit and three-digit data-source flag associated with the data, January 1, 1948 - September 30, 2015 [Dataset]. https://catalog.data.gov/dataset/hourly-dewpoint-temperature-in-degrees-fahrenheit-and-three-digit-data-source-flag-asso-30
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
U.S. Geological Survey
Description
The text file "Dewpoint temperature.txt" contains hourly data and the associated data-source flag from January 1, 1948, to September 30, 2015. The primary source of the data is the Argonne National Laboratory, Illinois. The first four columns give the year, month, day and hour of the observation. Column 5 is the dewpoint temperature in degrees Fahrenheit. Column 6 is the data-source flag, which consists of a three-digit sequence in the form "xyz". The digits indicate whether the dewpoint temperature data are original or missing, the method that was used to fill the missing periods, and any other transformations of the data. Users of the data should consult Over and others (2010) for detailed documentation of this hourly data-source flag series. Reference Cited: Over, T.M., Price, T.H., and Ishii, A.L., 2010, Development and analysis of a meteorological database, Argonne National Laboratory, Illinois: U.S. Geological Survey Open File Report 2010-1220, 67 p., http://pubs.usgs.gov/of/2010/1220/.
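A record in the six-column layout described above could be read with a short Python sketch like the following. The description does not state the delimiter or whether the file has a header, so whitespace-separated fields and no header row are assumptions here. The `DewpointRecord` and `parse_line` names are hypothetical, and the sample values used with them are made up for illustration. The three flag digits are kept as a string and not interpreted, since their meaning is documented in Over and others (2010).

```python
from typing import NamedTuple


class DewpointRecord(NamedTuple):
    year: int
    month: int
    day: int
    hour: int
    dewpoint_f: float  # column 5, degrees Fahrenheit
    flag: str          # column 6, three-digit data-source flag "xyz"


def parse_line(line: str) -> DewpointRecord:
    """Parse one record; assumes six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    year, month, day, hour = (int(field) for field in fields[:4])
    dewpoint_f = float(fields[4])
    flag = fields[5]
    if len(flag) != 3 or not flag.isdigit():
        raise ValueError(f"flag is not a three-digit sequence: {flag!r}")
    return DewpointRecord(year, month, day, hour, dewpoint_f, flag)
```

Keeping the flag as a string preserves any leading zero, which converting it to an integer would drop.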

