100+ datasets found
  1. Classification of Mars Terrain Using Multiple Data Sources - Dataset - NASA...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Classification of Mars Terrain Using Multiple Data Sources - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/classification-of-mars-terrain-using-multiple-data-sources
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Classification of Mars Terrain Using Multiple Data Sources. Alan Kraut, David Wettergreen. Abstract: Images of Mars are being collected faster than they can be analyzed by planetary scientists. Automatic analysis of images would enable more rapid and more consistent image interpretation and could draft geologic maps where none yet exist. In this work we develop a method for incorporating images from multiple instruments to classify Martian terrain into multiple types. Each image is segmented into contiguous groups of similar pixels, called superpixels, with an associated vector of discriminative features. We have developed and tested several classification algorithms to associate a best class to each superpixel. These classifiers are trained using three different manual classifications with between 2 and 6 classes. Automatic classification accuracies of 50 to 80% are achieved in leave-one-out cross-validation across 20 scenes using a multi-class boosting classifier.
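
    The leave-one-out scheme mentioned in the abstract can be illustrated with a minimal sketch. It assumes each of the 20 scenes is held out once while the remaining 19 are used for training. The scene identifiers and the function name are hypothetical, and this is not the authors' code.

```python
def leave_one_scene_out(scene_ids):
    """Yield (training_scenes, held_out_scene) pairs, one pair per scene."""
    for held_out in scene_ids:
        training = [s for s in scene_ids if s != held_out]
        yield training, held_out


# Hypothetical identifiers for the 20 scenes mentioned in the abstract.
scenes = [f"scene_{i:02d}" for i in range(20)]
folds = list(leave_one_scene_out(scenes))

for training, held_out in folds[:2]:
    print(f"train on {len(training)} scenes, test on {held_out}")
```

    A classifier would be trained on the superpixels of each `training` list and scored on the held-out scene. Averaging the 20 scores gives a cross-validated accuracy.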

  2. Addresses (Open Data)

    • catalog.data.gov
    • data-academy.tempe.gov
    Updated Nov 22, 2025
    Cite
    City of Tempe (2025). Addresses (Open Data) [Dataset]. https://catalog.data.gov/dataset/addresses-open-data
    Dataset provided by
    City of Tempe
    Description

    This dataset is a compilation of address point data for the City of Tempe. It contains a point location and the official address (as defined by the Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. Several additional attributes may be populated for an address, but not for every address.

    Contact: Lynn Flaaen-Hanna, Development Services Specialist
    Contact E-mail
    Link: Map that Lets You Explore and Export Address Data
    Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information being created and maintained by the Building Safety Division of Community Development.
    Data Source Type: ESRI ArcGIS Enterprise Geodatabase
    Preparation Method: N/A
    Publish Frequency: Weekly
    Publish Method: Automatic
    Data Dictionary

  3. Department of Community Resources & Services Online Data Sources

    • opendata.howardcountymd.gov
    • data.wu.ac.at
    csv, xlsx, xml
    Updated Oct 28, 2019
    Cite
    Department of Community Resources & Services (2019). Department of Community Resources & Services Online Data Sources [Dataset]. https://opendata.howardcountymd.gov/w/kdeq-r7qc/j72c-n6z5?cur=LdI0ncE4AfX&from=n10jJ2BVdMM
    Dataset authored and provided by
    Department of Community Resources & Services
    Description

    This dataset lists the data sources that the Department of Community Resources & Services uses for internal and external reports. It lets individuals and organizations identify the type of data they are looking for and the geographical level at which they want it (i.e. National, State, County, etc.). This dataset will be updated every quarter and should be utilized for research purposes.

  4. Data from: Multimorbidity in Australia: Comparing estimates derived using...

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Aug 29, 2017
    Cite
    Zwar, Nicholas; Jorm, Louisa; Lujic, Sanja; Hosseinzadeh, Hassan; Simpson, Judy M. (2017). Multimorbidity in Australia: Comparing estimates derived using administrative data sources and survey data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001779669
    Authors
    Zwar, Nicholas; Jorm, Louisa; Lujic, Sanja; Hosseinzadeh, Hassan; Simpson, Judy M.
    Area covered
    Australia
    Description

    Background: Estimating multimorbidity (presence of two or more chronic conditions) using administrative data is becoming increasingly common. We investigated (1) the concordance of identification of chronic conditions and multimorbidity using self-report survey and administrative datasets; (2) characteristics of people with multimorbidity ascertained using different data sources; and (3) whether the same individuals are classified as multimorbid using different data sources.

    Methods: Baseline survey data for 90,352 participants of the 45 and Up Study—a cohort study of residents of New South Wales, Australia, aged 45 years and over—were linked to prior two-year pharmaceutical claims and hospital admission records. Concordance of eight self-report chronic conditions (reference) with claims and hospital data were examined using sensitivity (Sn), positive predictive value (PPV), and kappa (κ). The characteristics of people classified as multimorbid were compared using logistic regression modelling.

    Results: Agreement was found to be highest for diabetes in both hospital and claims data (κ = 0.79, 0.78; Sn = 79%, 72%; PPV = 86%, 90%). The prevalence of multimorbidity was highest using self-report data (37.4%), followed by claims data (36.1%) and hospital data (19.3%). Combining all three datasets identified a total of 46,683 (52%) people with multimorbidity, with half of these identified using a single dataset only, and up to 20% identified on all three datasets. Characteristics of persons with and without multimorbidity were generally similar. However, the age gradient was more pronounced and people speaking a language other than English at home were more likely to be identified as multimorbid by administrative data.

    Conclusions: Different individuals, with different combinations of conditions, are identified as multimorbid when different data sources are used. As such, caution should be applied when ascertaining morbidity from a single data source as the agreement between self-report and administrative data is generally poor. Future multimorbidity research exploring specific disease combinations and clusters of diseases that commonly co-occur, rather than a simple disease count, is likely to provide more useful insights into the complex care needs of individuals with multiple chronic conditions.
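
    The three concordance measures named in the Methods (sensitivity, positive predictive value, and kappa) can all be computed from a 2x2 table that compares an administrative source with the self-report reference. The sketch below uses the standard textbook formulas. The function name and the counts are made up for illustration and are not taken from the study.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Return (sensitivity, PPV, Cohen's kappa) for a 2x2 agreement table.

    tp: condition present in both the reference and the administrative source
    fp: present in the administrative source only
    fn: present in the reference (self-report) only
    tn: absent in both
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, ppv, kappa


# Illustrative counts only (hypothetical, not study data).
sn, ppv, kappa = agreement_metrics(tp=80, fp=20, fn=20, tn=880)
print(f"Sn={sn:.2f} PPV={ppv:.2f} kappa={kappa:.2f}")
```

    Kappa corrects the raw agreement for the agreement expected by chance, so it can be low even when sensitivity and PPV look acceptable for a rare condition.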

  5. State of California - Data

    • data.wu.ac.at
    Updated Oct 11, 2013
    Cite
    Global (2013). State of California - Data [Dataset]. https://data.wu.ac.at/odso/datahub_io/NDZlMmFjNWEtMGY1ZS00ZWVhLTgzZWEtMmY5ZmFhMGQyMjEx
    Dataset provided by
    Global
    Description

    About

    Data from the State of California. From website:

    Access raw State data files, databases, geographic data, and other data sources. Raw State data files can be reused by citizens and organizations for their own web applications and mashups.

    Openness

    Open. Effectively in the public domain. Terms of use page says:

    In general, information presented on this web site, unless otherwise indicated, is considered in the public domain. It may be distributed or copied as permitted by law. However, the State does make use of copyrighted data (e.g., photographs) which may require additional permissions prior to your use. In order to use any information on this web site not owned or created by the State, you must seek permission directly from the owning (or holding) sources. The State shall have the unlimited right to use for any purpose, free of any charge, all information submitted via this site except those submissions made under separate legal contract. The State shall be free to use, for any purpose, any ideas, concepts, or techniques contained in information provided through this site.

  6. Participation measures in higher education - Other data sources - APS, ONS

    • explore-education-statistics.service.gov.uk
    Updated Oct 26, 2023
    Cite
    Department for Education (2023). Participation measures in higher education - Other data sources - APS, ONS [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/126b171f-84ad-4045-9bae-a8ee699cdd9f
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Time period covered
    2011 - 2021
    Description

    Office for National Statistics (ONS) Annual Population Survey (APS) data showing the percentage of the population aged 25-29 qualified to Level 4 and above.

  7. Bsverticalozone Database

    • search.datacite.org
    • data.niaid.nih.gov
    Updated Apr 30, 2018
    Cite
    Birgit Hassler; Stefanie Kremser; Greg Bodeker; Jared Lewis; Kage Nesbit; Sean Davis; Sandip Dhomse; Martin Dameris (2018). Bsverticalozone Database [Dataset]. http://doi.org/10.5281/zenodo.1217184
    Dataset provided by
    DataCite (https://www.datacite.org/)
    Zenodo (http://zenodo.org/)
    Authors
    Birgit Hassler; Stefanie Kremser; Greg Bodeker; Jared Lewis; Kage Nesbit; Sean Davis; Sandip Dhomse; Martin Dameris
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    An updated and improved version of a global, vertically resolved, monthly mean zonal mean ozone database has been calculated, hereafter referred to as the BSVertOzone database. Like its predecessor, it combines measurements from several satellite-based instruments and ozone profile measurements from the global ozonesonde network. Monthly mean zonal mean ozone concentrations in mixing ratio and number density are provided in 5° latitude zones, spanning 70 altitude levels (1 to 70 km), or 70 pressure levels that are approximately 1 km apart (878.4 hPa to 0.046 hPa). Different data sets or "Tiers" are provided: "Tier 0" is based only on the available measurements and therefore does not completely cover the whole globe or the full vertical range uniformly; the "Tier 0.5" monthly mean zonal means are calculated from a filled version of the Tier 0 database where missing monthly mean zonal mean values are estimated from correlations at level 20 against a total column ozone database and then at levels above and below on correlations with lower and upper levels respectively. The Tier 0.5 database includes the full range of measurement variability and is created as an intermediate step for the calculation of the "Tier 1" data where a least squares regression model is used to attribute variability to various known forcing factors for ozone. Regression model fit coefficients are expanded in Fourier series and Legendre polynomials (to account for seasonality and latitudinal structure, respectively). Four different combinations of contributions from selected regression model basis functions result in four different "Tier 1" data sets that can be used for comparisons with chemistry-climate model simulations that do not exhibit the same unforced variability as reality (unless they are nudged towards reanalyses).

    Compared to previous versions of the database, this update includes additional satellite data sources and ozonesonde measurements to extend the database period to 2016. Additional improvements over the previous version of the database include: (i) adjustments of measurements to account for biases and drifts between different data sources (using a chemistry-transport model simulation as a transfer standard), (ii) a more objective way to determine the optimum number of Fourier and Legendre expansions for the basis function fit coefficients, and (iii) methodological and measurement uncertainties on each database value that are derived and traced through all data modification steps. Comparisons with the ozone database from SWOOSH (Stratospheric Water and OzOne Satellite Homogenized data set) show excellent agreement in many regions of the globe, and minor differences caused by different bias adjustment procedures for the two databases. However, compared to SWOOSH, BSVertOzone additionally covers the troposphere.

  8. Data from: PANACEA dataset - Heterogeneous COVID-19 Claims

    • data.niaid.nih.gov
    Updated Jul 15, 2022
    Cite
    Arana-Catania, Miguel; Kochkina, Elena; Zubiaga, Arkaitz; Liakata, Maria; Procter, Rob; He, Yulan (2022). PANACEA dataset - Heterogeneous COVID-19 Claims [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6493846
    Dataset provided by
    Queen-Mary University of London
    University of Warwick
    Authors
    Arana-Catania, Miguel; Kochkina, Elena; Zubiaga, Arkaitz; Liakata, Maria; Procter, Rob; He, Yulan
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The peer-reviewed publication for this dataset has been presented in the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite this when using the dataset.

    This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.

    The claims have been obtained from online fact-checking sources, existing datasets and research challenges. It combines different data sources with different foci, thus enabling a comprehensive approach that combines different media (Twitter, Facebook, general websites, academia), information domains (health, scholar, media), information types (news, claims) and applications (information retrieval, veracity evaluation).

    The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).

    The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.

    The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − People, including fictional; ORGANIZATION − Companies, agencies, institutions, etc.; GPE − Countries, cities, states; FACILITY − Buildings, highways, etc. These entities have been detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013) using spaCy.

    The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.

    The data sources used are:

    The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).

    The entries in the dataset contain the following information:

    • Claim. Text of the claim.

    • Claim label. The labels are False and True.

    • Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.

    • Original information source. Information about which general information source was used to obtain the claim.

    • Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.

    Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by the UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).

    References

    • Arana-Catania M., Kochkina E., Zubiaga A., Liakata M., Procter R., He Y. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022. https://arxiv.org/abs/2205.02596

    • Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at trec-3. Nist Special Publication Sp,109:109.

    • Fabio Crestani, Mounia Lalmas, Cornelis J Van Rijsbergen, and Iain Campbell. 1998. “is this document relevant?. . . probably” a survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.

    • Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

    • Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.

    • Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations.

    • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

    • Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.

    • Limeng Cui and Dongwon Lee. 2020. Coaid: Covid-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.

    • Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation.

    • Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.

    • Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.

  9. DataSheet2_Data Sources for Drug Utilization Research in Brazil—DUR-BRA...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 1, 2023
    Cite
    Lisiane Freitas Leal; Claudia Garcia Serpa Osorio-de-Castro; Luiz Júpiter Carneiro de Souza; Felipe Ferre; Daniel Marques Mota; Marcia Ito; Monique Elseviers; Elisangela da Costa Lima; Ivan Ricardo Zimmernan; Izabela Fulone; Monica Da Luz Carvalho-Soares; Luciane Cruz Lopes (2023). DataSheet2_Data Sources for Drug Utilization Research in Brazil—DUR-BRA Study.xlsx [Dataset]. http://doi.org/10.3389/fphar.2021.789872.s002
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Lisiane Freitas Leal; Claudia Garcia Serpa Osorio-de-Castro; Luiz Júpiter Carneiro de Souza; Felipe Ferre; Daniel Marques Mota; Marcia Ito; Monique Elseviers; Elisangela da Costa Lima; Ivan Ricardo Zimmernan; Izabela Fulone; Monica Da Luz Carvalho-Soares; Luciane Cruz Lopes
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Brazil
    Description

    Background: In Brazil, studies that map electronic healthcare databases in order to assess their suitability for use in pharmacoepidemiologic research are lacking. We aimed to identify, catalogue, and characterize Brazilian data sources for Drug Utilization Research (DUR).

    Methods: The present study is part of the project entitled, “Publicly Available Data Sources for Drug Utilization Research in Latin American (LatAm) Countries.” A network of Brazilian health experts was assembled to map secondary administrative data from healthcare organizations that might provide information related to medication use. A multi-phase approach including internet search of institutional government websites, traditional bibliographic databases, and experts’ input was used for mapping the data sources. The reviewers searched, screened and selected the data sources independently; disagreements were resolved by consensus. Data sources were grouped into the following categories: 1) automated databases; 2) Electronic Medical Records (EMR); 3) national surveys or datasets; 4) adverse event reporting systems; and 5) others. Each data source was characterized by accessibility, geographic granularity, setting, type of data (aggregate or individual-level), and years of coverage. We also searched for publications related to each data source.

    Results: A total of 62 data sources were identified and screened; 38 met the eligibility criteria for inclusion and were fully characterized. We grouped 23 (60%) as automated databases, four (11%) as adverse event reporting systems, four (11%) as EMRs, three (8%) as national surveys or datasets, and four (11%) as other types. Eighteen (47%) were classified as publicly and conveniently accessible online, providing information at national level. Most of them offered more than 5 years of comprehensive data coverage, and presented data at both the individual and aggregated levels. No information about population coverage was found. Drug coding is not uniform; each data source has its own coding system, depending on the purpose of the data. At least one scientific publication was found for each publicly available data source.

    Conclusions: There are several types of data sources for DUR in Brazil, but a uniform system for drug classification and data quality evaluation does not exist. The extent of population covered by year is unknown. Our comprehensive and structured inventory reveals a need for full characterization of these data sources.

  10. Counts of Smallpox reported in UNITED STATES OF AMERICA: 1888-1952

    • data.niaid.nih.gov
    • tycho.pitt.edu
    csv
    Updated Apr 1, 2018
    Cite
    Willem Van Panhuis; Anne Cross; Donald Burke; Shawn Brown; Derek Cummings; Irene Ruberto; Xin Xiong; Nian Shong Chok; Marc Choisy (2018). Counts of Smallpox reported in UNITED STATES OF AMERICA: 1888-1952 [Dataset]. http://doi.org/10.25337/T7/ptycho.v2.0/US.67924001
    Authors
    Willem Van Panhuis; Anne Cross; Donald Burke; Shawn Brown; Derek Cummings; Irene Ruberto; Xin Xiong; Nian Shong Chok; Marc Choisy
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    United States
    Variables measured
    Case, Dead, Count of disease cases, Infectious disease incidence
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    • Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.

    • Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
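
    The two recommended steps can be sketched in a few lines of Python. This is a minimal illustration, not Project Tycho tooling. Only the attribute name "PartOfCumulativeCountSeries" comes from the description above. The row layout, the "start", "end" and "count" keys, the function names, and the assumption of weekly fixed intervals are all hypothetical.

```python
from datetime import date, timedelta


def split_series(rows):
    """Separate cumulative rows from fixed-interval rows."""
    cumulative = [r for r in rows if r["PartOfCumulativeCountSeries"]]
    fixed = [r for r in rows if not r["PartOfCumulativeCountSeries"]]
    return cumulative, fixed


def fill_missing_weeks(fixed_rows):
    """Add placeholder rows (count=None) for weekly intervals that were not reported.

    Assumes every fixed interval is one week long and starts on the same weekday.
    """
    by_start = {r["start"]: r for r in fixed_rows}
    if not by_start:
        return []
    current, last = min(by_start), max(by_start)
    filled = []
    while current <= last:
        placeholder = {
            "start": current,
            "end": current + timedelta(days=6),
            "count": None,  # unknown, which is different from a reported zero
            "PartOfCumulativeCountSeries": False,
        }
        filled.append(by_start.get(current, placeholder))
        current += timedelta(days=7)
    return filled


# Made-up rows: two weekly counts with a gap between them, plus one cumulative count.
rows = [
    {"start": date(1900, 1, 1), "end": date(1900, 1, 7), "count": 3,
     "PartOfCumulativeCountSeries": False},
    {"start": date(1900, 1, 15), "end": date(1900, 1, 21), "count": 0,
     "PartOfCumulativeCountSeries": False},
    {"start": date(1900, 1, 1), "end": date(1900, 1, 14), "count": 5,
     "PartOfCumulativeCountSeries": True},
]
cumulative, fixed = split_series(rows)
filled = fill_missing_weeks(fixed)
```

    Keeping the placeholder count as `None` rather than 0 preserves the distinction the description draws between an interval with no report and an interval with a reported count of zero.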

  11. Repeat Offender Registrations Quick View

    • data.austintexas.gov
    • datahub.austintexas.gov
    csv, xlsx, xml
    Updated Nov 23, 2025
    Cite
    City of Austin, Texas - data.austintexas.gov (2025). Repeat Offender Registrations Quick View [Dataset]. https://data.austintexas.gov/w/cxx8-pt23/7r79-5ncn?cur=jOMujzKwoym&from=btsQpT4cx3O
    Dataset authored and provided by
    City of Austin, Texas - data.austintexas.gov
    License

    https://www.usa.gov/government-works

    Description

    City of Austin Open Data Terms of Use https://data.austintexas.gov/stories/s/ranj-cccq

    This is a slimmed down view of Repeat Offender Registrations for the purposes of tabular display.

    Link to complete dataset: https://data.austintexas.gov/City-Government/Repeat-Offender-Registrations/86z9-i27i

    Austin Development Services Data Disclaimer:

    1. The data provided are for informational use only and may differ from official department data.
    2. Austin Development Services’ database is continuously updated, so reports run at different times may produce different results. Care should be taken when comparing against other reports as different data collection methods and different data sources may have been used.
    3. Austin Development Services does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.

  12. Data from: The benefit of augmenting open data with clinical data-warehouse...

    • datadryad.org
    • data.niaid.nih.gov
    zip
    Updated Jan 26, 2023
    Cite
    Thomas Ferté; Vianney Jouhet; Romain Griffier; Boris Hejblum; Rodolphe Thiébaut; Bordeaux University Hospital Covid-19 Crisis Task Force (2023). The benefit of augmenting open data with clinical data-warehouse EHR for forecasting SARS-CoV-2 hospitalizations in Bordeaux area, France [Dataset]. http://doi.org/10.5061/dryad.hhmgqnkkx
    Dataset provided by
    Dryad
    Authors
    Thomas Ferté; Vianney Jouhet; Romain Griffier; Boris Hejblum; Rodolphe Thiébaut; Bordeaux University Hospital Covid-19 Crisis Task Force
    Time period covered
    Oct 10, 2022
    Area covered
    Bordeaux
    Description

    Data are stored in a .rdata file. Please use R (https://www.r-project.org/) software to open the data.

  13. Nationwide Collection of Heat Flow and Temperature Gradient Data and Related...

    • gdr.openei.org
    • data.openei.org
    archive, data
    Updated Mar 1, 2014
    Cite
    Maria Richards; Cathy Chickering Pace; David Blackwell (2014). Nationwide Collection of Heat Flow and Temperature Gradient Data and Related Resources [Dataset]. https://gdr.openei.org/submissions/1704
    Explore at:
    Available download formats: data, archive
    Dataset updated
    Mar 1, 2014
    Dataset provided by
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Renewable Power Office. Geothermal Technologies Program (EE-4G)
    Geothermal Data Repository
    Southern Methodist University
    Authors
    Maria Richards; Cathy Chickering Pace; David Blackwell
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset compiles heat flow and temperature gradient data from over 44,000 wells across the United States, along with more than 6,000 related geothermal exploration resources. Originally assembled prior to 2014 for the now-retired National Geothermal Data System (NGDS), the collection includes curated well data, scanned field notes, temperature-depth curves, publications, maps, and other supporting documents. SMU Geothermal Laboratory contributed two different nationwide heat flow databases to the project. One is based on equilibrium temperature measurements (over 14,000 sites) and the other is based on corrected bottom hole temperature (BHT) data from oil and gas industry wells (over 30,000 sites). In addition, scanned field notes and temperature-depth curves were associated with approximately 6,000 specific sites in the heat flow database. Records were corrected and overlapping sites in the equilibrium heat flow database were linked between the original SMU National database and the UND Global Heat Flow database. New or related sites, which were not previously published because they lacked full heat flow content, are now included as gradient only information along with their detailed temperature data to fill in data gaps. Finally, SMU submitted over 920 scanned publications, reports, and maps suitable for full text searching. The dataset is provided in two flat-structured zip archives: one containing the curated well data and another containing related resources. An Excel index file is provided for each archive, allowing filtering by well name, location, and description. Data files are labeled with state or institutional origin where available.

  14. Data from: Source Data for: Machine Learning-Based Identification of...

    • ourarchive.otago.ac.nz
    Updated Oct 14, 2025
    Yuan Yue (2025). Source Data for: Machine Learning-Based Identification of Abnormal Functional Connectivity in Obesity Across Different Metabolic States [Dataset]. https://ourarchive.otago.ac.nz/esploro/outputs/dataset/Source-Data-for-Machine-Learning-Based-Identification/9926783585601891
    Explore at:
    Dataset updated
    Oct 14, 2025
    Dataset provided by
    Figshare http://figshare.com/
    Authors
    Yuan Yue
    Time period covered
    Oct 14, 2025
    Description

    Excel data files for all Figures 3-5, 7, and 8 in the manuscript: Machine Learning-Based Identification of Abnormal Functional Connectivity in Obesity Across Different Metabolic States.

  15. Self-Serve Data Access Portals Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Dataintelo (2025). Self-Serve Data Access Portals Market Research Report 2033 [Dataset]. https://dataintelo.com/report/self-serve-data-access-portals-market
    Explore at:
    Available download formats: pptx, pdf, csv
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Self-Serve Data Access Portals Market Outlook




    According to our latest research, the global Self-Serve Data Access Portals market size reached USD 4.1 billion in 2024. The market is experiencing robust momentum, with a CAGR of 18.2% projected from 2025 to 2033. By the end of 2033, the market is forecasted to attain a valuation of USD 19.7 billion. This significant growth is being propelled by the increasing demand for democratized data access, the proliferation of big data analytics, and the widespread adoption of self-service business intelligence tools across diverse industry verticals. The market is also being shaped by the accelerating pace of digital transformation and the need for agile, data-driven decision-making processes within organizations.




    A primary growth factor for the Self-Serve Data Access Portals market is the escalating need for organizations to empower non-technical users with seamless access to data. As enterprises strive to become more data-driven, there is a pronounced shift towards enabling business users to independently extract, analyze, and visualize data without relying on IT teams. This trend is particularly pronounced in sectors such as BFSI, healthcare, and retail, where timely insights are critical for operational efficiency and competitive advantage. The democratization of data is fostering a culture of self-service analytics, reducing bottlenecks, and accelerating the decision-making process. Furthermore, the integration of advanced analytics and AI-driven features within self-serve portals is enhancing user experience and broadening the scope of actionable insights, thereby fueling market expansion.




    Another significant driver is the rapid adoption of cloud-based solutions, which has transformed the deployment landscape for self-serve data access portals. Cloud deployment offers scalability, flexibility, and cost-effectiveness, making it an attractive option for organizations of all sizes, especially small and medium enterprises (SMEs). The cloud enables seamless integration with various data sources, supports remote access, and ensures high availability and disaster recovery. As a result, cloud-based self-serve data access portals are gaining traction among enterprises seeking to modernize their data infrastructure and streamline operations. Additionally, the rise of hybrid and multi-cloud environments is further facilitating the adoption of self-serve portals, as organizations look to leverage the best features of different cloud platforms while maintaining data security and compliance.




    The growing emphasis on regulatory compliance and data governance is also contributing to the expansion of the Self-Serve Data Access Portals market. Organizations are increasingly required to adhere to stringent data protection regulations such as GDPR, HIPAA, and CCPA, necessitating robust data access controls and audit trails. Modern self-serve portals are equipped with advanced security features, role-based access controls, and comprehensive logging capabilities, enabling organizations to maintain compliance while providing users with the freedom to explore and utilize data. This balance between accessibility and governance is driving adoption across highly regulated industries, further strengthening the market's growth trajectory.




    From a regional perspective, North America continues to dominate the market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The mature IT infrastructure, high digital literacy, and early adoption of advanced analytics solutions in North America have positioned the region as a frontrunner. Meanwhile, Asia Pacific is emerging as a high-growth market, driven by rapid digitalization, expanding enterprise IT budgets, and increasing awareness of data-driven business strategies. The presence of a large SME sector and government initiatives promoting digital transformation are further accelerating market growth in the region. Europe, with its strong focus on data privacy and compliance, is also witnessing steady adoption of self-serve data access portals, particularly in the BFSI and healthcare sectors.



    Component Analysis




    The Self-Serve Data Access Portals market by component is segmented into software and services. The software segment comprises the core platforms and applications that facilitate self-service data access, analytics, and visualization. These solutions are designed to offer intuitive interfaces, robust data i

  16. Replication Data for: Scaling Data from Multiple Sources

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Enamorado, Ted; Lopez-Moctezuma, Gabriel; Ratkovic, Marc (2023). Replication Data for: Scaling Data from Multiple Sources [Dataset]. http://doi.org/10.7910/DVN/FOUVEL
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Enamorado, Ted; Lopez-Moctezuma, Gabriel; Ratkovic, Marc
    Description

    We introduce a method for scaling two data sets from different sources. The proposed method estimates a latent factor common to both datasets as well as an idiosyncratic factor unique to each. In addition, it offers a flexible modeling strategy that permits the scaled locations to be a function of covariates, and efficient implementation allows for inference through resampling. A simulation study shows that our proposed method improves over existing alternatives in capturing the variation common to both datasets, as well as the latent factors specific to each. We apply our proposed method to vote and speech data from the 112th U.S. Senate. We recover a shared subspace that aligns with a standard ideological dimension running from liberals to conservatives while recovering the words most associated with each senator's location. In addition, we estimate a word-specific subspace that ranges from national security to budget concerns, and a vote-specific subspace with Tea Party senators on one extreme and senior committee leaders on the other.

  17. Data Integration Tools Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Growth Market Reports (2025). Data Integration Tools Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-integration-tools-market
    Explore at:
    Available download formats: csv, pdf, pptx
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Integration Tools Market Outlook



    According to our latest research, the global Data Integration Tools market size reached USD 13.6 billion in 2024, demonstrating robust expansion driven by the surge in digital transformation initiatives and the rising importance of seamless data management across enterprises. The market is projected to grow at a CAGR of 11.2% from 2025 to 2033, reaching a forecasted value of USD 34.6 billion by 2033. This impressive growth trajectory is fueled by the increasing adoption of cloud-based solutions, the proliferation of big data analytics, and the growing complexity of heterogeneous data environments. As per our latest research, organizations worldwide are prioritizing data integration to enhance operational efficiency, improve decision-making, and achieve a unified view of enterprise data, positioning the data integration tools market for sustained growth throughout the forecast period.




    One of the primary growth factors driving the Data Integration Tools market is the exponential increase in data volumes generated by organizations across various industries. With the proliferation of IoT devices, social media, mobile applications, and cloud platforms, enterprises are facing unprecedented challenges in managing and consolidating disparate data sources. Data integration tools play a pivotal role in enabling organizations to aggregate, cleanse, and harmonize data from multiple sources, ensuring data consistency and reliability. The growing emphasis on business intelligence, analytics, and real-time data processing further underscores the need for robust data integration solutions. As companies strive to harness actionable insights from vast data reservoirs, the demand for advanced data integration platforms is expected to soar, supporting the market's upward momentum.




    Another significant factor contributing to the expansion of the Data Integration Tools market is the accelerated adoption of cloud computing and hybrid IT environments. As businesses migrate their workloads to the cloud and embrace multi-cloud strategies, the complexity of integrating on-premises and cloud-based data sources increases dramatically. Data integration tools equipped with cloud-native capabilities offer seamless connectivity, scalability, and flexibility, empowering organizations to synchronize data across diverse ecosystems efficiently. Furthermore, the rise of Software-as-a-Service (SaaS) applications and the need for real-time data synchronization are prompting enterprises to invest in modern integration platforms. Vendors are responding by enhancing their offerings with AI-driven automation, self-service capabilities, and support for emerging data architectures, thereby fueling market growth.




    The evolution of regulatory landscapes and data privacy requirements also plays a crucial role in shaping the Data Integration Tools market. With stringent regulations such as GDPR, CCPA, and HIPAA, organizations must ensure that their data integration processes adhere to compliance standards and maintain data integrity. Data integration tools facilitate secure data movement, lineage tracking, and auditability, enabling enterprises to mitigate compliance risks and safeguard sensitive information. Additionally, the growing trend of data democratization and self-service analytics is driving demand for user-friendly integration platforms that empower business users to access and blend data without extensive technical expertise. These factors collectively contribute to the sustained adoption and innovation within the data integration tools landscape.



    In the context of evolving technological landscapes, the introduction of Launch Integration Services is becoming increasingly significant. As organizations strive to streamline their data operations, these services offer a comprehensive approach to integrating diverse data sources with minimal disruption. Launch Integration Services are designed to facilitate seamless connectivity across various platforms, ensuring that data flows smoothly and efficiently within an enterprise. By leveraging these services, companies can enhance their data management capabilities, reduce operational bottlenecks, and improve overall data quality. The ability to launch integration services quickly and effectively is critical for organizations looking to maintain a competitive edge in today's fast-paced digital environment.


  18. Counts of Diphtheria reported in UNITED STATES OF AMERICA: 1888-1981

    • data.niaid.nih.gov
    • tycho.pitt.edu
    • +1more
    csv
    Updated Apr 1, 2018
    Willem Van Panhuis; Anne Cross; Donald Burke; Shawn Brown; Derek Cummings; Irene Ruberto; Xin Xiong; Nian Shong Chok; Marc Choisy (2018). Counts of Diphtheria reported in UNITED STATES OF AMERICA: 1888-1981 [Dataset]. http://doi.org/10.25337/T7/ptycho.v2.0/US.397428000
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 1, 2018
    Authors
    Willem Van Panhuis; Anne Cross; Donald Burke; Shawn Brown; Derek Cummings; Irene Ruberto; Xin Xiong; Nian Shong Chok; Marc Choisy
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Variables measured
    Case, Dead, Cumulative incidence, Count of disease cases, Infectious disease incidence
    Description

    Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

    Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

    Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

    - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
    - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
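
The two recommended preprocessing steps could be sketched roughly as follows. This is a minimal illustration, not Project Tycho's own tooling: only the "PartOfCumulativeCountSeries" attribute is named in the description, so the other field names (PeriodStartDate, PeriodEndDate, CountValue), the 0/1 encoding of the flag, and the assumption of aligned weekly intervals are all hypothetical choices made for this example.

```python
from datetime import date, timedelta

FLAG = "PartOfCumulativeCountSeries"  # attribute named in the description


def split_series(rows):
    """Separate cumulative rows from fixed-interval rows (assumes a 0/1 flag)."""
    cumulative = [r for r in rows if int(r[FLAG]) == 1]
    fixed = [r for r in rows if int(r[FLAG]) == 0]
    return cumulative, fixed


def fill_missing_weeks(fixed_rows):
    """Add placeholder rows (CountValue=None) for weekly intervals with no report.

    Assumes every fixed interval is one week long and starts on the same weekday.
    Reported zero counts are kept as zeros and are not treated as missing.
    """
    by_start = {r["PeriodStartDate"]: r for r in fixed_rows}
    if not by_start:
        return []
    current, last = min(by_start), max(by_start)
    filled = []
    while current <= last:
        row = by_start.get(current)
        if row is None:
            row = {
                "PeriodStartDate": current,
                "PeriodEndDate": current + timedelta(days=6),
                "CountValue": None,  # no count was reported for this interval
                FLAG: 0,
            }
        filled.append(row)
        current += timedelta(days=7)
    return filled


# Hypothetical sample rows, invented purely for illustration.
rows = [
    {"PeriodStartDate": date(1920, 1, 4), "PeriodEndDate": date(1920, 1, 10), "CountValue": 12, FLAG: 0},
    {"PeriodStartDate": date(1920, 1, 18), "PeriodEndDate": date(1920, 1, 24), "CountValue": 0, FLAG: 0},
    {"PeriodStartDate": date(1920, 1, 1), "PeriodEndDate": date(1920, 1, 24), "CountValue": 40, FLAG: 1},
]
cumulative, fixed = split_series(rows)
weekly = fill_missing_weeks(fixed)
```

Keeping the missing week as `None` rather than zero preserves the distinction the description draws between an unreported interval and a reported count of zero.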

  19. Data from: GALLO: An R package for Genomic Annotation and integration of...

    • portalcientifico.unileon.es
    • portalcienciaytecnologia.jcyl.es
    Updated 2020
    Fonseca, Pablo, A.S.; Suárez-Vega, Aroa; Marras, Gabriele; Cánovas, Ángela (2020). GALLO: An R package for Genomic Annotation and integration of multiple data source in livestock for positional candidate LOci [Dataset]. https://portalcientifico.unileon.es/documentos/668fc461b9e7c03b01bdb93f
    Explore at:
    Dataset updated
    2020
    Authors
    Fonseca, Pablo, A.S.; Suárez-Vega, Aroa; Marras, Gabriele; Cánovas, Ángela
    Description

    The development of high-throughput sequencing and genotyping methodologies allowed the identification of thousands of genomic regions associated with several complex traits. The integration of multiple sources of biological information is a crucial step required to better understand patterns regulating the development of these traits. Genomic Annotation in Livestock for positional candidate LOci (GALLO) is an R package developed for the accurate annotation of genes and quantitative trait loci (QTLs) located in regions identified in common genomic analyses performed in livestock, such as Genome-Wide Association Studies and transcriptomics using RNA-Sequencing. Moreover, GALLO allows the graphical visualization of gene and QTL annotation results, data comparison among different grouping factors (e.g., methods, breeds, tissues, statistical models, studies, etc.), and QTL enrichment in different livestock species including cattle, pigs, sheep, and chickens, etc. Consequently, GALLO is a useful package for the annotation, identification of hidden patterns across datasets, datamining previously reported associations, as well as the efficient scrutinization of the genetic architecture of complex traits in livestock.

  20. Site Plan Cases

    • datahub.austintexas.gov
    • data.austintexas.gov
    • +2more
    Updated Nov 21, 2025
    City of Austin, Texas - data.austintexas.gov (2025). Site Plan Cases [Dataset]. https://datahub.austintexas.gov/Building-and-Development/Site-Plan-Cases/mavg-96ck
    Explore at:
    Available download formats: kml, csv, xml, xlsx, kmz, application/geo+json
    Dataset updated
    Nov 21, 2025
    Dataset authored and provided by
    City of Austin, Texas - data.austintexas.gov
    License

    U.S. Government Works https://www.usa.gov/government-works
    License information was derived automatically

    Description

    City of Austin Open Data Terms of Use https://data.austintexas.gov/stories/s/ranj-cccq

    This data set contains information about the site plan case applications submitted for review to the City of Austin. The data set includes information about case status in the permit review system, case number, proposed use, applicant, owner, and location.

    Austin Development Services Data Disclaimer:

    1. The data provided are for informational use only and may differ from official department data.
    2. Austin Development Services’ database is continuously updated, so reports run at different times may produce different results. Care should be taken when comparing against other reports as different data collection methods and different data sources may have been used.
    3. Austin Development Services does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.