100+ datasets found
  1. Addresses (Open Data)

    • catalog.data.gov
    • data-academy.tempe.gov
    • +11more
    Updated Nov 22, 2025
    Cite
    City of Tempe (2025). Addresses (Open Data) [Dataset]. https://catalog.data.gov/dataset/addresses-open-data
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    City of Tempe
    Description

    This dataset is a compilation of address point data for the City of Tempe. The dataset contains a point location and the official address (as defined by The Building Safety Division of Community Development) for all occupiable units and any other official addresses in the City. Several additional attributes may be populated for an address, but not for every address.

    Contact: Lynn Flaaen-Hanna, Development Services Specialist
    Contact E-mail
    Link: Map that Lets You Explore and Export Address Data
    Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward, with the address information being created and maintained by The Building Safety Division of Community Development.
    Data Source Type: ESRI ArcGIS Enterprise Geodatabase
    Preparation Method: N/A
    Publish Frequency: Weekly
    Publish Method: Automatic
    Data Dictionary

  2. Department of Community Resources & Services Online Data Sources

    • opendata.howardcountymd.gov
    • data.wu.ac.at
    csv, xlsx, xml
    Updated Oct 28, 2019
    Cite
    Department of Community Resources & Services (2019). Department of Community Resources & Services Online Data Sources [Dataset]. https://opendata.howardcountymd.gov/w/kdeq-r7qc/j72c-n6z5?cur=LdI0ncE4AfX&from=n10jJ2BVdMM
    Available download formats: xml, csv, xlsx
    Dataset updated
    Oct 28, 2019
    Dataset authored and provided by
    Department of Community Resources & Services
    Description

    This dataset lists the data sources used within the Department of Community Resources & Services for internal and external reports. It lets individuals and organizations identify the type of data they are looking for and the geographical level at which it is available (i.e., National, State, County, etc.). The dataset will be updated every quarter and should be used for research purposes.

  3. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    World Bank Group (http://www.worldbank.org/)
    Authors
    World Bank
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?

  4. Classification of Mars Terrain Using Multiple Data Sources - Dataset - NASA...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Classification of Mars Terrain Using Multiple Data Sources - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/classification-of-mars-terrain-using-multiple-data-sources
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Classification of Mars Terrain Using Multiple Data Sources. Alan Kraut, David Wettergreen. Abstract: Images of Mars are being collected faster than they can be analyzed by planetary scientists. Automatic analysis of images would enable more rapid and more consistent image interpretation and could draft geologic maps where none yet exist. In this work we develop a method for incorporating images from multiple instruments to classify Martian terrain into multiple types. Each image is segmented into contiguous groups of similar pixels, called superpixels, with an associated vector of discriminative features. We have developed and tested several classification algorithms to associate a best class to each superpixel. These classifiers are trained using three different manual classifications with between 2 and 6 classes. Automatic classification accuracies of 50 to 80% are achieved in leave-one-out cross-validation across 20 scenes using a multi-class boosting classifier.
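
    The evaluation scheme mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "leave-one-out cross-validation across 20 scenes" means holding out one scene at a time, and it swaps in a simple nearest-centroid classifier as a stand-in for the multi-class boosting classifier. The scene IDs, feature vectors, and class names below are invented for illustration and do not come from the dataset.

```python
import math
from collections import defaultdict


def nearest_centroid_fit(samples):
    """Compute one mean feature vector per class from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def nearest_centroid_predict(centroids, features):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def leave_one_scene_out(superpixels):
    """superpixels: list of (scene_id, features, label) tuples.

    Trains on all scenes but one, tests on the held-out scene, and
    returns a dict mapping each held-out scene to its accuracy.
    """
    scenes = sorted({scene for scene, _, _ in superpixels})
    accuracy = {}
    for held_out in scenes:
        train = [(f, y) for s, f, y in superpixels if s != held_out]
        test = [(f, y) for s, f, y in superpixels if s == held_out]
        centroids = nearest_centroid_fit(train)
        correct = sum(nearest_centroid_predict(centroids, f) == y for f, y in test)
        accuracy[held_out] = correct / len(test)
    return accuracy


# Hypothetical toy data: three scenes, two terrain classes, 2-D features.
superpixels = [
    ("scene_a", [0.10, 0.20], "dune"),
    ("scene_a", [0.90, 0.80], "crater"),
    ("scene_b", [0.20, 0.10], "dune"),
    ("scene_b", [0.80, 0.90], "crater"),
    ("scene_c", [0.15, 0.15], "dune"),
    ("scene_c", [0.85, 0.85], "crater"),
]
scores = leave_one_scene_out(superpixels)
```

    Grouping the split by scene rather than by superpixel keeps superpixels from the same image out of both the training and test sets at once, which is presumably why the evaluation is described per scene.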

  5. Data from: A Large-scale Dataset of (Open Source) License Text Variants

    • data.niaid.nih.gov
    Updated Mar 31, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163
    Dataset updated
    Mar 31, 2022
    Dataset provided by
    LTCI, Télécom Paris, Institut Polytechnique de Paris
    Authors
    Stefano Zacchiroli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.

    For more details see the included README file and companion paper:

    Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In proceedings of the 2022 Mining Software Repositories Conference (MSR 2022). 23-24 May 2022 Pittsburgh, Pennsylvania, United States. ACM 2022.

    If you use this dataset for research purposes, please acknowledge its use by citing the above paper.
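
    The description says the metadata CSV files reference license blobs via cryptographic checksums. A minimal sketch of that lookup pattern follows. The hash algorithm (SHA-1), the column names (`sha1`, `mime_type`, `size`), and the sample blob are assumptions made for illustration; the dataset's README is the authority on the actual layout.

```python
import csv
import hashlib
import io


def sha1_hex(data: bytes) -> str:
    """Hex-encoded SHA-1 digest of a blob (assumed checksum type)."""
    return hashlib.sha1(data).hexdigest()


def index_metadata(csv_text: str, key: str = "sha1") -> dict:
    """Index metadata rows by their checksum column for direct lookup."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


# Hypothetical blob and a one-row metadata file built to match it.
license_blob = b"Permission is hereby granted, free of charge...\n"
checksum = sha1_hex(license_blob)
metadata_csv = "sha1,mime_type,size\n" + f"{checksum},text/plain,{len(license_blob)}\n"

index = index_metadata(metadata_csv)
record = index.get(sha1_hex(license_blob))  # metadata row for this blob, or None
```

    Keying metadata on a content checksum means identical license texts found in many repositories share a single metadata row, which fits the deduplicated archive the description mentions.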

  6. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects to curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of the project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE

  7. Open Data 500 Companies

    • kaggle.com
    zip
    Updated Jun 22, 2017
    Cite
    GovLab (2017). Open Data 500 Companies [Dataset]. https://www.kaggle.com/govlab/open-data-500-companies
    Available download formats: zip (157889 bytes)
    Dataset updated
    Jun 22, 2017
    Dataset provided by
    The GovLab (http://www.thegovlab.org/)
    Authors
    GovLab
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    The Open Data 500, funded by the John S. and James L. Knight Foundation (http://www.knightfoundation.org/) and conducted by the GovLab, is the first comprehensive study of U.S. companies that use open government data to generate new business and develop new products and services.

    Study Goals

    • Provide a basis for assessing the economic value of government open data

    • Encourage the development of new open data companies

    • Foster a dialogue between government and business on how government data can be made more useful

    The GovLab's Approach

    The Open Data 500 study is conducted by the GovLab at New York University with funding from the John S. and James L. Knight Foundation. The GovLab works to improve people’s lives by changing how we govern, using technology-enabled solutions and a collaborative, networked approach. As part of its mission, the GovLab studies how institutions can publish the data they collect as open data so that businesses, organizations, and citizens can analyze and use this information.

    Company Identification

    The Open Data 500 team has compiled our list of companies through (1) outreach campaigns, (2) advice from experts and professional organizations, and (3) additional research.

    Outreach Campaign

    • Mass email to over 3,000 contacts in the GovLab network

    • Mass email to over 2,000 contacts from OpenDataNow.com

    • Blog posts on TheGovLab.org and OpenDataNow.com

    • Social media recommendations

    • Media coverage of the Open Data 500

    • Attending presentations and conferences

    Expert Advice

    • Recommendations from government and non-governmental organizations

    • Guidance and feedback from Open Data 500 advisors

    Research

    • Companies identified for the book, Open Data Now

    • Companies using datasets from Data.gov

    • Directory of open data companies developed by Deloitte

    • Online Open Data Userbase created by Socrata

    • General research from publicly available sources

    What The Study Is Not

    The Open Data 500 is not a rating or ranking of companies. It covers companies of different sizes and categories, using various kinds of data.

    The Open Data 500 is not a competition, but an attempt to give a broad, inclusive view of the field.

    The Open Data 500 study also does not provide a random sample for definitive statistical analysis. Since this is the first thorough scan of companies in the field, it is not yet possible to determine the exact landscape of open data companies.

  8. NYS Substance Use Disorder Data

    • kaggle.com
    zip
    Updated Jan 1, 2021
    Cite
    State of New York (2021). NYS Substance Use Disorder Data [Dataset]. https://www.kaggle.com/datasets/new-york-state/nys-substance-use-disorder-data/discussion
    Available download formats: zip (798133 bytes)
    Dataset updated
    Jan 1, 2021
    Dataset authored and provided by
    State of New York
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    New York
    Description

    Content

    More details about each file are in the individual file descriptions.

    Context

    This is a dataset hosted by the State of New York. The state has an open data platform found here and they update their information according to the amount of data that is brought in. Explore New York State using Kaggle and all of the data sources available through the State of New York organization page!

    • Update Frequency: This dataset is updated annually.

    Acknowledgements

    This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

    This dataset is distributed under the following licenses: Public Domain

  9. Open Source And General Resource Software

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated Nov 14, 2025
    Cite
    data.nasa.gov (2025). Open Source And General Resource Software [Dataset]. https://catalog.data.gov/dataset/open-source-and-general-resource-software
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This dataset lists all software in use by NASA.

  10. Open Data Portal Tutorial for Maryland State Agencies

    • datasets.ai
    • opendata.maryland.gov
    • +1more
    33
    Updated Nov 10, 2020
    Cite
    State of Maryland (2020). Open Data Portal Tutorial for Maryland State Agencies [Dataset]. https://datasets.ai/datasets/open-data-portal-tutorial-for-maryland-state-agencies
    Explore at:
    33Available download formats
    Dataset updated
    Nov 10, 2020
    Dataset authored and provided by
    State of Maryland
    Area covered
    Maryland
    Description

    This is a PDF document created by the Department of Information Technology (DoIT) and the Governor's Office of Performance Improvement to assist training Maryland state employees on use of the Open Data Portal, https://opendata.maryland.gov. This document covers direct data entry, uploading Excel spreadsheets, connecting source databases, and transposing data. Please note that this tutorial is intended for use by state employees, as non-state users cannot upload datasets to the Open Data Portal.

  11. Daily Social Media Active Users

    • kaggle.com
    zip
    Updated May 5, 2025
    Cite
    Shaik Barood Mohammed Umar Adnaan Faiz (2025). Daily Social Media Active Users [Dataset]. https://www.kaggle.com/datasets/umeradnaan/daily-social-media-active-users
    Available download formats: zip (126814 bytes)
    Dataset updated
    May 5, 2025
    Authors
    Shaik Barood Mohammed Umar Adnaan Faiz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The "Daily Social Media Active Users" dataset provides a comprehensive and dynamic look into the digital presence and activity of global users across major social media platforms. The data was generated to simulate real-world usage patterns for 13 popular platforms, including Facebook, YouTube, WhatsApp, Instagram, WeChat, TikTok, Telegram, Snapchat, X (formerly Twitter), Pinterest, Reddit, Threads, LinkedIn, and Quora. This dataset contains 10,000 rows and includes several key fields that offer insights into user demographics, engagement, and usage habits.

    Dataset Breakdown:

    • Platform: The name of the social media platform where the user activity is tracked. It includes globally recognized platforms, such as Facebook, YouTube, and TikTok, that are known for their large, active user bases.

    • Owner: The company or entity that owns and operates the platform. Examples include Meta for Facebook, Instagram, and WhatsApp, Google for YouTube, and ByteDance for TikTok.

    • Primary Usage: This category identifies the primary function of each platform. Social media platforms differ in their primary usage, whether it's for social networking, messaging, multimedia sharing, professional networking, or more.

    • Country: The geographical region where the user is located. The dataset simulates global coverage, showcasing users from diverse locations and regions. It helps in understanding how user behavior varies across different countries.

    • Daily Time Spent (min): This field tracks how much time a user spends on a given platform on a daily basis, expressed in minutes. Time spent data is critical for understanding user engagement levels and the popularity of specific platforms.

    • Verified Account: Indicates whether the user has a verified account. This feature mimics real-world patterns where verified users (often public figures, businesses, or influencers) have enhanced status on social media platforms.

    • Date Joined: The date when the user registered or started using the platform. This data simulates user account history and can provide insights into user retention trends or platform growth over time.
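
    The field list above can be exercised with a short sketch that averages daily time spent per platform. The column names follow the Dataset Breakdown; the three sample rows, the Yes/No encoding of Verified Account, and the ISO date format are assumptions made for illustration and are not taken from the actual file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the column names from the field list.
SAMPLE = """Platform,Owner,Primary Usage,Country,Daily Time Spent (min),Verified Account,Date Joined
Facebook,Meta,Social Networking,India,60,Yes,2019-03-01
Facebook,Meta,Social Networking,Brazil,30,No,2020-07-15
YouTube,Google,Video Sharing,India,90,No,2018-11-20
"""


def mean_time_by_platform(csv_text):
    """Average the 'Daily Time Spent (min)' column for each platform."""
    totals = defaultdict(lambda: [0.0, 0])  # platform -> [sum of minutes, row count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = totals[row["Platform"]]
        entry[0] += float(row["Daily Time Spent (min)"])
        entry[1] += 1
    return {platform: total / count for platform, (total, count) in totals.items()}


averages = mean_time_by_platform(SAMPLE)
```

    The same grouping works for Country or Verified Account by changing the key column.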

    Context and Use Cases:

    • This synthetic dataset is designed to offer a privacy-friendly alternative for analytics, research, and machine learning purposes. Given the complexities and privacy concerns around using real user data, especially in the context of social media, this dataset offers a clean and secure way to develop, test, and fine-tune applications, models, and algorithms without the risks of handling sensitive or personal information.

    Researchers, data scientists, and developers can use this dataset to:

    • Model User Behavior: By analyzing patterns in daily time spent, verified status, and country of origin, users can model and predict social media engagement behavior.

    • Test Analytics Tools: Social media monitoring and analytics platforms can use this dataset to simulate user activity and optimize their tools for engagement tracking, reporting, and visualization.

    • Train Machine Learning Algorithms: The dataset can be used to train models for various tasks like user segmentation, recommendation systems, or churn prediction based on engagement metrics.

    • Create Dashboards: This dataset can serve as the foundation for creating user-friendly dashboards that visualize user trends, platform comparisons, and engagement patterns across the globe.

    • Conduct Market Research: Business intelligence teams can use the data to understand how various demographics use social media, offering valuable insights into the most engaged regions, platform preferences, and usage behaviors.

    • Sources of Inspiration: This dataset is inspired by public data from industry reports, such as those from Statista, DataReportal, and other market research platforms. These sources provide insights into the global user base and usage statistics of popular social media platforms. The synthetic nature of this dataset allows for the use of realistic engagement metrics without violating any privacy concerns, making it an ideal tool for educational, analytical, and research purposes.

    The structure and design of the dataset are based on real-world usage patterns and aim to represent a variety of users from different backgrounds, countries, and activity levels. This diversity makes it an ideal candidate for testing data-driven solutions and exploring social media trends.

    Future Considerations:

    As the social media landscape continues to evolve, this dataset can be updated or extended to include new platforms, engagement metrics, or user behaviors. Future iterations may incorporate features like post frequency, follower counts, engagement rates (likes, comments, shares), or even sentiment analysis from user-generated content.

    By leveraging this dataset, analysts and data scientists can create better, more effective strategies ...

  12. Einstein Catalog HRI CFA Sources - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). Einstein Catalog HRI CFA Sources - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/einstein-catalog-hri-cfa-sources
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This database table consists of a preliminary source list for the Einstein Observatory's High Resolution Imager (HRI). The source list, obtained from EINLINE, the Einstein On-line Service at the Smithsonian Astrophysical Observatory (SAO), contains basic information about the sources detected with the HRI. This is a service provided by NASA HEASARC.

  13. Global Open Source Software Market Data

    • decipherzone.com
    csv
    Updated Dec 23, 2024
    Cite
    Decipher Zone (2024). Global Open Source Software Market Data [Dataset]. https://www.decipherzone.com/blog-detail/benefits-of-open-source-software-development
    Available download formats: csv
    Dataset updated
    Dec 23, 2024
    Dataset authored and provided by
    Decipher Zone
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Market research dataset covering growth of the global open-source software market, including benefits, adoption, and enterprise usage in 2025.

  14. Data from: Caravan - A global community dataset for large-sample hydrology

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 16, 2025
    Cite
    Kratzert, Frederik; Nearing, Grey; Addor, Nans; Erickson, Tyler; Gauch, Martin; Gilon, Oren; Gudmundsson, Lukas; Hassidim, Avinatan; Klotz, Daniel; Nevo, Sella; Shalev, Guy; Matias, Yossi (2025). Caravan - A global community dataset for large-sample hydrology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6522634
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Google Research
    Google, Mountain View, CA, USA
    Geography, College of Life and Environmental Sciences, University of Exeter, Exeter, UK
    Institute for Atmospheric and Climate Science, ETH Zurich, Zurich, Switzerland
    Institute for Machine Learning, Johannes Kepler University, Linz, Austria
    Authors
    Kratzert, Frederik; Nearing, Grey; Addor, Nans; Erickson, Tyler; Gauch, Martin; Gilon, Oren; Gudmundsson, Lukas; Hassidim, Avinatan; Klotz, Daniel; Nevo, Sella; Shalev, Guy; Matias, Yossi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the accompanying dataset to the following paper https://www.nature.com/articles/s41597-023-01975-w

    Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.

    If you use Caravan in your research, it would be appreciated to not only cite Caravan itself, but also the source datasets, to pay respect to the amount of work that was put into the creation of these datasets and that made Caravan possible in the first place.

    All current development and additional community extensions can be found at https://github.com/kratzert/Caravan

    Change Log:

    23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.

    24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.

    15 June 2022: Version 0.4 - Fixed replacing negative CAMELS US values with NaN (-999 in CAMELS indicates missing observation).

    1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), now totalling 6830 basins. Fixed a bug in the computation of catchment attributes that are defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added some more metadata (station name and country).

    16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan

    10 May 2023: Version 1.1 - No data change, only an updated data description.

    17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.

    16 April 2024: Version 1.4 - Added 9130 gauges from the original source dataset that were initially not included because of the area thresholds (i.e. basins smaller than 100sqkm or larger than 2000sqkm). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two different download options that include timeseries data only as either csv files (Caravan-csv.tar.xz) or netcdf files (Caravan-nc.tar.xz). Including the large basins also required an update in the Earth Engine code.

    16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climate indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").
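
    The 15 June 2022 entry notes that -999 in CAMELS marks a missing observation and that such values are replaced with NaN. A minimal sketch of that kind of sentinel handling follows. It is not Caravan's own code; the function names and the sample series are hypothetical.

```python
import math

MISSING_SENTINEL = -999  # per the change log, -999 marks a missing observation in CAMELS


def sentinel_to_nan(values, sentinel=MISSING_SENTINEL):
    """Replace the missing-value sentinel with NaN, leaving other values as floats."""
    return [math.nan if value == sentinel else float(value) for value in values]


def mean_ignoring_nan(values):
    """Mean over the non-NaN entries, or NaN when nothing valid remains."""
    valid = [value for value in values if not math.isnan(value)]
    return sum(valid) / len(valid) if valid else math.nan


# Hypothetical discharge series with two missing observations.
streamflow = [1.2, -999, 0.8, -999, 1.0]
cleaned = sentinel_to_nan(streamflow)
average = mean_ignoring_nan(cleaned)
```

    Converting the sentinel before any aggregation matters because a raw -999 would otherwise be averaged in as if it were a real measurement.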

  15. Source Protection Area Generalized

    • open.canada.ca
    • datasets.ai
    • +3more
    esri rest, html, zip
    Updated Nov 13, 2025
    Cite
    Government of Ontario (2025). Source Protection Area Generalized [Dataset]. https://open.canada.ca/data/en/dataset/6dc5dd56-bf09-409d-91a4-619203313301
    Available download formats: esri rest, html, zip
    Dataset updated
    Nov 13, 2025
    Dataset provided by
    Government of Ontario (https://www.ontario.ca/)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    A source protection area (SPA) is an area of land and water governed by a Source Protection Authority, an agency, person or body. This dataset defines the geographic boundaries where each SPA's terms of reference, assessment reports and source protection plans must be developed.

  16. Data from: UNESCO World Heritage Sites Dataset

    • kaggle.com
    Updated Dec 19, 2023
    Cite
    The Devastator (2023). UNESCO World Heritage Sites Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/unesco-world-heritage-sites-dataset
    Dataset updated
    Dec 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Area covered
    World
    Description

    UNESCO World Heritage Sites Dataset

    By Throwback Thursday [source]

    About this dataset

    How to use the dataset

    Here are some tips on how to make the most out of this dataset:

    • Data Exploration:

      • Begin by understanding the structure and contents of the dataset. Evaluate the number of rows (sites) and columns (attributes) available.
      • Check for missing values or inconsistencies in data entry that may impact your analysis.
      • Assess column descriptions to understand what information is included in each attribute.
    • Geographical Analysis:

      • Leverage geographical features such as latitude and longitude coordinates provided in this dataset.
      • Plot these sites on a map using any mapping software or library like Google Maps or Folium for Python. Visualizing their distribution can provide insights into patterns based on location, climate, or cultural factors.
    • Analyzing Attributes:

      • Familiarize yourself with different attributes available for analysis. Possible attributes include Name, Description, Category, Region, Country, etc.
      • Understand each attribute's format and content type (categorical, numerical) for better utilization during data analysis.
    • Exploring Categories & Regions:

      • Look at unique categories mentioned in the Category column (e.g., Cultural Site, Natural Site) to explore specific interests. This could help identify clusters within particular heritage types across countries/regions worldwide.
      • Analyze regions with high concentrations of heritage sites using data visualizations like bar plots or word clouds based on frequency counts.
    • Identify Trends & Patterns:

      • Discover recurring themes across various sites by analyzing descriptive text attributes such as names and descriptions.
      • Identify patterns and correlations between attributes by performing statistical analysis or utilizing machine learning techniques.
    • Comparison:

      • Compare different attributes to gain a deeper understanding of the sites.
      • For example, analyze the number of heritage sites per country/region or compare the distribution between cultural and natural heritage sites.
    • Additional Data Sources:

      • Use this dataset as a foundation to combine it with other datasets for in-depth analysis. There are several sources available that provide additional data on UNESCO World Heritage Sites, such as travel blogs, official tourism websites, or academic research databases.
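    The exploration tips above (row and column counts, missing values, frequency counts per category or country) can be sketched with the Python standard library alone. This is a minimal sketch, not the dataset author's method: the column names Name, Category and Country are assumptions taken from the "possible attributes" mentioned above, and the inline rows are hypothetical placeholders standing in for the real CSV file.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholder rows; the real file's column names may differ.
SAMPLE_CSV = """Name,Category,Country
Site A,Cultural,Country X
Site B,Natural,Country X
Site C,Cultural,Country Y
Site D,Cultural,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per site."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_counts(rows):
    """Count empty cells per column (columns with no gaps are omitted)."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or not value.strip():
                counts[column] += 1
    return dict(counts)


def frequency(rows, column):
    """Count non-empty values in one column, most common first."""
    values = (
        row[column].strip()
        for row in rows
        if row.get(column) and row[column].strip()
    )
    return Counter(values).most_common()


rows = load_rows(SAMPLE_CSV)
print(len(rows), "rows; columns:", list(rows[0].keys()))
print("missing cells:", missing_counts(rows))
print("sites per category:", frequency(rows, "Category"))
print("sites per country:", frequency(rows, "Country"))
```

    To run this against the downloaded file, replace the call to load_rows(SAMPLE_CSV) with a csv.DictReader over the opened file and adjust the column names to match its header row.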

    Remember to cite this dataset appropriately if you use it in

    Research Ideas

    • Travel Planning: This dataset can be used to identify and plan visits to UNESCO World Heritage sites around the world. It provides information about the location, category, and date of inscription for each site, allowing users to prioritize their travel destinations based on personal interests or preferences.
    • Cultural Preservation: Researchers or organizations interested in cultural preservation can use this dataset to analyze trends in UNESCO World Heritage site listings over time. By studying factors such as geographical distribution, types of sites listed, and inscription dates, they can gain insights into patterns of cultural heritage recognition and protection.
    • Statistical Analysis: The dataset can be used for statistical analysis to explore various aspects related to UNESCO World Heritage sites. For example, it could be used to examine the correlation between a country's economic indicators (such as GDP per capita) and the number or type of World Heritage sites it possesses. This analysis could provide insights into the relationship between economic development and cultural preservation efforts at a global scale
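    The correlation idea in the last item can be illustrated with a hand-written Pearson coefficient. This is a sketch under stated assumptions: the dataset itself is not described as containing economic indicators, so GDP figures would have to come from a separate source, and the numbers below are made-up placeholders chosen only to show the calculation. A high coefficient would indicate association, not that one quantity causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values, NOT real statistics: one entry per hypothetical country.
gdp_per_capita = [1.0, 2.0, 3.0, 4.0]
site_counts = [2, 4, 6, 8]

r = pearson(gdp_per_capita, site_counts)
print(f"Pearson r = {r:.3f}")
```

    In practice, site counts per country would come from a frequency count over the dataset's country column, joined to the external indicator by country name or code.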

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Throwback Thursday.

  17. Planck List of High Redshift Source Candidates - Dataset - NASA Open Data...

    • data.nasa.gov
    Updated Apr 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). Planck List of High Redshift Source Candidates - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/planck-list-of-high-redshift-source-candidates
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The Planck list of high-redshift source candidates (PHZ) is a list of 2151 sources located in the cleanest 26% of the sky and identified as point sources exhibiting an excess in the submillimeter compared to their environment. It has been built using the 48 months of Planck data at 857, 545, 353 and 217 GHz combined with the 3 THz IRAS data, as described in Planck-2015-XXXIX. These sources are considered high-z source candidates (z>1.5-2), given the very low contamination by Galactic cirrus and their typical colour-colour ratio. A subsample of the PHZ list has already been followed up with Herschel and characterized as overdensities of red galaxies for more than 93% of the population, and as strongly lensed galaxies in 3% of the cases, as detailed in Planck-2014-XXVIII.

  18. Digging into data access: Sources

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 18, 2022
    Cite
    Alex Dickinson; Mark Ireland (2022). Digging into data access: Sources [Dataset]. http://doi.org/10.6084/m9.figshare.19666083.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    May 18, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Alex Dickinson; Mark Ireland
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Sources of data for Figure 3 (in the online article; Figure 2 in print) and Tables 1, 2a, 2b, 3a and 3b of the Geoscientist article Digging into data access: The need for reform. Files:

    digging_into_data_access_sources.xlsx - spreadsheet listing all references for Figure 3 (in the online article; Figure 2 in print) and Tables 1, 2a, 2b, 3a and 3b. The spreadsheet includes references to page numbers on which quoted figures are given

    Source files for Figure 3 (in the online article; Figure 2 in print):

    digging_into_data_access_figure3online_figure2print_source_documentA.pdf - British Geological Survey Annual Report 2019–2020. Source of data for the column BGS in Figure 3

    digging_into_data_access_figure3online_figure2print_source_documentB.pdf - OGA Annual Report and Accounts 2020–21. Source of data for the column OGA in Figure 3

    digging_into_data_access_figure3online_figure2print_source_documentC.pdf - UK Onshore Geophysical Library Trustees' Report and Financial Statements 2020 for the Year Ended 31 December 2020. Source of data for the column UKOGL in Figure 3

    digging_into_data_access_figure3online_figure2print_source_documentD.pdf - Environment Agency Annual Report and Accounts for the Financial Year 2020 to 2021. Source of data for the column EA in Figure 3

    Source files for Tables 1, 2a, 2b, 3a and 3b:

    digging_into_data_access_tables_source_documentX.pdf - X corresponds to the number in the column Source in Tables 1, 2a, 2b, 3a and 3b. See also the tab 'tables_sources' in the spreadsheet digging_into_data_access_sources.xlsx

  19. New centralized heat production source zone - Dataset - Vilnius Open Data...

    • opendata.vilnius.lt
    Updated Sep 30, 2024
    Cite
    (2024). New centralized heat production source zone - Dataset - Vilnius Open Data portal [Dataset]. https://opendata.vilnius.lt/dataset/new-centralized-heat-production-source-zone
    Explore at:
    Dataset updated
    Sep 30, 2024
    Area covered
    Vilnius
    Description

    The dataset contains the 2021 engineering infrastructure solutions from the Vilnius city general plan – zone for new centralized heat production sources.

  20. Dataset: A continuous open source data collection platform for architectural...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1more
    Updated Jan 1, 2024
    Cite
    Darius Sas; Alessandro Gilardi; Ilaria Pigazzini; Francesca Arcelli Fontana (2024). Dataset: A continuous open source data collection platform for architectural technical debt assessment [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_8435445
    Explore at:
    Dataset updated
    Jan 1, 2024
    Dataset provided by
    University of Milano-Bicocca
    Arcan SRL
    Authors
    Darius Sas; Alessandro Gilardi; Ilaria Pigazzini; Francesca Arcelli Fontana
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset and replication package of the study "A continuous open source data collection platform for architectural technical debt assessment".

    Abstract

    Architectural decisions are the most important source of technical debt. In recent years, researchers spent an increasing amount of effort investigating this specific category of technical debt, with quantitative methods, and in particular static analysis, being the most common approach to investigate such a topic.

    However, quantitative studies are susceptible, to varying degrees, to external validity threats, which hinder the generalisation of their findings.

    In response to this concern, researchers strive to expand the scope of their study by incorporating a larger number of projects into their analyses. This practice is typically executed on a case-by-case basis, necessitating substantial data collection efforts that have to be repeated for each new study.

    To address this issue, this paper presents our initial attempt at enabling researchers to study architectural smells, a well-known indicator of architectural technical debt, at large scale. Specifically, we introduce a novel data collection pipeline that leverages Apache Airflow to continuously generate up-to-date, large-scale datasets using Arcan, a tool for architectural smell detection (or any other tool).

    Finally, we present the publicly available dataset resulting from the first three months of execution of the pipeline, which includes over 30,000 analysed commits and releases from over 10,000 open source GitHub projects written in 5 different programming languages, amounting to over a billion lines of code analysed.

