100+ datasets found
  1. Addresses (Open Data)

    • catalog.data.gov
    • data.tempe.gov
    • +9 more
    Updated Jul 26, 2025
    Cite
    City of Tempe (2025). Addresses (Open Data) [Dataset]. https://catalog.data.gov/dataset/addresses-open-data
    Dataset updated
    Jul 26, 2025
    Dataset provided by
    City of Tempe
    Description

    This dataset is a compilation of address point data for the City of Tempe. For all occupiable units, and for any other official addresses in the City, it contains a point location and the official address as defined by the Building Safety Division of Community Development. Several additional attributes may be populated for an address, but not every address has them.

    Contact: Lynn Flaaen-Hanna, Development Services Specialist
    Contact E-mail
    Link: Map that Lets You Explore and Export Address Data
    Data Source: The initial dataset was created by combining several datasets and then reviewing the information to remove duplicates and identify errors. This published dataset is the system of record for Tempe addresses going forward; the address information is created and maintained by the Building Safety Division of Community Development.
    Data Source Type: ESRI ArcGIS Enterprise Geodatabase
    Preparation Method: N/A
    Publish Frequency: Weekly
    Publish Method: Automatic
    Data Dictionary
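
    The Data Source note says the initial dataset was built by combining several datasets and removing duplicates. The City's actual procedure is not described here, so the following is only a minimal Python sketch of that kind of merge. It assumes two addresses are duplicates when they match after upper-casing and stripping punctuation and extra whitespace; the function names and sample addresses are hypothetical.

```python
import re


def normalize_address(address: str) -> str:
    """Build a comparison key (assumption: case, punctuation and spacing are not significant)."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    return " ".join(cleaned.split())


def merge_address_sources(*sources: list[str]) -> list[str]:
    """Combine address lists, keeping the first spelling seen for each normalized key."""
    seen: set[str] = set()
    merged: list[str] = []
    for source in sources:
        for address in source:
            key = normalize_address(address)
            if key not in seen:
                seen.add(key)
                merged.append(address)
    return merged


# Hypothetical inputs from two source datasets.
permits = ["31 E 5th St", "20 E Main St"]
utilities = ["31 e. 5th  st", "100 S Mill Ave"]
print(merge_address_sources(permits, utilities))
# prints ['31 E 5th St', '20 E Main St', '100 S Mill Ave']
```

    Real address matching usually also needs abbreviation handling (for example "Street" vs "St") and unit designators, which this sketch ignores.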

  2. Data from: A Large-scale Dataset of (Open Source) License Text Variants

    • data.niaid.nih.gov
    Updated Mar 31, 2022
    Cite
    Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163
    Dataset updated
    Mar 31, 2022
    Dataset authored and provided by
    Stefano Zacchiroli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it, we collected from the Software Heritage archive (the largest publicly available archive of FOSS source code with accompanying development history) all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, train automated license classifiers, perform natural language processing (NLP) analyses of legal texts, and carry out historical and phylogenetic studies on FOSS licensing. Additional metadata about the shipped license files is also provided, making the dataset ready to use in various contexts. The metadata includes file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., a GitHub repository), and the oldest public commit in which the license appeared. The dataset is released as open data: an archive file containing all deduplicated license blobs, plus several portable CSV files of metadata that reference the blobs via cryptographic checksums.
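
    The description says the metadata CSV files reference license blobs via cryptographic checksums. The hash function is not named here, so the Python sketch below simply assumes SHA-1 over the raw file bytes to illustrate the idea: identical license texts collapse to one stored blob, and each metadata row points at its blob by checksum. The function name and sample paths are hypothetical.

```python
import hashlib


def index_license_files(files: dict[str, bytes]) -> tuple[dict[str, bytes], list[tuple[str, str]]]:
    """Deduplicate file contents by checksum.

    Returns (blobs, rows): blobs maps checksum -> content (stored once),
    rows lists (origin path, checksum) pairs, like a metadata CSV.
    """
    blobs: dict[str, bytes] = {}
    rows: list[tuple[str, str]] = []
    for path, content in files.items():
        # Assumption: SHA-1 of the raw bytes; the dataset may use a different hash.
        checksum = hashlib.sha1(content).hexdigest()
        blobs.setdefault(checksum, content)
        rows.append((path, checksum))
    return blobs, rows


# Hypothetical inputs: two repositories ship the same license text.
files = {
    "repo-a/LICENSE": b"MIT License text",
    "repo-b/COPYING": b"MIT License text",
    "repo-c/LICENSE.txt": b"Apache License text",
}
blobs, rows = index_license_files(files)
print(len(blobs), len(rows))  # prints 2 3
```

    Because the checksum is computed from content alone, the same license text found in many repositories is stored once, while every origin still gets its own metadata row.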

    For more details see the included README file and companion paper:

    Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In Proceedings of the 2022 Mining Software Repositories Conference (MSR 2022), 23-24 May 2022, Pittsburgh, Pennsylvania, United States. ACM, 2022.

    If you use this dataset for research purposes, please acknowledge its use by citing the above paper.

  3. Open Data Sources and Resources

    • data.amerigeoss.org
    • cloud.csiss.gmu.edu
    json, png, rdf
    Updated Aug 6, 2019
    Cite
    DevTestOrg (2019). Open Data Sources and Resources [Dataset]. https://data.amerigeoss.org/fi/dataset/opendata
    Available download formats: json (95555), json (22471830), png (765404), json (537760), rdf (465812), rdf (932592), json (728031), json
    Dataset updated
    Aug 6, 2019
    Dataset provided by
    DevTestOrg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Open Data Sources and Resources

  4. Open Source And General Resource Software

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated May 31, 2025
    Cite
    data.nasa.gov (2025). Open Source And General Resource Software [Dataset]. https://catalog.data.gov/dataset/open-source-and-general-resource-software
    Dataset updated
    May 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This dataset lists all software in use by NASA.

  5. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts at filtering those projects to curate high-quality ML projects. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE
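
    NICHE.csv pairs each project name with its label. The actual column headers are not given here, so the Python sketch below assumes a hypothetical `project,label` layout with `engineered` / `non-engineered` values and shows how the labels could be tallied. With the counts stated above, 441 of 572 projects (about 77.1%) are labelled engineered.

```python
import csv
import io
from collections import Counter


def count_labels(csv_text: str, label_column: str = "label") -> Counter:
    """Tally projects per label; `label_column` is a hypothetical header name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column].strip().lower() for row in reader)


# Synthetic sample in the assumed layout, not real NICHE rows.
sample = (
    "project,label\n"
    "owner/repo-a,engineered\n"
    "owner/repo-b,non-engineered\n"
    "owner/repo-c,Engineered\n"
)
counts = count_labels(sample)
total = sum(counts.values())
print(counts["engineered"], total)  # prints 2 3
print(f"{counts['engineered'] / total:.1%} engineered")  # prints 66.7% engineered
```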

  6. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    World Bank (https://www.worldbank.org/)
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?
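
    The inspiration question reduces to a simple ratio: share (%) = 100 × education spending / total government spending. A minimal Python sketch, assuming both amounts are already expressed in the same currency and year for a given country; the figures are illustrative, not taken from the dataset.

```python
def education_share_pct(education_spending: float, total_spending: float) -> float:
    """Return education spending as a percentage of total government spending."""
    if total_spending <= 0:
        raise ValueError("total government spending must be positive")
    return 100.0 * education_spending / total_spending


# Illustrative figures only (same currency unit for both values).
print(education_share_pct(15.0, 120.0))  # prints 12.5
```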

  7. Department of Economic and Community Development (DECD) – Business...

    • data.ct.gov
    • datasets.ai
    • +1 more
    application/rdf+xml +5
    Updated Feb 13, 2025
    Cite
    Department of Economic and Community Development (2025). Department of Economic and Community Development (DECD) – Business Assistance Portfolio [Dataset]. https://data.ct.gov/Business/Department-of-Economic-and-Community-Development-D/xnw3-nytd
    Available download formats: application/rdf+xml, csv, application/rss+xml, tsv, json, xml
    Dataset updated
    Feb 13, 2025
    Dataset authored and provided by
    Department of Economic and Community Development
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    DECD's listing of direct financial assistance to businesses from July 1, 2009 through June 30, 2024. New projects are usually added quarterly, but updates may be made on an ongoing basis.

    Small Business Boost loan recipients can be found here: https://data.ct.gov/d/yk65-8y82
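
    The portfolio covers July 1, 2009 through June 30, 2024, which lines up with July-June fiscal years. To group records that way, a minimal Python sketch follows. It assumes each record carries a date and labels a fiscal year by the calendar year in which it ends (so July 1, 2009 falls in FY2010); the constants and function names are hypothetical, not fields of the dataset.

```python
from datetime import date

# Coverage window stated in the description.
COVERAGE_START = date(2009, 7, 1)
COVERAGE_END = date(2024, 6, 30)


def in_coverage(d: date) -> bool:
    """True if the date falls inside the stated July 1, 2009 - June 30, 2024 window."""
    return COVERAGE_START <= d <= COVERAGE_END


def fiscal_year(d: date) -> int:
    """July-June fiscal year, labelled by the year it ends in (assumed convention)."""
    return d.year + 1 if d.month >= 7 else d.year


print(fiscal_year(COVERAGE_START), fiscal_year(COVERAGE_END))  # prints 2010 2024
```

    Under this convention the window spans 15 fiscal years, FY2010 through FY2024.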

  8. Open Data Portal Tutorial for Maryland State Agencies

    • datasets.ai
    • opendata.maryland.gov
    • +2 more
    33
    Updated Oct 8, 2024
    Cite
    State of Maryland (2024). Open Data Portal Tutorial for Maryland State Agencies [Dataset]. https://datasets.ai/datasets/open-data-portal-tutorial-for-maryland-state-agencies
    Available download formats: 33
    Dataset updated
    Oct 8, 2024
    Dataset authored and provided by
    State of Maryland
    Area covered
    Maryland
    Description

    This is a PDF document created by the Department of Information Technology (DoIT) and the Governor's Office of Performance Improvement to assist in training Maryland state employees to use the Open Data Portal, https://opendata.maryland.gov. It covers direct data entry, uploading Excel spreadsheets, connecting source databases, and transposing data. Please note that this tutorial is intended for state employees, as non-state users cannot upload datasets to the Open Data Portal.

  9. Data from: Caravan - A global community dataset for large-sample hydrology

    • data.niaid.nih.gov
    Updated Jan 16, 2025
    Cite
    Erickson, Tyler (2025). Caravan - A global community dataset for large-sample hydrology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6522634
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Matias, Yossi
    Gilon, Oren
    Nevo, Sella
    Klotz, Daniel
    Gauch, Martin
    Kratzert, Frederik
    Hassidim, Avinatan
    Shalev, Guy
    Nearing, Grey
    Gudmundsson, Lukas
    Addor, Nans
    Erickson, Tyler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the accompanying dataset to the following paper https://www.nature.com/articles/s41597-023-01975-w

    Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.

    If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, in recognition of the work that went into creating them and that made Caravan possible in the first place.

    All current development and additional community extensions can be found at https://github.com/kratzert/Caravan

    Change Log:

    23 May 2022: Version 0.2 - Resolved a bug when renaming the LamaH gauge ids from the LamaH ids to the official gauge ids provided as "govnr" in the LamaH dataset attribute files.

    24 May 2022: Version 0.3 - Fixed gaps in forcing data in some "camels" (US) basins.

    15 June 2022: Version 0.4 - Fixed: negative CAMELS US values are now replaced with NaN (-999 in CAMELS indicates a missing observation).

    1 December 2022: Version 0.4 - Added 4298 basins in the US, Canada and Mexico (part of HYSETS), bringing the total to 6830 basins. Fixed a bug in the computation of catchment attributes defined as pour point properties, where sometimes the wrong HydroATLAS polygon was picked. Restructured the attribute files and added some more metadata (station name and country).

    16 January 2023: Version 1.0 - Version of the official paper release. No changes in the data but added a static copy of the accompanying code of the paper. For the most up to date version, please check https://github.com/kratzert/Caravan

    10 May 2023: Version 1.1 - No data change; only the data description was updated.

    17 May 2023: Version 1.2 - Updated a handful of attribute values that were affected by a bug in their derivation. See https://github.com/kratzert/Caravan/issues/22 for details.

    16 April 2024: Version 1.4 - Added 9130 gauges from the original source dataset that were initially excluded because of the area thresholds (i.e., basins smaller than 100 km² or larger than 2000 km²). Also extended the forcing period for all gauges (including the original ones) to 1950-2023. Added two download options that include time series data only, as either CSV files (Caravan-csv.tar.xz) or netCDF files (Caravan-nc.tar.xz). Including the large basins also required an update to the Earth Engine code.

    16 Jan 2025: Version 1.5 - Added FAO Penman-Monteith PET (potential_evaporation_sum_FAO_PENMAN_MONTEITH) and renamed the ERA5-LAND potential_evaporation band to potential_evaporation_sum_ERA5_LAND. Also added all PET-related climate indices derived with the Penman-Monteith PET band (suffix "_FAO_PM") and renamed the old PET-related indices accordingly (suffix "_ERA5_LAND").
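
    The 15 June 2022 entry concerns replacing negative CAMELS US values with NaN, where -999 marks a missing observation. A minimal Python sketch of that masking step follows, assuming the series is a quantity that cannot legitimately be negative (such as discharge); it is an illustration, not Caravan's own code.

```python
import math

# -999 is the CAMELS missing-value marker mentioned in the change log.
MISSING_SENTINEL = -999.0


def mask_missing(values: list[float]) -> list[float]:
    """Replace every negative value (including the -999 sentinel) with NaN.

    Assumption: the variable is non-negative, so any negative value is invalid.
    """
    return [math.nan if v < 0 else v for v in values]


series = [1.2, MISSING_SENTINEL, 0.0, 3.4]
print(mask_missing(series))  # prints [1.2, nan, 0.0, 3.4]
```

    Zero is kept because zero flow is a valid observation; only values below zero are masked.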

  10. Einstein Catalog HRI CFA Sources - Dataset - NASA Open Data Portal

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). Einstein Catalog HRI CFA Sources - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/einstein-catalog-hri-cfa-sources
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This database table consists of a preliminary source list for the Einstein Observatory's High Resolution Imager (HRI). The source list, obtained from EINLINE, the Einstein On-line Service at the Smithsonian Astrophysical Observatory (SAO), contains basic information about the sources detected with the HRI. This is a service provided by NASA HEASARC.

  11. Building

    • detroitdata.org
    Updated Sep 7, 2018
    Cite
    Downtown Detroit Partnership (2018). Building [Dataset]. https://detroitdata.org/dataset/building
    Available download formats: arcgis geoservices rest api, geojson, kml, zip, csv, html, txt, gdb, xlsx, gpkg
    Dataset updated
    Sep 7, 2018
    Dataset provided by
    Downtown Detroit Partnership
    Description

    This is a collection of layers created by Tian Xie (intern at DDP) in August 2018. The collection includes Detroit parcel data (Parcel_collector), InfoUSA business data (BIZ_INFOUSA), and building data (Building). The building and business data were edited by Tian during field research and have images attached.

    The original sources for these layers are:
    1. Business Data: InfoUSA business database purchased by DDP in 2017
    2. Building Data: Detroit Building Footprint data
    3. Parcel Data: from the Detroit Open Data Portal, downloaded in May 2018.
    During Tian's field research, some fields were added and some records in the building and business layers were edited.
    1. For the business data, Tian confirmed most publicly accessible businesses and deleted those that no longer exist. Tian also added new businesses that were missing from the records.
    2. For the building data, Tian recorded the total business space for each building, the non-empty business space, occupancy status, and parking adjacency status, and took a picture of every building in downtown Detroit.
    Detailed field metadata:
    InfoUSA Business
    • OBJECTID_1
    • COMPANY_NA: company name
    • ADDRESS: company address
    • CITY: city
    • STATE: state
    • ZIP_CODE: zip code
    • MAILING_CA: source InfoUSA
    • MAILING_DE: source InfoUSA
    • LOCATION_A: address (source InfoUSA)
    • LOCATION_1: city (source InfoUSA)
    • LOCATION_2: state (source InfoUSA)
    • LOCATION_3: zip code (source InfoUSA)
    • LOCATION_4: source InfoUSA
    • LOCATION_5: source InfoUSA
    • COUNTY: county
    • PHONE_NUMB: phone number
    • WEB_ADDRES: website address
    • LAST_NAME: contact last name
    • FIRST_NAME: contact first name
    • CONTACT_TI: contact type
    • CONTACT_PR:
    • CONTACT_GE: contact gender
    • ACTUAL_EMP: employee number
    • EMPLOYEE_S: employee number class
    • ACTUAL_SAL: actual sale
    • SALES_VOLU: sales value
    • PRIMARY_SI: primary sales value
    • PRIMARY_1: primary classification
    • SECONDARY_: secondary classification
    • SECONDARY1
    • SECONDAR_1
    • SECONDAR_2
    • CREDIT_ALP: credit level
    • CREDIT_NUM: credit number
    • HEADQUARTE: headquarters
    • YEAR_1ST_A: year open
    • OFFICE_SIZ: office size
    • SQUARE_FOO: square foot
    • FIRM_INDIV:
    • PUBLIC_PRI
    • Fleet_size
    • FRANCHISE_
    • FRANCHISE1
    • INDUSTRY_S
    • ADSIZE_IN_
    • METRO_AREA
    • INFOUSA_ID
    • LATITUDE: y
    • LONGITUDE: x
    • PARKING: parking adjacency
    • NAICS_CODE: NAICS CODE
    • NAICS_DESC: NAICS DESCRIPTION
    • parcelnum*: PARCEL NUMBER
    • parcelobji*: PARCEL OBJECT ID
    • CHECK_*
    • ACCESSIABLE*: PUBLIC ACCESSIBILITY
    • PROPMANAGER*: PROPERTY MANAGER
    • GlobalID
    Notes: Fields marked with * came from another source or from field research done by Tian Xie in August 2018.
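
    The field list maps LATITUDE to y and LONGITUDE to x. When exporting these records, note that GeoJSON (RFC 7946) orders point coordinates as [longitude, latitude], that is [x, y]. A minimal Python sketch, assuming the values are decimal degrees in WGS 84; the output property names and the sample record are hypothetical.

```python
import json


def business_to_feature(record: dict) -> dict:
    """Convert one InfoUSA business record into a GeoJSON Point Feature.

    GeoJSON lists coordinates as [x, y], i.e. [LONGITUDE, LATITUDE].
    Assumption: both fields hold decimal degrees in WGS 84.
    """
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            "coordinates": [float(record["LONGITUDE"]), float(record["LATITUDE"])],
        },
        # Hypothetical property names chosen for readability.
        "properties": {
            "company": record.get("COMPANY_NA"),
            "address": record.get("ADDRESS"),
            "naics_code": record.get("NAICS_CODE"),
        },
    }


# Made-up record for illustration, not a row from the dataset.
record = {
    "COMPANY_NA": "Example Cafe",
    "ADDRESS": "100 Example St",
    "NAICS_CODE": "722515",
    "LATITUDE": "42.3314",
    "LONGITUDE": "-83.0458",
}
print(json.dumps(business_to_feature(record)["geometry"]))
# prints {"type": "Point", "coordinates": [-83.0458, 42.3314]}
```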
    Building
    • OBJECTID_12
    • BUILDING_I: building id
    • PARCEL_ID: parcel id
    • BUILD_TYPE: building type
    • CITY_ID: city id
    • APN: parcel number
    • RES_SQFT: residential square feet
    • NONRES_SQF: non-residential square feet
    • YEAR_BUILT: year built
    • YEAR_DEMO
    • HOUSING_UN: housing units
    • STORIES: # of stories
    • MEDIAN_HGT: median height
    • CONDITION: building condition
    • HAS_CONDOS: has condos or not
    • FLAG_SQFT: flag square feet
    • FLAG_YEAR_: flag year
    • FLAG_CONDI: flag condition
    • LOADD1: address number
    • HIADD1 (type: esriFieldTypeInteger, alias: HIADD1, SQL Type: sqlTypeOther, nullable: true, editable: true)
    • STREET1: street name
    • LOADD2:
    • HIADD2 (type: esriFieldTypeString, alias: HIADD2, SQL Type: sqlTypeOther, length: 80, nullable: true, editable: true)
    • STREET2 (type: esriFieldTypeString, alias: STREET2, SQL Type: sqlTypeOther, length: 80, nullable: true, editable: true)
    • ZIPCODE: zip code
    • AKA: building name
    • USE_LOCATO
    • TEMP (type: esriFieldTypeString, alias: TEMP, SQL Type: sqlTypeOther, length: 80, nullable: true, editable: true)
    • SPID (type: esriFieldTypeInteger, alias: SPID, SQL Type: sqlTypeOther, nullable: true, editable: true)
    • Zone (type: esriFieldTypeString, alias: Zone, SQL Type: sqlTypeOther, length: 60, nullable: true, editable: true)
    • F7_2SqMile (type: esriFieldTypeString, alias: F7_2SqMile, SQL Type: sqlTypeOther, length: 10, nullable: true, editable: true)
    • Shape_Leng (type: esriFieldTypeDouble, alias: Shape_Leng, SQL Type: sqlTypeOther, nullable: true, editable: true)
    • PARKING*: parking adjacency
    • OCCUPANCY*: occupied or not
    • BuildingType*: building type
    • TotalBusinessSpace*: available business space in this building
    • NonEmptySpace*: non-empty business space in this building
    • CHECK_*
    • FOLLOWUP*: need followup or not
    • GlobalID*
    • PropmMana*: property manager
    Notes: Fields marked with * came from another source or from field research done by Tian Xie in August 2018.

  12. SEPAL

    • data.amerigeoss.org
    png, wms
    Updated Oct 31, 2023
    Cite
    Food and Agriculture Organization (2023). SEPAL [Dataset]. https://data.amerigeoss.org/dataset/sepal
    Available download formats: png (884051), png (409262), wms
    Dataset updated
    Oct 31, 2023
    Dataset provided by
    Food and Agriculture Organization (http://fao.org/)
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    What is SEPAL?

    SEPAL (https://sepal.io/) is a free and open source cloud computing platform for geospatial data access and processing. It empowers users to quickly process large amounts of data on their computer or mobile device. Users can create custom analysis-ready data using freely available satellite imagery, generate and improve land use maps, analyze time series, run change detection, and perform accuracy assessment and area estimation, among many other functionalities in the platform. Data can be created and analyzed for any place on Earth using SEPAL.

    Figure 1: Best pixel mosaic of Landsat 8 data for 2020 over Cambodia (image: https://data.apps.fao.org/catalog/dataset/9c4d7c45-7620-44c4-b653-fbe13eb34b65/resource/63a3efa0-08ab-4ad6-9d4a-96af7b6a99ec/download/cambodia_mosaic_2020.png)

    SEPAL reaches over 5000 users in 180 countries for the creation of custom data products from freely available satellite data. SEPAL was developed as part of the Open Foris suite, a set of free and open source software platforms and tools that facilitate flexible and efficient data collection, analysis and reporting. SEPAL integrates modern geospatial data infrastructures and the supercomputing power available through Google Earth Engine and Amazon Web Services with powerful open-source data processing software, such as R, ORFEO, GDAL, Python and Jupyter Notebooks. Users can easily access the archive of satellite imagery from NASA and the European Space Agency (ESA), as well as high spatial and temporal resolution data from Planet Labs, and turn such images into data that can be used for reporting and better decision making.

    National Forest Monitoring Systems in many countries have been strengthened by SEPAL, which provides technical government staff with computing resources and cutting-edge technology to accurately map and monitor their forests. The platform was originally developed for monitoring forest carbon stock and stock changes for reducing emissions from deforestation and forest degradation (REDD+). The platform's tools now reach far beyond forest monitoring, giving different stakeholders access to cloud-based image processing, remote sensing and machine learning for any application. Presently, users work on SEPAL for various applications related to land monitoring, land cover/use, land productivity, ecological zoning, ecosystem restoration monitoring, forest monitoring, near real-time alerts for forest disturbances and fire, flood mapping, mapping the impact of disasters, peatland rewetting status, and many others.

    The Hand-in-Hand initiative enables countries that generate data through SEPAL to disseminate their data widely through the platform and to combine their data with the numerous other datasets available through Hand-in-Hand.

    Figure 2: Image classification module for land monitoring and mapping. Probability classification over Zambia (image: https://data.apps.fao.org/catalog/dataset/9c4d7c45-7620-44c4-b653-fbe13eb34b65/resource/868e59da-47b9-4736-93a9-f8d83f5731aa/download/probability_classification_over_zambia.png)

  13. TinyDialogues

    • huggingface.co
    Updated Nov 28, 2024
    Cite
    Steven Feng (2024). TinyDialogues [Dataset]. https://huggingface.co/datasets/styfeng/TinyDialogues
    Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 28, 2024
    Authors
    Steven Feng
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for TinyDialogues

    TinyDialogues dataset collected as part of the EMNLP 2024 paper "Is Child-Directed Speech Effective Training Data for Language Models?" by Steven Y. Feng, Noah D. Goodman, and Michael C. Frank. For more details, please see Appendices A-C in our paper.

    Dataset Sources

    Repository: https://github.com/styfeng/TinyDialogues
    Paper: https://aclanthology.org/2024.emnlp-main.1231/

    Dataset Structure

    Final training and validation data… See the full description on the dataset page: https://huggingface.co/datasets/styfeng/TinyDialogues.

  14. FOI 30978 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Feb 6, 2023
    Cite
    nhsbsa.net (2023). FOI 30978 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-30978
    Dataset updated
    Feb 6, 2023
    Dataset provided by
    NHS Business Services Authority
    Description

    Under the Freedom of Information Act 2000, I was wondering if you would be able to build on FOI requests FOI 24442 and FOI 27689:
    https://opendata.nhsbsa.net/dataset/foi-24442
    https://opendata.nhsbsa.net/dataset/foi-27689
    The data in those requests relates to April 2020 to March 2022 and April 2022 to June 2022, from the data source ‘NHSBSA Information Services Data Warehouse’, with the columns YEAR_MONTH, PRACTICE_CODE, DISPENSER_CODE, BNF_CODE, PRODUCT_ORDER_NUMBER, PACK_ORDER_NUMBER and NIC_GBP. Would it be possible to have the data in the same format from July 2022 to December 2022, or from July 2022 to the latest possible month, please?
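
    The request asks for the same columns over a later window (July 2022 to December 2022). Assuming YEAR_MONTH is stored as a YYYYMM value such as 202207 (the format is not stated here), an inclusive numeric comparison selects the period, because YYYYMM values sort chronologically. A minimal Python sketch with hypothetical rows:

```python
def filter_year_month(rows: list[dict], start: int, end: int) -> list[dict]:
    """Keep rows whose YEAR_MONTH (assumed YYYYMM) lies in [start, end], inclusive."""
    return [row for row in rows if start <= int(row["YEAR_MONTH"]) <= end]


# Hypothetical rows using a subset of the requested columns.
rows = [
    {"YEAR_MONTH": 202206, "PRACTICE_CODE": "A00001", "NIC_GBP": 10.5},
    {"YEAR_MONTH": 202207, "PRACTICE_CODE": "A00001", "NIC_GBP": 12.0},
    {"YEAR_MONTH": 202212, "PRACTICE_CODE": "B00002", "NIC_GBP": 7.25},
    {"YEAR_MONTH": 202301, "PRACTICE_CODE": "B00002", "NIC_GBP": 3.0},
]
selected = filter_year_month(rows, start=202207, end=202212)
print([row["YEAR_MONTH"] for row in selected])  # prints [202207, 202212]
```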

  15. ‘2019 NYC Open Data Plan: Removed Datasets’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Nov 13, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘2019 NYC Open Data Plan: Removed Datasets’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-2019-nyc-open-data-plan-removed-datasets-e9b0/c6dda462/?iid=000-989&v=presentation
    Dataset updated
    Nov 13, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘2019 NYC Open Data Plan: Removed Datasets’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/80355d19-52a3-435d-bc73-2dfb2770c3c4 on 13 November 2021.

    --- Dataset description provided by original source is as follows ---

    Datasets removed from the Open Data Plan, and an explanation why they were removed.

    --- Original source retains full ownership of the source dataset ---

  16. Minnesota Parcels -- Opt-In Open Data

    • hub.arcgis.com
    • minneapolis-fire-initiative-umn.hub.arcgis.com
    Updated Oct 19, 2022
    Cite
    State of Minnesota (2022). Minnesota Parcels -- Opt-In Open Data [Dataset]. https://hub.arcgis.com/maps/minnesota::-minnesota-parcels-opt-in-open-data
    Dataset updated
    Oct 19, 2022
    Dataset authored and provided by
    State of Minnesota
    Area covered
    Description

    This is the authoritative public subset of the compiled Minnesota statewide parcel dataset. By authoritative, we mean this is the official source of statewide parcel data compiled from the counties that have opted in to be included. Counties are the authoritative source and owner of parcel data. Quarterly, MnGeo compiles and standardizes the county data using the Minnesota Geospatial Advisory Council's parcel data standard. In the compilation process, some data content is standardized or otherwise modified (capitalization and address parsing are the most common changes). The full opt-in compiled parcel metadata record can be found on the Minnesota Geospatial Commons.

    To obtain the most current and authoritative data in its original form, users are referred back to the respective county. Links to each county's downloadable and/or web-viewable data, where known, are available in the accompanying spatial metadata dataset.

    Known limitations:
    • Data provided by counties are often limited to a subset of fields and may not be the same fields across all counties. The fields provided by a given county may change by quarter.
    • The USECLASS and XUSECLASS fields, while often consistent within a county, are not standardized between counties.
    • The OWN_ADDR_# and TAX_ADDR_# fields are often populated in ways not consistent with the standard. In particular, an address number/street address may not be in Line 1, and city/state/zip cannot be relied on to be in Line 3. Even within a single county, the city/state/zip line may not be in a consistent field.
    • Parcels with addresses on fractional streets (5-1/2th Ave) cause issues for our address parser when parsing is needed for aggregation, and may be missing some or all of the address data. Certain other oddly named streets can also cause this behavior.
    • A maximum record count has been set on the mapping service. This limits the number of features that can be returned in a single request. It is set to balance usability and response time.
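
    Because the mapping service caps how many features one request can return, a client that needs every parcel has to page through the results. If the service is an ArcGIS REST feature service (as the hub.arcgis.com listing suggests), queries support this with the `resultOffset` and `resultRecordCount` parameters, and responses set `exceededTransferLimit` when more records remain. The Python sketch below shows only the paging loop; `fetch_page` is a hypothetical stand-in for the actual HTTP call, and this service's real maximum record count is not stated here.

```python
from typing import Callable, Iterator

# fetch_page(offset, count) is a hypothetical callable that performs one query,
# e.g. with resultOffset=offset and resultRecordCount=count, and returns the
# parsed JSON response as a dict.
FetchPage = Callable[[int, int], dict]


def iter_features(fetch_page: FetchPage, page_size: int = 1000) -> Iterator[dict]:
    """Yield every feature by requesting successive pages until none remain."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        features = page.get("features", [])
        yield from features
        # Stop when the server returns nothing or signals no further records.
        if not features or not page.get("exceededTransferLimit", False):
            break
        # Advance by what was actually returned, in case the server cap is below page_size.
        offset += len(features)


# Fake service for demonstration: 2500 records, capped at 1000 per request.
DATA = [{"attributes": {"id": i}} for i in range(2500)]


def fake_fetch(offset: int, count: int) -> dict:
    count = min(count, 1000)
    return {
        "features": DATA[offset:offset + count],
        "exceededTransferLimit": offset + count < len(DATA),
    }


features = list(iter_features(fake_fetch, page_size=2000))
print(len(features))  # prints 2500
```

    Advancing by the number of features actually returned, rather than by `page_size`, keeps the loop correct when the server's cap is lower than the requested page size, as in the fake service above.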

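    Because the mapping service caps the number of features per request, retrieving a larger extract generally means requesting it in pages. The sketch below is a hypothetical Python helper that keeps requesting pages until the service returns an empty one. The `fetch` callable is an assumption: if the service is an ArcGIS REST service (not stated in the description), it might wrap a query request using the `resultOffset` and `resultRecordCount` parameters, but the service URL and request wiring are not given here and are not shown.

```python
from typing import Callable, Dict, Iterator, List

# fetch(offset, count) returns up to `count` features starting at `offset`.
# How it talks to the mapping service is deliberately left out (assumption).
FetchPage = Callable[[int, int], List[Dict]]


def iter_all_features(fetch: FetchPage, page_size: int) -> Iterator[Dict]:
    """Yield every feature by requesting consecutive pages.

    Paging stops only on an empty page, so a server that returns fewer
    records than requested (because of its maximum record count) does not
    end the loop early."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch(offset, page_size)
        if not page:
            return
        yield from page
        offset += len(page)  # advance by what was actually returned
```

    Stopping on an empty page costs one extra request, but it does not depend on knowing the service's exact maximum record count.
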
  17. Oakland PD Academy & Field Training Attrition Rate

    • kaggle.com
    Updated Dec 6, 2019
    City of Oakland (2019). Oakland PD Academy & Field Training Attrition Rate [Dataset]. https://www.kaggle.com/datasets/cityofoakland/oakland-pd-academy-field-training-attrition-rate/discussion
    Dataset updated
    Dec 6, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    City of Oakland
    Area covered
    Oakland
    Description

    Content

    More details about each file are in the individual file descriptions.

    Context

    This is a dataset hosted by the City of Oakland, California. The city has an open data platform and updates its datasets as new data is brought in. Explore Oakland's data using Kaggle and all of the data sources available through the City of Oakland organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted many organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
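
    Socrata-hosted datasets can generally be queried through the SODA API, where `$limit`, `$offset`, and `$order` are standard SoQL query parameters and `:id` is Socrata's system row identifier, commonly used to keep paged results stable. The Python sketch below only builds paged request URLs. The domain and dataset identifier in the usage are hypothetical placeholders, not this dataset's real endpoint, and the total row count is assumed to be known in advance (for example, from a separate count query).

```python
from urllib.parse import urlencode


def soda_page_urls(domain: str, dataset_id: str, page_size: int,
                   total_rows: int, order: str = ":id") -> list:
    """Return one SODA request URL per page of results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    urls = []
    for offset in range(0, total_rows, page_size):
        params = {"$limit": page_size, "$offset": offset, "$order": order}
        # Keep "$" and ":" unescaped so the URLs stay readable.
        query = urlencode(params, safe="$:")
        urls.append(f"https://{domain}/resource/{dataset_id}.json?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical domain and dataset id, for illustration only.
    for url in soda_page_urls("data.example.gov", "abcd-1234", 1000, 2500):
        print(url)
```

    Ordering by a stable column matters because, without an explicit `$order`, consecutive pages are not guaranteed to come back in a consistent order.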

    Cover photo by Sarah Brink on Unsplash
    Unsplash images are distributed under the Unsplash License.

    This dataset is distributed under the following licenses: NA

  18. New centralized heat production source zone - Dataset - Vilnius Open Data...

    • opendata.vilnius.lt
    Updated Sep 30, 2024
    (2024). New centralized heat production source zone - Dataset - Vilnius Open Data portal [Dataset]. https://opendata.vilnius.lt/dataset/new-centralized-heat-production-source-zone
    Dataset updated
    Sep 30, 2024
    Area covered
    Vilnius
    Description

    The dataset contains the zone for new centralized heat production sources, one of the 2021 engineering infrastructure solutions in the Vilnius city general plan.

  19. TOPIC Open - Empirical Module

    • ouvert.canada.ca
    • open.canada.ca
    html
    Updated Jun 21, 2024
    Natural Resources Canada (2024). TOPIC Open - Empirical Module [Dataset]. https://ouvert.canada.ca/data/dataset/85a43ef9-88b1-40eb-b612-d8b27a5763b4
    Available download formats: html
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    Ministry of Natural Resources of Canada (https://www.nrcan.gc.ca/)
    License

    Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
    License information was derived automatically

    Time period covered
    Jan 1, 1900 - Jan 1, 2021
    Description

    The Traits of Plants in Canada (TOPIC) open access empirical measurements module provides access to an open dataset of direct observations collected in the field, laboratory, greenhouse or garden. The TOPIC database serves as a hub for centralizing knowledge on plant functional traits in Canada. Under the Canadian Trait Network, this database allows the integration of trait data from large, disconnected scientific sources to facilitate research on plant and forest ecology, community ecology and forest sustainability. Following international standards, the database ensures that the datasets are properly documented and archived, facilitating their re-use and discoverability.

    Please cite TOPIC Open as follows: Aubin, I., Boisvert-Marsh, L., Munson, A.D. 2021. Traits of plants in Canada (TOPIC) Open access - Traits des plantes au Canada (TOPIC) ouvert. doi: https://doi.org/10.23687/bb14c6bf-75f7-4ff2-b97e-689fa768905c

    And cite TOPIC as follows: Aubin, I., Cardou, F., Boisvert‐Marsh, L., Garnier, E., Strukelj, M., Munson, A.D. 2020. Managing data locally to answer questions globally: The role of collaborative science in ecology. Journal of Vegetation Science. 31: 509–517.

  20. Asset Inventory: Austin Open Data Portal

    • data.austintexas.gov
    • datahub.austintexas.gov
    application/rdf+xml (+5 more formats)
    Updated Jul 31, 2025
    (2025). Asset Inventory: Austin Open Data Portal [Dataset]. https://data.austintexas.gov/City-Government/Asset-Inventory-Austin-Open-Data-Portal/my8q-n4hf
    Available download formats: csv, tsv, application/rss+xml, application/rdf+xml, json, xml
    Dataset updated
    Jul 31, 2025
    Description

    This asset is a derived view based on the system dataset 'Site Analytics: Asset Inventory', which is automatically generated by the data management platform and provides a comprehensive inventory of all assets on this site. This asset has been filtered to present an overview of the various types of data that are classified as public and have been published on the City of Austin Open Data Portal (data.austintexas.gov) by departmental data owners.

    The columns of the Asset Inventory dataset contain information about every asset. These include metadata fields (e.g., Name, Description, and Category), as well as statistics, such as the number of visits, row count, column count, and downloads. This asset is updated at least once per day to sync any changes, additional assets, or removed assets.
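
    Since the inventory is re-synced at least once per day, one way to see what changed between two downloads is to compare snapshots keyed by an asset identifier. Below is a minimal Python sketch, assuming each snapshot is a list of row dictionaries. The key column name `uid` is a hypothetical placeholder for whatever unique identifier column the inventory actually exposes.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def diff_snapshots(old: Iterable[Dict], new: Iterable[Dict],
                   key: str = "uid") -> Tuple[List[Hashable], List[Hashable], List[Hashable]]:
    """Compare two inventory snapshots and return (added, removed, changed)
    identifiers, each sorted. `key` names a hypothetical unique-ID column."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}
    added = sorted(new_by_key.keys() - old_by_key.keys())
    removed = sorted(old_by_key.keys() - new_by_key.keys())
    # A row counts as changed if any column value differs between snapshots.
    changed = sorted(k for k in old_by_key.keys() & new_by_key.keys()
                     if old_by_key[k] != new_by_key[k])
    return added, removed, changed
```

    Comparing whole rows also flags assets whose statistics (visits, downloads) changed. To detect only metadata edits, restrict the comparison to columns such as Name, Description, and Category.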

    Data provided by: Tyler Technologies
    Creation date of data source: November 1, 2022

    *City of Austin Open Data Terms of Use – https://data.austintexas.gov/stories/s/ranj-cccq

City of Tempe (2025). Addresses (Open Data) [Dataset]. https://catalog.data.gov/dataset/addresses-open-data

Addresses (Open Data)

15 scholarly articles cite this dataset.
Dataset updated
Jul 26, 2025
Dataset provided by
City of Tempe
