79 datasets found
  1. Open Data Portal Catalogue

    • open.canada.ca
    • datasets.ai
    • +3 more
    csv, json, jsonl, png +2
    Updated Aug 27, 2025
    Cite
    Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
    Explore at:
    Available download formats: csv, sqlite, json, png, jsonl, xlsx
    Dataset updated
    Aug 27, 2025
    Dataset provided by
    Treasury Board of Canada Secretariat (http://www.tbs-sct.gc.ca/)
    Treasury Board of Canada (https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool. Resources 2-8 are generated using the Flatterer utility.

    Description of resources:

    1. Dataset is a JSON Lines file where the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. The file is heavily nested and is recommended for users familiar with working with nested JSON.
    2. Catalogue is an XLSX workbook where the nested metadata of each Dataset/Open Information Record is flattened into worksheets for each type of metadata.
    3. Datasets Metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
    4. Resources Metadata contains the metadata for the resources contained within each dataset.
    5. Resource Views Metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
    6. DataStore Fields Metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore-enabled CSVs.
    7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as the number of records each table contains.
    8. Data Package Entity Relation Diagram displays the title and format of each column in each table of the Data Package as an entity relation diagram (ERD). The Data Package resource offers a text-based version.
    9. SQLite Database is a .db database, similar in structure to Catalogue. It can be queried with database or analytical software tools.
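    Since resource 1 is described as a GZip-compressed JSON Lines file with one record per line, it can be streamed without loading the whole file into memory. The following is a minimal sketch using only the Python standard library. The file path is whatever you saved the download as, and the `resources` and `format` keys are assumptions based on the usual CKAN package layout, not something the description above confirms.

```python
import gzip
import json
from typing import Dict, Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one metadata record per non-empty line of a GZip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def count_resource_formats(path: str) -> Dict[str, int]:
    """Tally the format of every nested resource.

    Assumption: each record carries a "resources" list whose items have a
    "format" key, as in typical CKAN package metadata.
    """
    counts: Dict[str, int] = {}
    for record in iter_records(path):
        for resource in record.get("resources", []):
            fmt = resource.get("format", "unknown")
            counts[fmt] = counts.get(fmt, 0) + 1
    return counts
```

    Streaming line by line keeps memory use flat regardless of catalogue size, which matters for a heavily nested file of this kind.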

  2. Data from: MUHSIC: An Open Dataset with Temporal Musical Success Information...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 22, 2021
    Cite
    Gabriel R. G. Barbosa (2021). MUHSIC: An Open Dataset with Temporal Musical Success Information [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4779002
    Explore at:
    Dataset updated
    Oct 22, 2021
    Dataset provided by
    Anisio Lacerda
    Gabriel P. Oliveira
    Danilo B. Seufitelli
    Mirella M. Moro
    Bruna C. Melo
    Mariana O. Silva
    Gabriel R. G. Barbosa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Music is a volatile industry whose dynamic nature can directly influence artists' career behavior. That is, musical careers can go through ups and downs depending on the current market moment. This dataset provides data about hot streak periods in musical careers, which are defined by high-impact bursts occurring in sequence.

    Success in the music industry has a temporal structure, as the audience tastes change over time. Here, we use the Billboard Hot 100 charts with Spotify data to represent success over time. For musical careers, we build their time series from the debut date (i.e., date of the first release obtained from Spotify) to the last chart collected. Thus, each point in the time series represents the success of such an artist in a given week, according to the Hot 100 chart.

    Therefore, we present MUHSIC (Music-oriented Hot Streak Information Collection), which contains:

    Charts: enhanced data on all weekly Hot 100 Charts

    Artists: artist success time series with hot streak information

    Genres: genre success time series with hot streak information (each genre is the aggregate of all its artists)

    Hot Streaks: summarized hot streak information

  3. LSD4WSD : An Open Dataset for Wet Snow Detection with SAR Data and Physical...

    • zenodo.org
    • explore.openaire.eu
    bin, pdf +1
    Updated Jul 11, 2024
    Cite
    Matthieu Gallet; Abdourrahmane Atto; Fatima Karbou; Emmanuel Trouvé (2024). LSD4WSD : An Open Dataset for Wet Snow Detection with SAR Data and Physical Labelling [Dataset]. http://doi.org/10.5281/zenodo.10046730
    Explore at:
    Available download formats: text/x-python, bin, pdf
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Matthieu Gallet; Abdourrahmane Atto; Fatima Karbou; Emmanuel Trouvé
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LSD4WSD V2.0

    Learning SAR Dataset for Wet Snow Detection - Full Analysis Version.

    The aim of this dataset is to provide a basis for automatic learning to detect wet snow. It is based on Sentinel-1 SAR GRD satellite images acquired between August 2020 and August 2021 over the French Alps. The new version of this dataset is no longer simply restricted to a classification task, and provides a set of metadata for each sample.

    Modifications and improvements in version 2.0.0:

    • Number of massifs: 7 new massifs added to cover all the Sentinel-1 images (cf. info.pdf).
    • Acquisition: add images of the descending pass in addition to those originally used in the ascending pass.
    • Sample: reduction in the size of the samples considered to 15 by 15 to facilitate evaluation at the central pixel.
    • Sample: increased density of extracted windows, with a distance of approximately 500 meters between the centers of the windows.
    • Sample: removal of the pre-processing involving the use of logarithms.
    • Sample: removal of the pre-processing involving the normalisation.
    • Labels: new structure for the labels part: dictionary with keys: topography, metadata and physics.
    • Labels: physics: addition of direct information from the CROCUS model for 3 simulations: Liquid Water Content, snow height and minimum snowpack temperature.
    • Labels: topography: information on the slope, altitude and average orientation of the sample.
    • Labels: metadata : information on the date of the sample, the mountain massif and the run (ascending or descending).
    • Dataset: removal of the train/test split*

    We leave it up to the user to validate models with the group k-fold method, using the alpine massif information to define the groups.
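    To illustrate the validation scheme suggested above, here is a minimal, dependency-free sketch of group k-fold splitting, in which every sample from a given massif lands in the same fold. It is not the authors' code: the function name and the greedy balancing strategy are assumptions for illustration, and scikit-learn's `GroupKFold` is the usual ready-made choice in practice.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def group_k_fold(
    groups: Sequence[Hashable], n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) so that no group appears on both sides."""
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    if n_splits < 2 or n_splits > len(by_group):
        raise ValueError("n_splits must be between 2 and the number of distinct groups")

    # Greedy balancing: give the largest remaining group to the currently smallest fold.
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    for group in sorted(by_group, key=lambda g: len(by_group[g]), reverse=True):
        smallest = min(range(n_splits), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_group[group])

    for test_fold in folds:
        held_out = set(test_fold)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield train, sorted(test_fold)
```

    Here `groups` would hold the massif name of each sample, as read from the metadata. Keeping a massif entirely inside one fold prevents spatially correlated samples from leaking between training and evaluation.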

    Finally, the dataset consists of 2,467,516 samples of size 15 by 15 by 9. For each sample, 9 metadata fields are provided, drawing in particular on the Crocus physical model:

    • topography:
      • elevation (meters) (average),
      • orientation (degrees) (average),
      • slope (degrees) (average),
    • metadata:
      • name of the alpine massif,
      • date of acquisition,
      • type of acquisition (ascending/descending),
    • physics
      • Liquid Water Content (kg/m2),
      • snow height (m),
      • minimum snowpack temperature (Celsius degree).

    The 9 channels are in the following order:

    • Sentinel-1 polarimetric channels: VV, VH and the combination C: VV/VH in linear,
    • Topographical features: altitude, orientation, slope
    • Polarimetric ratio with a reference summer image: VV/VVref, VH/VHref, C/Cref

    * The reference image selected is that of August 9th 2020, as a reference image without snow (cf. Nagler&al)

    An overview of the distribution and a summary of the sample statistics can be found in the file info.pdf.

    The data is stored in .hdf5 format with gzip compression. We provide a Python script, dataset_load.py, to read and query the data. It is based on the h5py, numpy and pandas libraries and allows selecting part or all of the dataset using queries on the metadata. The script is documented and can be used as described in the README.md file.

    The processing chain is available at the following Github address.

    The authors would like to acknowledge the support from the National Centre for Space Studies (CNES) in providing computing facilities and access to SAR images via the PEPS platform.

    The authors would like to deeply thank Mathieu Fructus for running the Crocus simulations.

    Erratum :

    In the dataloader file, the name of the "aquisition" column must be added twice; see the correction below:

    dtst_ld = Dataset_loader(path_dataset,shuffle=False,descrp=["date","massif","aquisition","aquisition","elevation","slope","orientation","tmin","hsnow","tel",],)

    If you have any comments, questions or suggestions, please contact the authors:

    • matthieu.gallet@univ-smb.fr
    • fatima.karbou@meteo.fr
    • abdourrahmane.atto@univ-smb.fr
    • emmanuel.trouve@univ-smb.fr

  4. Forensic DNA Open Dataset

    • catalog.data.gov
    • s.cnmilf.com
    • +2 more
    Updated Jul 9, 2025
    + more versions
    Cite
    National Institute of Standards and Technology (2025). Forensic DNA Open Dataset [Dataset]. https://catalog.data.gov/dataset/forensic-dna-open-dataset-a26bc
    Explore at:
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    This dataset consists of single source and mixture samples which were genotyped/sequenced with kits targeting Forensic DNA markers. More information specific to the kit and/or method used can be found in the README text files included in each zipped file.

    The CE-STR kits reported for the single source samples include: Applied Biosystems GlobalFiler, Applied Biosystems Y-Filer Plus, Promega PowerPlex Fusion 6C, Promega PowerPlex Y23. The CE profiles for single source samples are also included in a spreadsheet.

    The following CE-STR kit is reported for the mixture samples: Promega PowerPlex Fusion 6C.

    The sequencing kits reported for the mixture and single source samples include: Verogen ForenSeq DNA Signature Prep Kit, Promega PowerSeq 46GY, Thermo Fisher Applied Biosystems Precision ID GlobalFiler NGS STR Panel v2.

    The single source samples only are reported for: Promega PowerSeq CRM Nested System.

    This data was produced with approval from the NIST Research Protections Office. It is intended for research, training, and educational purposes only and could potentially contain errors due to limited review prior to uploading. This data should not be used to identify the donor of the profile or uploaded/searched versus public or law enforcement DNA databases. Certain commercial equipment, instruments, or materials are identified in this dataset in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

  5. Global Biodiversity Information Facility (GBIF) Species Occurrences

    • registry.opendata.aws
    Updated May 17, 2021
    + more versions
    Cite
    The Global Biodiversity Information Facility (GBIF) (2021). Global Biodiversity Information Facility (GBIF) Species Occurrences [Dataset]. https://registry.opendata.aws/gbif/
    Explore at:
    Dataset updated
    May 17, 2021
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Description

    The Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species. GBIF currently integrates datasets documenting over 1.6 billion species occurrences, growing daily. The GBIF occurrence dataset combines data from a wide array of sources including specimen-related data from natural history museums, observations from citizen science networks and environment recording schemes. While these data are constantly changing at GBIF.org, periodic snapshots are taken and made available on AWS.

  6. Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jun 7, 2021
    Cite
    Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
    Explore at:
    Available download formats: zip, bin
    Dataset updated
    Jun 7, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mariana O. Silva; Laís Mota; Mirella M. Moro
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019; the data were then updated and enhanced in June 2019.

    The attractive features of MusicOSet include:

    • Integration and centralization of different musical data sources
    • Calculation of popularity scores and classification of musical elements as hits or non-hits, covering 1962 to 2018
    • Enriched metadata for music, artists, and albums from the US popular music industry
    • Availability of acoustic and lyrical resources
    • Unrestricted access in two formats: SQL database and compressed .csv files

    | Data              | # Records |
    |:-----------------:|:---------:|
    | Songs             | 20,405    |
    | Artists           | 11,518    |
    | Albums            | 26,522    |
    | Lyrics            | 19,664    |
    | Acoustic Features | 20,405    |
    | Genres            | 1,561     |

  7. Regulatory information for cosmetics

    • open.canada.ca
    • datasets.ai
    • +1 more
    html
    Updated Mar 1, 2021
    Cite
    Health Canada (2021). Regulatory information for cosmetics [Dataset]. https://open.canada.ca/data/en/dataset/0945ce45-411e-4ed2-8ccc-e4f7d0840f9f
    Explore at:
    Available download formats: html
    Dataset updated
    Mar 1, 2021
    Dataset provided by
    Health Canada (http://www.hc-sc.gc.ca/)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    All cosmetics sold in Canada must be safe to use and must not pose any health risk. They must meet the requirements of the Food and Drugs Act and the Cosmetic Regulations.

  8. Annual Freedom of Information Act (FOIA) Reports - Dataset - NASA Open Data...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). Annual Freedom of Information Act (FOIA) Reports - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/annual-freedom-of-information-act-foia-reports
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    NASA makes annual reports of progress made on Freedom of Information Act (FOIA) requests. This database contains PDF and XML versions of reports from 1999 to the present.

  9. Published Open Data Sets

    • hub.arcgis.com
    • data.squamish.ca
    • +1 more
    Updated Jul 20, 2022
    Cite
    District of Squamish (2022). Published Open Data Sets [Dataset]. https://hub.arcgis.com/maps/squamish::published-open-data-sets
    Explore at:
    Dataset updated
    Jul 20, 2022
    Dataset authored and provided by
    District of Squamish
    Description

    Published Open Data Sets | Squamish Community Dashboard

    This measure tracks the number of public open data sets published online for community use as part of the District's Open Data Portal. The Squamish Open Data Portal (data.squamish.ca) provides full GIS data sets including but not limited to physical terrain and imagery, environment, infrastructure, business data, development, recreation, transportation, and emergency management. Open data is commonly defined as data that is free and available for anyone to use and republish as they wish.

    About this target: Available open data, and corresponding public visitation and usage of that data, progressively increases (Squamish Community Digital Strategy).

    Analysis: As of year-end 2024, the District's open data portal had 79 published data sets. Since launching the District's Open Data Portal in 2016, the municipality has added 44 data sets to the number initially published (35 data sets).

    Reason for monitoring: Enhancing access to and utilization of information contributes to open and transparent government, promotes a more connected and engaged community and strengthens decision making and service delivery. These are core goals of the Community Digital Strategy developed and adopted in 2016 to better leverage technology to meet the growing social, economic and environmental needs of citizens, and link digital products and services to wider community and economic development.

  10. Open dataset of annual Article Processing Charges (APCs) of gold and hybrid...

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    Butler, Leigh-Ann; Hare, Madelaine; Schönfelder, Nina; Schares, Eric; Alperin, Juan Pablo; Haustein, Stefanie (2024). Open dataset of annual Article Processing Charges (APCs) of gold and hybrid journals published by Elsevier, Frontiers, MDPI, PLOS, Springer-Nature and Wiley 2019-2023 [Dataset]. http://doi.org/10.7910/DVN/CR1MMV
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Butler, Leigh-Ann; Hare, Madelaine; Schönfelder, Nina; Schares, Eric; Alperin, Juan Pablo; Haustein, Stefanie
    Description

    This open dataset of annual Article Processing Charges (APCs) was produced from the price lists of six large scholarly publishers (Elsevier, Frontiers, PLOS, MDPI, Springer-Nature and Wiley) from 2019 to 2023. APC price lists were downloaded from publisher websites each year as well as via Wayback Machine snapshots to retrieve fees per journal per year. The dataset includes journal metadata, APC collection method, and annual APC list prices in several currencies (USD, EUR, GBP, CHF, JPY, CAD) for 8,712 unique journals and 36,618 journal-year combinations. The dataset was generated to allow for more precise analysis of APCs and can support library collection development and scientometric analysis estimating APCs paid in gold and hybrid OA journals.

  11. City of Milwaukee Open Data Dataset Catalog

    • data.milwaukee.gov
    csv
    Updated Sep 1, 2025
    Cite
    Information Technology and Management Division (2025). City of Milwaukee Open Data Dataset Catalog [Dataset]. https://data.milwaukee.gov/dataset/dataset-catalog
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Information Technology and Management Division
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Milwaukee
    Description

    This dataset is a catalog of all the datasets available on the City of Milwaukee Open Data portal.

  12. KU-HAR: Human Activity Recognition Dataset (v 1.0)

    • kaggle.com
    Updated Apr 1, 2021
    + more versions
    Cite
    Niloy Sikder (2021). KU-HAR: Human Activity Recognition Dataset (v 1.0) [Dataset]. https://www.kaggle.com/datasets/niloy333/kuhar
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 1, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Niloy Sikder
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    KU-HAR: An Open Dataset for Human Activity Recognition (v 1.0)

    Human Activity Recognition (HAR) refers to the capacity of machines to perceive human actions. This dataset contains information on 18 different activities collected from 90 participants (75 male and 15 female) using smartphone sensors (Accelerometer and Gyroscope). It has 1945 raw activity samples collected directly from the participants, and 20750 subsamples extracted from them.

    Activities/ Classes

    1. Stand ➞ Standing still (1 min)
    2. Sit ➞ Sitting still (1 min)
    3. Talk-sit ➞ Talking with hand movements while sitting (1 min)
    4. Talk-stand ➞ Talking with hand movements while standing or walking (1 min)
    5. Stand-sit ➞ Repeatedly standing up and sitting down (5 times)
    6. Lay ➞ Laying still (1 min)
    7. Lay-stand ➞ Repeatedly standing up and laying down (5 times)
    8. Pick ➞ Picking up an object from the floor (10 times)
    9. Jump ➞ Jumping repeatedly (10 times)
    10. Push-up ➞ Performing full push-ups (5 times)
    11. Sit-up ➞ Performing sit-ups (5 times)
    12. Walk ➞ Walking 20 meters (≈12 s)
    13. Walk-backward ➞ Walking backward for 20 meters (≈20 s)
    14. Walk-circle ➞ Walking along a circular path (≈ 20 s)
    15. Run ➞ Running 20 meters (≈7 s)
    16. Stair-up ➞ Ascending on a set of stairs (≈1 min)
    17. Stair-down ➞ Descending from a set of stairs (≈50 s)
    18. Table-tennis ➞ Playing table tennis (1 min)

    Contents of the .zip files

    1.Raw_ time_ domian_ data.zip ➞ Originally collected 1945 time-domain samples in separate .csv files. The arrangement of information in each .csv file is:

    • Col. 1, 5 ➞ exact time (elapsed since the start) when the Accelerometer (col. 1) & Gyroscope (col. 5) output were recorded (in ms)
    • Col. 2, 3, 4 ➞ Acceleration along X, Y, Z axes (in m/s^2)
    • Col. 6, 7, 8 ➞ Rate of rotation around X, Y, Z axes (in rad/s)

    2.Trimmed_ interpolated_ raw_ data.zip ➞ Unnecessary parts of the samples were trimmed (only from the beginning and the end). The samples were interpolated to keep a constant sampling rate of 100 Hz. The arrangement of information is the same as above.

    3.Time_ domain_ subsamples.zip ➞ 20750 subsamples extracted from the 1945 collected samples, provided in a single .csv file. Each of them contains 3 seconds of non-overlapping data of the corresponding activity. Arrangement of information:

    • Col. 1–300, 301–600, 601–900 ➞ Accelerometer X, Y, Z axes readings
    • Col. 901–1200, 1201–1500, 1501–1800 ➞ Gyro X, Y, Z axes readings
    • Col. 1801 ➞ Class ID (0 to 17, in the order mentioned above)
    • Col. 1802 ➞ length of each channel data in the subsample
    • Col. 1803 ➞ serial no. of the subsample

    Gravity acceleration was omitted from the Accelerometer data, and no filter was applied to remove noise. The dataset is free to download, modify, and use provided that the source and the associated article are properly referenced.

    Use the .csv file of the Time_ domain_ subsamples.zip for instant HAR classification tasks. See this notebook for details. Use the other files if you want to work with raw activity data.
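    The column layout of the subsamples file described above can be unpacked as in the sketch below. It assumes each row has already been parsed into 1803 numeric values (for example with the `csv` module). The channel names and the function name are illustrative, not taken from the dataset.

```python
from typing import Dict, List, Sequence

# Illustrative channel names, in the column order given in the description.
CHANNELS = ("acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z")
SAMPLES_PER_CHANNEL = 300  # 300 columns per channel, i.e. 3 s of data per subsample
ROW_LENGTH = len(CHANNELS) * SAMPLES_PER_CHANNEL + 3  # 1803 columns in total


def split_subsample(row: Sequence[float]) -> Dict[str, object]:
    """Split one 1803-value row into six 300-value channels plus its three trailing fields."""
    if len(row) != ROW_LENGTH:
        raise ValueError(f"expected {ROW_LENGTH} values, got {len(row)}")
    signals: Dict[str, List[float]] = {}
    for position, name in enumerate(CHANNELS):
        start = position * SAMPLES_PER_CHANNEL
        signals[name] = list(row[start:start + SAMPLES_PER_CHANNEL])
    return {
        "signals": signals,
        "class_id": int(row[1800]),  # Col. 1801 (1-based): class ID, 0 to 17
        "length": int(row[1801]),    # Col. 1802: length of each channel
        "serial": int(row[1802]),    # Col. 1803: serial no. of the subsample
    }
```

    Note that the description numbers columns from 1, while Python indexes from 0, hence the offsets in the last three fields.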

    Citation Request

    More information is provided in the following data paper. Please cite it if you use this dataset in your research/work:

    [1] N. Sikder and A.-A. Nahid, “KU-HAR: An open dataset for heterogeneous human activity recognition,” Pattern Recognition Letters, vol. 146, pp. 46–54, Jun. 2021, doi: 10.1016/j.patrec.2021.02.024

    [2] N. Sikder, M. A. R. Ahad, and A.-A. Nahid, “Human Action Recognition Based on a Sequential Deep Learning Model,” 2021 Joint 10th International Conference on Informatics, Electronics & Vision (ICIEV) and 2021 5th International Conference on Imaging, Vision & Pattern Recognition (icIVPR). IEEE, Aug. 16, 2021. doi: 10.1109/icievicivpr52578.2021.9564234.

    Cite the dataset as: A.-A. Nahid, N. Sikder, and I. Rafi, “KU-HAR: An Open Dataset for Human Activity Recognition.” Mendeley, Feb. 16, 2021, doi: 10.17632/45F952Y38R.5

    Supplementary files: https://drive.google.com/drive/folders/1yrG8pwq3XMlyEGYMnM-8xnrd6js0oXA7

    Conclusion

    The dataset is originally hosted on Mendeley Data

    The image used in the banner is collected from here and attributed as: Fit, athletic man getting ready for a run by Jacob Lund from Noun Projects

  13. A New Dataset for Streaming Learning Analytics

    • zenodo.org
    csv
    Updated Oct 29, 2024
    Cite
    Gabriella Casalino; Giovanna Castellano; Gianluca Zaza (2024). A New Dataset for Streaming Learning Analytics [Dataset]. http://doi.org/10.5281/zenodo.14003233
    Explore at:
    Available download formats: csv
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gabriella Casalino; Giovanna Castellano; Gianluca Zaza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 2024
    Description

    This research introduces a novel dataset developed for streaming learning analytics, derived from the Open University Learning Analytics Dataset (OULAD). The dataset incorporates essential temporal information that captures the timing of student interactions with the Virtual Learning Environment (VLE). By integrating these time-based interactions, the dataset enhances the capabilities of stream algorithms, which are particularly well-suited for real-time monitoring and analysis of student learning behaviors.

    The dataset consists of 34 features and 1,718,983 samples, encompassing students' demographic information, assessment scores, and interactions with the VLE for a specific time (T), corresponding to each student (S) within a given course (C) and module (M). The target classes ('Withdrawn', 'Fail', 'Pass', and 'Distinction') were encoded as 0, 1, 2, and 3, respectively. Notably, the data exhibits a significant imbalance, with a substantial prevalence of records associated with students who passed the final examination. The class distribution is as follows: 'Pass' (1,022,760 samples), 'Distinction' (308,642 samples), 'Fail' (227,550 samples), and 'Withdrawn' (160,031 samples).
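    The encoding and class counts quoted above can be written down directly, which makes the imbalance easy to quantify. The sketch below is illustrative only: the inverse-frequency weighting is one common way to compensate for imbalance and is an assumption here, not something the dataset authors prescribe.

```python
from typing import Dict

# Label encoding and per-class sample counts as stated in the description.
CLASS_CODES = {"Withdrawn": 0, "Fail": 1, "Pass": 2, "Distinction": 3}
CLASS_COUNTS = {
    "Pass": 1_022_760,
    "Distinction": 308_642,
    "Fail": 227_550,
    "Withdrawn": 160_031,
}


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's fraction of the total number of samples."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by total / (n_classes * count), so rarer classes weigh more."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    for label, share in class_shares(CLASS_COUNTS).items():
        print(f"{CLASS_CODES[label]} {label}: {share:.1%}")
```

    The four counts sum to the stated 1,718,983 samples, with 'Pass' accounting for well over half of them.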

    For further details on the data, please refer to the manuscript: Gabriella Casalino, Giovanna Castellano, Gianluca Zaza, "Does Time Matter in Analyzing Educational Data? - A New Dataset for Streaming Learning Analytics.", CEUR Proceedings

  14. Data and code for "An Open Dataset of Chinese Duration Expressions"

    • scidb.cn
    Updated Aug 7, 2025
    Cite
    Zhang Si-Qi; Niu Jia-Wen; Liu Xiaoqian; Sui Xiao-Yang; Rao Li-Lin (2025). Data and code for "An Open Dataset of Chinese Duration Expressions" [Dataset]. http://doi.org/10.57760/sciencedb.28888
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 7, 2025
    Dataset provided by
    Science Data Bank
    Authors
    Zhang Si-Qi; Niu Jia-Wen; Liu Xiaoqian; Sui Xiao-Yang; Rao Li-Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset comprises the data and the code for the manuscript "An Open Dataset of Chinese Duration Expressions". Duration information is essential for understanding and analyzing our world. In textual contexts, duration information is typically conveyed in two formats: numeric (e.g., 1 hour) and verbal (e.g., shortly). To analyze duration information in text, it is crucial to understand how people map duration expressions to corresponding numerical duration. However, the literature has yet to provide lexicons supporting such conversion. Furthermore, existing databases of time-related expressions often lack information about word frequency, a robust predictor of information processing. Here, we report an open dataset of 2,101 Chinese duration expressions, each annotated with its corresponding numerical duration. To obtain high-quality data for word frequency, we obtained the frequency of each duration expression from a large-scale corpus of 10 billion Chinese characters (BLCU Corpus Center (BCC) Corpus) and computed an adjusted frequency for each expression. This dataset provides a valuable resource for research on temporal information in Chinese, facilitating studies in natural language processing, psychology, and linguistics.

  15. Global - Roads Open Access Data Set - Dataset - ENERGYDATA.INFO

    • energydata.info
    Updated Jul 25, 2018
    + more versions
    Cite
    (2018). Global - Roads Open Access Data Set - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/global-roads-open-access-data-set-2010
    Explore at:
    Dataset updated
    Jul 25, 2018
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The Global Roads Open Access Data Set, Version 1 (gROADSv1) was developed under the auspices of the CODATA Global Roads Data Development Task Group. The data set combines the best available roads data by country into a global roads coverage, using the UN Spatial Data Infrastructure Transport (UNSDI-T) version 2 as a common data model. All country road networks have been joined topologically at the borders, and many countries have been edited for internal topology. Source data for each country are provided in the documentation, and users are encouraged to refer to the readme file for use constraints that apply to a small number of countries. Because the data are compiled from multiple sources, the date range for road network representations ranges from the 1980s to 2010 depending on the country (most countries have no confirmed date), and spatial accuracy varies. The baseline global data set was compiled by the Information Technology Outreach Services (ITOS) of the University of Georgia. Updated data for 27 countries and 6 smaller geographic entities were assembled by Columbia University's Center for International Earth Science Information Network (CIESIN), with a focus largely on developing countries with the poorest data coverage.

  16. Data from: OPEN-KTH-3dMODELS: An Open Dataset of Building Models at KTH...

    • researchdata.se
    • explore.openaire.eu
    Updated Dec 31, 2023
    Cite
    Naveen Mohan; Maxime Sainte Catherine; Lester Jose (2023). OPEN-KTH-3dMODELS: An Open Dataset of Building Models at KTH Campus Valhallavägen [Dataset]. http://doi.org/10.5281/ZENODO.10445868
    Dataset updated
    Dec 31, 2023
    Dataset provided by
    KTH Royal Institute of Technology
    Authors
    Naveen Mohan; Maxime Sainte Catherine; Lester Jose
    Area covered
    Valhallavägen
    Description

    OPEN-KTH-3dMODELS: An open dataset of building models at KTH Campus Valhallavägen

    Open-KTH-3dModels is a subproject of the AD-EYE testbed for Automated Driving and Intelligent Transportation Systems.

    The dataset comprises a series of .blend files containing prominent buildings from the KTH campus Valhallavägen.

    The dataset also contains PreScan-compatible models that can be used with AD-EYE (https://www.adeye.se/).

    Visualisation video: https://www.youtube.com/watch?v=F6NfCiul3oE

    Learn more at https://www.adeye.se/open-kth-3dmodels or contact adeye@md.kth.se

    The AD-EYE testbed is based on the design presented in the work "AD-EYE: A Co-Simulation Platform for Early Verification of Functional Safety Concepts"

    Original paper: https://doi.org/10.4271/2019-01-0126

    Preprint available at: https://arxiv.org/abs/1912.00448

    Citation:

    Naveen Mohan, Martin Törngren, "AD-EYE: A Co-Simulation Platform for Early Verification of Functional Safety Concepts", SAE Technical Paper 19AE-0203/2019-01-0126, https://doi.org/10.4271/2019-01-0126

    Notes:

    Modelling work primarily performed by Lester Jose, during his internship with AD-EYE.

  17. KU-HAR: An Open Dataset for Human Activity Recognition

    • data.mendeley.com
    Updated Feb 16, 2021
    + more versions
    Cite
    Abdullah-Al Nahid (2021). KU-HAR: An Open Dataset for Human Activity Recognition [Dataset]. http://doi.org/10.17632/45f952y38r.5
    Dataset updated
    Feb 16, 2021
    Authors
    Abdullah-Al Nahid
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    (Always use the latest version of the dataset.)

    Human Activity Recognition (HAR) refers to the capacity of machines to perceive human actions. This dataset contains information on 18 different activities collected from 90 participants (75 male and 15 female) using smartphone sensors (Accelerometer and Gyroscope). It has 1945 raw activity samples collected directly from the participants, and 20750 subsamples extracted from them. The activities are:

    Stand➞ Standing still (1 min)
    Sit➞ Sitting still (1 min)
    Talk-sit➞ Talking with hand movements while sitting (1 min)
    Talk-stand➞ Talking with hand movements while standing or walking (1 min)
    Stand-sit➞ Repeatedly standing up and sitting down (5 times)
    Lay➞ Laying still (1 min)
    Lay-stand➞ Repeatedly standing up and laying down (5 times)
    Pick➞ Picking up an object from the floor (10 times)
    Jump➞ Jumping repeatedly (10 times)
    Push-up➞ Performing full push-ups (5 times)
    Sit-up➞ Performing sit-ups (5 times)
    Walk➞ Walking 20 meters (≈12 s)
    Walk-backward➞ Walking backward for 20 meters (≈20 s)
    Walk-circle➞ Walking along a circular path (≈20 s)
    Run➞ Running 20 meters (≈7 s)
    Stair-up➞ Ascending on a set of stairs (≈1 min)
    Stair-down➞ Descending from a set of stairs (≈50 s)
    Table-tennis➞ Playing table tennis (1 min)

    Contents of the attached .zip files are:

    1.Raw_time_domian_data.zip➞ Originally collected 1945 time-domain samples in separate .csv files. The arrangement of information in each .csv file is:
    Col. 1, 5➞ Exact time (elapsed since the start) when the Accelerometer & Gyro output was recorded (in ms)
    Col. 2, 3, 4➞ Acceleration along X, Y, Z axes (in m/s^2)
    Col. 6, 7, 8➞ Rate of rotation around X, Y, Z axes (in rad/s)

    2.Trimmed_interpolated_raw_data.zip➞ Unnecessary parts of the samples were trimmed (only from the beginning and the end). The samples were interpolated to keep a constant sampling rate of 100 Hz. The arrangement of information is the same as above.

    3.Time_domain_subsamples.zip➞ 20750 subsamples extracted from the 1945 collected samples, provided in a single .csv file. Each of them contains 3 seconds of non-overlapping data of the corresponding activity. Arrangement of information:
    Col. 1–300, 301–600, 601–900➞ Acc.meter X, Y, Z axes readings
    Col. 901–1200, 1201–1500, 1501–1800➞ Gyro X, Y, Z axes readings
    Col. 1801➞ Class ID (0 to 17, in the order mentioned above)
    Col. 1802➞ Length of each channel's data in the subsample
    Col. 1803➞ Serial no. of the subsample
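
    The column layout of the subsample file can be illustrated with a short Python sketch. It is not part of the dataset: the names split_subsample, CHANNELS and ACTIVITIES are hypothetical, and the 300 values per channel are taken from the stated 3 seconds at 100 Hz. It assumes one row has already been read from the .csv file as a list of 1803 values.

```python
# Class labels in the order listed in the description (Class ID 0 to 17).
ACTIVITIES = [
    "Stand", "Sit", "Talk-sit", "Talk-stand", "Stand-sit", "Lay",
    "Lay-stand", "Pick", "Jump", "Push-up", "Sit-up", "Walk",
    "Walk-backward", "Walk-circle", "Run", "Stair-up", "Stair-down",
    "Table-tennis",
]

# Channel order as described: accelerometer X, Y, Z, then gyroscope X, Y, Z.
CHANNELS = ["acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z"]
SAMPLES_PER_CHANNEL = 300  # 3 s at 100 Hz
ROW_LENGTH = len(CHANNELS) * SAMPLES_PER_CHANNEL + 3  # 1803 columns


def split_subsample(row):
    """Split one 1803-value row into six channels plus its metadata."""
    if len(row) != ROW_LENGTH:
        raise ValueError(f"expected {ROW_LENGTH} values, got {len(row)}")
    values = [float(v) for v in row]
    channels = {
        name: values[i * SAMPLES_PER_CHANNEL:(i + 1) * SAMPLES_PER_CHANNEL]
        for i, name in enumerate(CHANNELS)
    }
    class_id = int(values[1800])          # Col. 1801 (1-based)
    return {
        "channels": channels,
        "class_id": class_id,
        "activity": ACTIVITIES[class_id],
        "length": int(values[1801]),      # Col. 1802 (1-based)
        "serial": int(values[1802]),      # Col. 1803 (1-based)
    }
```

    Rows read with Python's csv.reader arrive as strings, which is why every value is converted with float() before slicing.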

    Gravity acceleration was omitted from the Acc.meter data, and no filter was applied to remove noise. The dataset is free to download, modify, and use.
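
    The description of the trimmed files says the samples were interpolated to a constant 100 Hz but does not name the method. The following Python sketch shows one way such resampling could be done, assuming linear interpolation and strictly increasing timestamps in milliseconds (as in columns 1 and 5 of the raw files). The function name resample_linear is hypothetical and this is not the authors' procedure.

```python
from bisect import bisect_right


def resample_linear(times_ms, values, rate_hz=100.0):
    """Linearly interpolate irregular samples onto a uniform time grid.

    Assumption: times_ms is strictly increasing and given in milliseconds.
    """
    if len(times_ms) != len(values) or len(times_ms) < 2:
        raise ValueError("need at least two samples with matching lengths")
    step_ms = 1000.0 / rate_hz
    start, end = times_ms[0], times_ms[-1]
    count = int((end - start) / step_ms) + 1
    resampled = []
    for k in range(count):
        t = start + k * step_ms
        # Index of the right-hand neighbour, clamped to a valid interval.
        j = min(max(bisect_right(times_ms, t), 1), len(times_ms) - 1)
        t0, t1 = times_ms[j - 1], times_ms[j]
        v0, v1 = values[j - 1], values[j]
        weight = (t - t0) / (t1 - t0)
        resampled.append(v0 + weight * (v1 - v0))
    return resampled
```

    Each of the six sensor channels would be resampled separately against its own time column.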

    More information is provided in the data paper which is currently under review: N. Sikder, A.-A. Nahid, KU-HAR: An open dataset for heterogeneous human activity recognition, Pattern Recognit. Lett. (submitted).

    A preprint will be available soon.

    Backup: drive.google.com/drive/folders/1yrG8pwq3XMlyEGYMnM-8xnrd6js0oXA7

  18. Freedom of Information data and statistics - Datasets - Government of Jersey...

    • opendata.gov.je
    Updated Feb 12, 2020
    Cite
    (2020). Freedom of Information data and statistics - Datasets - Government of Jersey Open Data [Dataset]. https://opendata.gov.je/dataset/freedom-of-information-data-and-statistics
    Dataset updated
    Feb 12, 2020
    Description

    This dataset includes the total valid Freedom of Information (FOI) requests received, the volume of Departments' FOI requests and responses, who's making the most FOI requests and common topics. The resources within this dataset are updated monthly. Read more at https://www.gov.je/Government/FreedomOfInformation/Pages/FOIStatistics.aspx

  19. AI TOOLS - Open Dataset - 4000 tools / 50 categories

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    BUREAU, Olivier (2023). AI TOOLS - Open Dataset - 4000 tools / 50 categories [Dataset]. http://doi.org/10.7910/DVN/QLSXZG
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    BUREAU, Olivier
    Description

    Introducing a comprehensive and openly accessible dataset designed for researchers and data scientists in the field of artificial intelligence. This dataset encompasses a collection of over 4,000 AI tools, meticulously categorized into more than 50 distinct categories. This valuable resource has been generously shared by its owner, TasticAI, and is freely available for various purposes such as research, benchmarking, market surveys, and more.

    Dataset Overview: The dataset provides an extensive repository of AI tools, each accompanied by a wealth of information to facilitate your research endeavors. Here is a brief overview of the key components:

    AI Tool Name: Each AI tool is listed with its name, providing an easy reference point for users to identify specific tools within the dataset.
    Description: A concise one-line description is provided for each AI tool. This description offers a quick glimpse into the tool's purpose and functionality.
    AI Tool Category: The dataset is thoughtfully organized into more than 50 distinct categories, ensuring that you can easily locate AI tools that align with your research interests or project needs. Whether you are working on natural language processing, computer vision, machine learning, or other AI subfields, you will find a dedicated category.
    Images: Visual representation is crucial for understanding and identifying AI tools. To aid your exploration, the dataset includes images associated with each tool, allowing for quick recognition and visual association.
    Website Links: Accessing more detailed information about a specific AI tool is effortless, as direct links to the tool's respective website or documentation are provided. This feature enables researchers and data scientists to delve deeper into the tools that pique their interest.

    Utilization and Benefits: This openly shared dataset serves as a valuable resource for various purposes:

    Research: Researchers can use this dataset to identify AI tools relevant to their studies, facilitating faster literature reviews, comparative analyses, and the exploration of cutting-edge technologies.
    Benchmarking: The extensive collection of AI tools allows for comprehensive benchmarking, enabling you to evaluate and compare tools within specific categories or across categories.
    Market Surveys: Data scientists and market analysts can utilize this dataset to gain insights into the AI tool landscape, helping them identify emerging trends and opportunities within the AI market.
    Educational Purposes: Educators and students can leverage this dataset for teaching and learning about AI tools, their applications, and the categorization of AI technologies.

    Conclusion: In summary, this openly shared dataset from TasticAI, featuring over 4,000 AI tools categorized into more than 50 categories, represents a valuable asset for researchers, data scientists, and anyone interested in the field of artificial intelligence. Its easy accessibility, detailed information, and versatile applications make it an indispensable resource for advancing AI research, benchmarking, market analysis, and more. Explore the dataset at https://tasticai.com and unlock the potential of this rich collection of AI tools for your projects and studies.

  20. Information and Computer Skilled Level - Dataset - Open Government Data

    • opendata.gov.jo
    Updated Dec 27, 2024
    Cite
    (2024). Information and Computer Skilled Level - Dataset - Open Government Data [Dataset]. https://opendata.gov.jo/dataset/information-and-computer-skilled-level-3595-2022
    Dataset updated
    Dec 27, 2024
    Description

    Information and Computer Skilled Level [2022]: Training Programs Related to Information and Computer within the Skilled Level

Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7

Open Data Portal Catalogue

Explore at:
7 scholarly articles cite this dataset (View in Google Scholar)
