79 datasets found

Open Data Portal Catalogue
open.canada.ca
datasets.ai
+3 more
csv, json, jsonl, png +2
Updated Aug 27, 2025
Cite
Treasury Board of Canada Secretariat (2025). Open Data Portal Catalogue [Dataset]. https://open.canada.ca/data/en/dataset/c4c5c7f1-bfa6-4ff6-b4a0-c164cb2060f7
Explore at:
Available download formats: csv, sqlite, json, png, jsonl, xlsx
Dataset updated
Aug 27, 2025
Dataset provided by
Treasury Board of Canada Secretariat: http://www.tbs-sct.gc.ca/
Treasury Board of Canada: https://www.canada.ca/en/treasury-board-secretariat/corporate/about-treasury-board.html
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
The open data portal catalogue is a downloadable dataset containing some key metadata for the general datasets available on the Government of Canada's Open Data portal. Resource 1 is generated using the ckanapi tool. Resources 2 - 8 are generated using the Flatterer utility.
Description of resources:
1. Dataset is a JSON Lines file in which the metadata of each Dataset/Open Information Record is one line of JSON. The file is compressed with GZip. It is heavily nested and is recommended for users familiar with working with nested JSON.
2. Catalogue is an XLSX workbook in which the nested metadata of each Dataset/Open Information Record is flattened into worksheets, one for each type of metadata.
3. Datasets Metadata contains metadata at the dataset level. This is also referred to as the package in some CKAN documentation. This is the main table/worksheet in the SQLite database and XLSX output.
4. Resources Metadata contains the metadata for the resources contained within each dataset.
5. Resource Views Metadata contains the metadata for the views applied to each resource, if a resource has a view configured.
6. Datastore Fields Metadata contains the DataStore information for CSV datasets that have been loaded into the DataStore. This information is displayed in the Data Dictionary for DataStore-enabled CSVs.
7. Data Package Fields contains a description of the fields available in each of the tables within the Catalogue, as well as a count of the records each table contains.
8. Data Package Entity Relation Diagram displays the title and format of each column in each table of the Data Package, in the form of an ERD diagram. The Data Package resource offers a text-based version.
9. SQLite Database is a .db database, similar in structure to Catalogue. It can be queried with database or analytical software tools.
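As an illustration of working with resource 1, the sketch below streams a gzip-compressed JSON Lines file one record at a time and tallies resource formats. The field names `resources` and `format` are assumptions based on common CKAN package layouts, and the two sample records are invented stand-ins for the real file, so treat this as a starting point rather than a description of the actual schema.

```python
import gzip
import io
import json


def iter_records(stream):
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_resource_formats(records):
    """Count resources per format across all records (field names are assumed)."""
    counts = {}
    for record in records:
        for resource in record.get("resources", []):
            fmt = resource.get("format", "unknown")
            counts[fmt] = counts.get(fmt, 0) + 1
    return counts


# Tiny in-memory stand-in for the real gzip-compressed file (invented records).
sample_records = [
    {"id": "a", "title": "Example A", "resources": [{"format": "CSV"}, {"format": "JSON"}]},
    {"id": "b", "title": "Example B", "resources": [{"format": "CSV"}]},
]
raw = "\n".join(json.dumps(rec) for rec in sample_records).encode("utf-8")
compressed = gzip.compress(raw)

# For a file on disk, pass its path to gzip.open instead of the BytesIO object.
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as stream:
    format_counts = count_resource_formats(iter_records(stream))

print(format_counts)
```

Reading line by line keeps memory use flat, which matters when every line carries a deeply nested record.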
Data from: MUHSIC: An Open Dataset with Temporal Musical Success Information...
data.niaid.nih.gov
zenodo.org
Updated Oct 22, 2021
Cite
Gabriel R. G. Barbosa (2021). MUHSIC: An Open Dataset with Temporal Musical Success Information [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4779002
Explore at:
Dataset updated
Oct 22, 2021
Dataset provided by
Anisio Lacerda
Gabriel P. Oliveira
Danilo B. Seufitelli
Mirella M. Moro
Bruna C. Melo
Mariana O. Silva
Gabriel R. G. Barbosa
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Music is a volatile industry whose dynamic nature can directly influence artists' careers. That is, musical careers can go through ups and downs depending on the current state of the market. This dataset provides data about hot streak periods in musical careers, which are defined by high-impact bursts occurring in sequence.

Success in the music industry has a temporal structure, as audience tastes change over time. Here, we use the Billboard Hot 100 charts with Spotify data to represent success over time. For musical careers, we build their time series from the debut date (i.e., the date of the first release obtained from Spotify) to the last chart collected. Thus, each point in the time series represents the success of an artist in a given week, according to the Hot 100 chart.

Therefore, we present MUHSIC (Music-oriented Hot Streak Information Collection), which contains:

Charts: enhanced data on all weekly Hot 100 Charts

Artists: artist success time series with hot streak information

Genres: genre success time series with hot streak information (a genre is the aggregate of all its artists)

Hot Streaks: summarized hot streak information
LSD4WSD : An Open Dataset for Wet Snow Detection with SAR Data and Physical...
zenodo.org
explore.openaire.eu
bin, pdf +1
Updated Jul 11, 2024
Cite
Matthieu Gallet; Abdourrahmane Atto; Fatima Karbou; Emmanuel Trouvé (2024). LSD4WSD : An Open Dataset for Wet Snow Detection with SAR Data and Physical Labelling [Dataset]. http://doi.org/10.5281/zenodo.10046730
Explore at:
Available download formats: text/x-python, bin, pdf
Unique identifier
https://doi.org/10.5281/zenodo.10046730
Dataset updated
Jul 11, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Matthieu Gallet; Abdourrahmane Atto; Fatima Karbou; Emmanuel Trouvé
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LSD4WSD V2.0
Learning SAR Dataset for Wet Snow Detection - Full Analysis Version.
The aim of this dataset is to provide a basis for automatic learning to detect wet snow. It is based on Sentinel-1 SAR GRD satellite images acquired between August 2020 and August 2021 over the French Alps. The new version of this dataset is no longer simply restricted to a classification task, and provides a set of metadata for each sample.
Modifications and improvements in version 2.0.0:
Number of massifs: 7 new massifs added to cover all of the Sentinel-1 images (cf. info.pdf).
Acquisition: add images of the descending pass in addition to those originally used in the ascending pass.
Sample: reduction in the size of the samples considered to 15 by 15 to facilitate evaluation at the central pixel.
Sample: increased density of extracted windows, with a distance of approximately 500 meters between the centers of the windows.
Sample: removal of the pre-processing involving the use of logarithms.
Sample: removal of the pre-processing involving the normalisation.
Labels: new structure for the labels part: dictionary with keys: topography, metadata and physics.
Labels: physics: addition of direct information from the CROCUS model for 3 simulations: Liquid Water Content, snow height and minimum snowpack temperature.
Labels: topography: information on the slope, altitude and average orientation of the sample.
Labels: metadata : information on the date of the sample, the mountain massif and the run (ascending or descending).
Dataset: removal of the train/test split*
We leave it up to the user to use the Group Kfold method to validate the models using the alpine massif information.
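A minimal sketch of the group k-fold idea mentioned above, in plain Python: samples are split so that every massif falls entirely within a single fold, and each fold is held out in turn. The massif labels below are placeholders, and the greedy balancing rule is only one possible choice, not the authors' procedure.

```python
from collections import defaultdict


def group_kfold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs so that no group spans two folds.

    Groups are assigned greedily, largest first, to the currently smallest fold.
    """
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    if n_splits > len(by_group):
        raise ValueError("n_splits cannot exceed the number of distinct groups")

    folds = [[] for _ in range(n_splits)]
    for group in sorted(by_group, key=lambda g: (-len(by_group[g]), g)):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_group[group])

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j in range(n_splits) if j != k for i in folds[j])
        yield train_idx, test_idx


# Hypothetical massif label for each sample (names are placeholders).
massifs = ["A", "A", "B", "B", "B", "C", "D", "D"]
splits = list(group_kfold(massifs, n_splits=2))
for train_idx, test_idx in splits:
    held_out = sorted({massifs[i] for i in test_idx})
    print(f"held-out massifs: {held_out}, test size: {len(test_idx)}")
```

In practice, scikit-learn's `GroupKFold` offers the same kind of split; passing the massif name as the group label keeps spatially correlated samples from leaking between training and validation sets.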
In total, the dataset consists of 2,467,516 samples of size 15 by 15 by 9. For each sample, 9 metadata fields are provided, drawing in particular on the Crocus physical model:
topography:
elevation (meters) (average),
orientation (degrees) (average),
slope (degrees) (average),
metadata:
name of the alpine massif,
date of acquisition,
type of acquisition (ascending/descending),
physics:
Liquid Water Content (kg/m2),
snow height (m),
minimum snowpack temperature (degrees Celsius).
The 9 channels are in the following order:
Sentinel-1 polarimetric channels: VV, VH and the combination C: VV/VH in linear,
Topographical features: altitude, orientation, slope
Polarimetric ratio with a reference summer image: VV/VVref, VH/VHref, C/Cref
* The reference image selected is that of August 9th 2020, as a reference image without snow (cf. Nagler et al.)
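To make the channel order concrete, the sketch below assembles the 9 values for a single pixel from linear-scale backscatter and topographic inputs. The function name, argument names, and numeric inputs are all hypothetical; the dataset already ships these channels precomputed, so this only illustrates how the ratios relate to one another.

```python
def build_channels(vv, vh, vv_ref, vh_ref, altitude, orientation, slope):
    """Return the 9 channel values for one pixel, in the order listed above.

    Backscatter inputs are assumed to be linear-scale and strictly positive.
    """
    c = vv / vh              # combination C = VV/VH for the current image
    c_ref = vv_ref / vh_ref  # same combination for the snow-free reference image
    return (
        vv, vh, c,                             # Sentinel-1 polarimetric channels
        altitude, orientation, slope,          # topographical features
        vv / vv_ref, vh / vh_ref, c / c_ref,   # ratios with the reference image
    )


# Invented values chosen to be exactly representable in floating point.
pixel = build_channels(
    vv=0.5, vh=0.125, vv_ref=1.0, vh_ref=0.125,
    altitude=2100.0, orientation=180.0, slope=25.0,
)
print(pixel)
```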
An overview of the distribution and a summary of the sample statistics can be found in the file info.pdf.
The data is stored in .hdf5 format with gzip compression. We provide a Python script, dataset_load.py, to read and query the data. It is based on the h5py, numpy and pandas libraries and allows selecting part or all of the dataset using queries on the metadata. The script is documented and can be used as described in the README.md file.
The processing chain is available at the following Github address.
The authors would like to acknowledge the support from the National Centre for Space Studies (CNES) in providing computing facilities and access to SAR images via the PEPS platform.
The authors would like to deeply thank Mathieu Fructus for running the Crocus simulations.
Erratum:
In the dataloader file, the name of the "aquisition" column must be added twice; see the correction below:
dtst_ld = Dataset_loader(path_dataset,shuffle=False,descrp=["date","massif","aquisition","aquisition","elevation","slope","orientation","tmin","hsnow","tel",],)
If you have any comments, questions or suggestions, please contact the authors:
matthieu.gallet@univ-smb.fr
fatima.karbou@meteo.fr
abdourrahmane.atto@univ-smb.fr
emmanuel.trouve@univ-smb.fr
Forensic DNA Open Dataset
catalog.data.gov
s.cnmilf.com
+2 more
Updated Jul 9, 2025
+ more versions
Cite
National Institute of Standards and Technology (2025). Forensic DNA Open Dataset [Dataset]. https://catalog.data.gov/dataset/forensic-dna-open-dataset-a26bc
Explore at:
Dataset updated
Jul 9, 2025
Dataset provided by
National Institute of Standards and Technology: http://www.nist.gov/
Description
This dataset consists of single source and mixture samples which were genotyped/sequenced with kits targeting Forensic DNA markers. More information specific to the kit and/or method used can be found in the README text files included in each zipped file.
The CE-STR kits reported for the single source samples include: Applied Biosystems GlobalFiler, Applied Biosystems Y-Filer Plus, Promega PowerPlex Fusion 6C, Promega PowerPlex Y23.
The CE profiles for single source samples are also included in a spreadsheet.
The following CE-STR kit is reported for the mixture samples: Promega PowerPlex Fusion 6C.
The sequencing kits reported for the mixture and single source samples include: Verogen ForenSeq DNA Signature Prep Kit, Promega PowerSeq 46GY, Thermo Fisher Applied Biosystems Precision ID GlobalFiler NGS STR Panel v2.
Only the single source samples are reported for: Promega PowerSeq CRM Nested System.
This data was produced with approval from the NIST Research Protections Office. It is intended for research, training, and educational purposes only and could potentially contain errors due to limited review prior to uploading. This data should not be used to identify the donor of the profile or uploaded/searched against public or law enforcement DNA databases. Certain commercial equipment, instruments, or materials are identified in this dataset in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
Global Biodiversity Information Facility (GBIF) Species Occurrences
registry.opendata.aws
Updated May 17, 2021
+ more versions
Cite
The Global Biodiversity Information Facility (GBIF) (2021). Global Biodiversity Information Facility (GBIF) Species Occurrences [Dataset]. https://registry.opendata.aws/gbif/
Explore at:
Dataset updated
May 17, 2021
Dataset provided by
Global Biodiversity Information Facility: https://www.gbif.org/
Description
The Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species. GBIF currently integrates datasets documenting over 1.6 billion species occurrences, growing daily. The GBIF occurrence dataset combines data from a wide array of sources including specimen-related data from natural history museums, observations from citizen science networks and environment recording schemes. While these data are constantly changing at GBIF.org, periodic snapshots are taken and made available on AWS.
Data from: MusicOSet: An Enhanced Open Dataset for Music Data Mining
zenodo.org
data.niaid.nih.gov
bin, zip
Updated Jun 7, 2021
Cite
Mariana O. Silva; Laís Mota; Mirella M. Moro (2021). MusicOSet: An Enhanced Open Dataset for Music Data Mining [Dataset]. http://doi.org/10.5281/zenodo.4904639
Explore at:
Available download formats: zip, bin
Unique identifier
https://doi.org/10.5281/zenodo.4904639
Dataset updated
Jun 7, 2021
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Mariana O. Silva; Laís Mota; Mirella M. Moro
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MusicOSet is an open and enhanced dataset of musical elements (artists, songs and albums) based on musical popularity classification. It provides a directly accessible collection of data suitable for numerous tasks in music data mining (e.g., data visualization, classification, clustering, similarity search, MIR, HSS and so forth). To create MusicOSet, the potential information sources were divided into three main categories: music popularity sources, metadata sources, and acoustic and lyrical features sources. Data from all three categories were initially collected between January and May 2019. The data were then updated and enhanced in June 2019.

The attractive features of MusicOSet include:

Integration and centralization of different musical data sources

Calculation of popularity scores and classification of hit and non-hit musical elements, ranging from 1962 to 2018

Enriched metadata for music, artists, and albums from the US popular music industry

Availability of acoustic and lyrical resources

Unrestricted access in two formats: SQL database and compressed .csv files

| Data | # Records |
|:-----------------:|:---------:|
| Songs | 20,405 |
| Artists | 11,518 |
| Albums | 26,522 |
| Lyrics | 19,664 |
| Acoustic Features | 20,405 |
| Genres | 1,561 |
Regulatory information for cosmetics
open.canada.ca
datasets.ai
+1 more
html
Updated Mar 1, 2021
Cite
Health Canada (2021). Regulatory information for cosmetics [Dataset]. https://open.canada.ca/data/en/dataset/0945ce45-411e-4ed2-8ccc-e4f7d0840f9f
Explore at:
Available download formats: html
Dataset updated
Mar 1, 2021
Dataset provided by
Health Canada: http://www.hc-sc.gc.ca/
License
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Description
All cosmetics sold in Canada must be safe to use and must not pose any health risk. They must meet the requirements of the Food and Drugs Act and the Cosmetic Regulations.
Annual Freedom of Information Act (FOIA) Reports - Dataset - NASA Open Data...
data.nasa.gov
data.staging.idas-ds1.appdat.jsc.nasa.gov
Updated Mar 31, 2025
+ more versions
Cite
nasa.gov (2025). Annual Freedom of Information Act (FOIA) Reports - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/annual-freedom-of-information-act-foia-reports
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASA: http://nasa.gov/
Description
NASA makes annual reports of progress made on Freedom of Information Act (FOIA) requests. This database contains PDF and XML versions of reports from 1999 to the present.
Published Open Data Sets
hub.arcgis.com
data.squamish.ca
+1 more
Updated Jul 20, 2022
Cite
District of Squamish (2022). Published Open Data Sets [Dataset]. https://hub.arcgis.com/maps/squamish::published-open-data-sets
Explore at:
Dataset updated
Jul 20, 2022
Dataset authored and provided by
District of Squamish
Area covered
Description
Published Open Data Sets | Squamish Community Dashboard
This measure tracks the number of public open data sets published online for community use as part of the District's Open Data Portal. The Squamish Open Data Portal (data.squamish.ca) provides full GIS data sets including but not limited to physical terrain and imagery, environment, infrastructure, business data, development, recreation, transportation, and emergency management. Open data is commonly defined as data that is free and available for anyone to use and republish as they wish.
About this target: Available open data, and corresponding public visitation and usage of that data, progressively increases (Squamish Community Digital Strategy).
Analysis: As of year-end 2024, the District's open data portal had 79 published data sets. Since launching the District's Open Data Portal in 2016, the municipality has added 44 data sets to the number initially published (35 data sets).
Reason for monitoring: Enhancing access to and utilization of information contributes to open and transparent government, promotes a more connected and engaged community and strengthens decision making and service delivery. These are core goals of the Community Digital Strategy developed and adopted in 2016 to better leverage technology to meet the growing social, economic and environmental needs of citizens, and link digital products and services to wider community and economic development.
Open dataset of annual Article Processing Charges (APCs) of gold and hybrid...
search.dataone.org
Updated Sep 24, 2024
Cite
Butler, Leigh-Ann; Hare, Madelaine; Schönfelder, Nina; Schares, Eric; Alperin, Juan Pablo; Haustein, Stefanie (2024). Open dataset of annual Article Processing Charges (APCs) of gold and hybrid journals published by Elsevier, Frontiers, MDPI, PLOS, Springer-Nature and Wiley 2019-2023 [Dataset]. http://doi.org/10.7910/DVN/CR1MMV
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/CR1MMV
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Butler, Leigh-Ann; Hare, Madelaine; Schönfelder, Nina; Schares, Eric; Alperin, Juan Pablo; Haustein, Stefanie
Description
This open dataset of annual Article Processing Charges (APCs) was produced from the price lists of six large scholarly publishers (Elsevier, Frontiers, PLOS, MDPI, Springer-Nature and Wiley) from 2019 to 2023. APC price lists were downloaded from publisher websites each year as well as via Wayback Machine snapshots to retrieve fees per journal per year. The dataset includes journal metadata, APC collection method, and annual APC list prices in several currencies (USD, EUR, GBP, CHF, JPY, CAD) for 8,712 unique journals and 36,618 journal-year combinations. The dataset was generated to allow for more precise analysis of APCs and can support library collection development and scientometric analysis estimating APCs paid in gold and hybrid OA journals.
City of Milwaukee Open Data Dataset Catalog
data.milwaukee.gov
csv
Updated Sep 1, 2025
Cite
Information Technology and Management Division (2025). City of Milwaukee Open Data Dataset Catalog [Dataset]. https://data.milwaukee.gov/dataset/dataset-catalog
Explore at:
Available download formats: csv
Dataset updated
Sep 1, 2025
Dataset authored and provided by
Information Technology and Management Division
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Milwaukee
Description
This dataset is a catalog of all the datasets available on the City of Milwaukee Open Data portal.
KU-HAR: Human Activity Recognition Dataset (v 1.0)
kaggle.com
Updated Apr 1, 2021
+ more versions
Cite
Niloy Sikder (2021). KU-HAR: Human Activity Recognition Dataset (v 1.0) [Dataset]. https://www.kaggle.com/datasets/niloy333/kuhar
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 1, 2021
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Niloy Sikder
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
KU-HAR: An Open Dataset for Human Activity Recognition (v 1.0)

Human Activity Recognition (HAR) refers to the capacity of machines to perceive human actions. This dataset contains information on 18 different activities collected from 90 participants (75 male and 15 female) using smartphone sensors (Accelerometer and Gyroscope). It has 1945 raw activity samples collected directly from the participants, and 20750 subsamples extracted from them.

Activities/ Classes

Stand ➞ Standing still (1 min)

Sit ➞ Sitting still (1 min)

Talk-sit ➞ Talking with hand movements while sitting (1 min)

Talk-stand ➞ Talking with hand movements while standing or walking (1 min)

Stand-sit ➞ Repeatedly standing up and sitting down (5 times)

Lay ➞ Laying still (1 min)

Lay-stand ➞ Repeatedly standing up and laying down (5 times)

Pick ➞ Picking up an object from the floor (10 times)

Jump ➞ Jumping repeatedly (10 times)

Push-up ➞ Performing full push-ups (5 times)

Sit-up ➞ Performing sit-ups (5 times)

Walk ➞ Walking 20 meters (≈12 s)

Walk-backward ➞ Walking backward for 20 meters (≈20 s)

Walk-circle ➞ Walking along a circular path (≈ 20 s)

Run ➞ Running 20 meters (≈7 s)

Stair-up ➞ Ascending on a set of stairs (≈1 min)

Stair-down ➞ Descending from a set of stairs (≈50 s)

Table-tennis ➞ Playing table tennis (1 min)

Contents of the .zip files

1.Raw_ time_ domian_ data.zip ➞ Originally collected 1945 time-domain samples in separate .csv files. The arrangement of information in each .csv file is:
Col. 1, 5 ➞ exact time (elapsed since the start) when the Accelerometer (col. 1) & Gyroscope (col. 5) output were recorded (in ms)
Col. 2, 3, 4 ➞ Acceleration along X, Y, Z axes (in m/s^2)
Col. 6, 7, 8 ➞ Rate of rotation around X, Y, Z axes (in rad/s)

2.Trimmed_ interpolated_ raw_ data.zip ➞ Unnecessary parts of the samples were trimmed (only from the beginning and the end). The samples were interpolated to keep a constant sampling rate of 100 Hz. The arrangement of information is the same as above.
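The resampling step described above can be illustrated with plain linear interpolation onto a uniform 10 ms grid (100 Hz). This is only a sketch under assumptions: the source does not say which interpolation method the authors used, the function name is hypothetical, and the timestamps and readings below are invented.

```python
def resample_linear(times_ms, values, rate_hz=100):
    """Linearly interpolate irregularly timed samples onto a uniform grid.

    Timestamps are assumed to be in milliseconds and strictly increasing.
    """
    if len(times_ms) != len(values) or len(times_ms) < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    step_ms = 1000.0 / rate_hz
    start, end = times_ms[0], times_ms[-1]
    n_points = int((end - start) // step_ms) + 1
    resampled = []
    j = 0  # index of the left neighbour in the original series
    for k in range(n_points):
        t = start + k * step_ms
        # Advance until times_ms[j] <= t < times_ms[j + 1] (or the last segment).
        while j < len(times_ms) - 2 and times_ms[j + 1] <= t:
            j += 1
        t0, t1 = times_ms[j], times_ms[j + 1]
        v0, v1 = values[j], values[j + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append(v0 + weight * (v1 - v0))
    return resampled


# Invented accelerometer readings with uneven timestamps (ms).
times = [0.0, 8.0, 21.0, 30.0]
accel_x = [0.0, 0.8, 2.1, 3.0]
uniform = resample_linear(times, accel_x)
print(uniform)
```

Because the accelerometer and gyroscope carry separate time columns (col. 1 and col. 5), each sensor would be resampled against its own timestamps.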

3.Time_ domain_ subsamples.zip ➞ 20750 subsamples extracted from the 1945 collected samples provided in a single .csv file. Each of them contains 3 seconds of non-overlapping data of the corresponding activity. Arrangement of information:
Col. 1–300, 301–600, 601–900 ➞ Accelerometer X, Y, Z axes readings
Col. 901–1200, 1201–1500, 1501–1800 ➞ Gyro X, Y, Z axes readings
Col. 1801 ➞ Class ID (0 to 17, in the order mentioned above)
Col. 1802 ➞ length of each channel data in the subsample
Col. 1803 ➞ serial no. of the subsample
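The column layout above maps naturally onto a small parsing helper. The sketch below splits one 1803-value row into six 300-value channels plus the class label, channel length, and serial number. The channel names and the function name are hypothetical, the class names follow the activity order listed earlier, and the example row is synthetic.

```python
CHANNEL_NAMES = ["acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z"]
SAMPLES_PER_CHANNEL = 300  # 300 columns per channel, as in the layout above
CLASS_NAMES = [
    "Stand", "Sit", "Talk-sit", "Talk-stand", "Stand-sit", "Lay",
    "Lay-stand", "Pick", "Jump", "Push-up", "Sit-up", "Walk",
    "Walk-backward", "Walk-circle", "Run", "Stair-up", "Stair-down",
    "Table-tennis",
]


def split_subsample(row):
    """Split one 1803-value row into channels, class label, length and serial."""
    expected = len(CHANNEL_NAMES) * SAMPLES_PER_CHANNEL + 3
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    channels = {
        name: row[i * SAMPLES_PER_CHANNEL:(i + 1) * SAMPLES_PER_CHANNEL]
        for i, name in enumerate(CHANNEL_NAMES)
    }
    class_id = int(row[1800])  # Col. 1801 in the 1-based numbering above
    length = int(row[1801])    # Col. 1802
    serial = int(row[1802])    # Col. 1803
    return channels, CLASS_NAMES[class_id], length, serial


# Synthetic row: values 0..1799 for the signals, then class 11, length 300, serial 42.
row = [float(i) for i in range(1800)] + [11, 300, 42]
channels, label, length, serial = split_subsample(row)
print(label, length, serial, channels["gyro_z"][:3])
```

When reading the real file with the `csv` module, each field arrives as a string, so the signal values would need converting to `float` first.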

Gravity acceleration was omitted from the Accelerometer data, and no filter was applied to remove noise. The dataset is free to download, modify, and use provided that the source and the associated article are properly referenced.

Use the .csv file of the Time_ domain_ subsamples.zip for instant HAR classification tasks. See this notebook for details. Use the other files if you want to work with raw activity data.

Citation Request

More information is provided in the following data paper. Please cite it if you use this dataset in your research/work: [1] N. Sikder and A.-A. Nahid, “KU-HAR: An open dataset for heterogeneous human activity recognition,” Pattern Recognition Letters, vol. 146, pp. 46–54, Jun. 2021, doi: 10.1016/j.patrec.2021.02.024

[2] N. Sikder, M. A. R. Ahad, and A.-A. Nahid, “Human Action Recognition Based on a Sequential Deep Learning Model,” 2021 Joint 10th International Conference on Informatics, Electronics & Vision (ICIEV) and 2021 5th International Conference on Imaging, Vision & Pattern Recognition (icIVPR). IEEE, Aug. 16, 2021. doi: 10.1109/icievicivpr52578.2021.9564234.

Cite the dataset as: A.-A. Nahid, N. Sikder, and I. Rafi, “KU-HAR: An Open Dataset for Human Activity Recognition.” Mendeley, Feb. 16, 2021, doi: 10.17632/45F952Y38R.5

Supplementary files: https://drive.google.com/drive/folders/1yrG8pwq3XMlyEGYMnM-8xnrd6js0oXA7

Conclusion

The dataset is originally hosted on Mendeley Data

The image used in the banner is collected from here and attributed as: Fit, athletic man getting ready for a run by Jacob Lund from Noun Projects
A New Dataset for Streaming Learning Analytics
zenodo.org
csv
Updated Oct 29, 2024
Cite
Gabriella Casalino; Giovanna Castellano; Gianluca Zaza (2024). A New Dataset for Streaming Learning Analytics [Dataset]. http://doi.org/10.5281/zenodo.14003233
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.14003233
Dataset updated
Oct 29, 2024
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Gabriella Casalino; Giovanna Castellano; Gianluca Zaza
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Oct 2024
Description
This research introduces a novel dataset developed for streaming learning analytics, derived from the Open University Learning Analytics Dataset (OULAD). The dataset incorporates essential temporal information that captures the timing of student interactions with the Virtual Learning Environment (VLE). By integrating these time-based interactions, the dataset enhances the capabilities of stream algorithms, which are particularly well-suited for real-time monitoring and analysis of student learning behaviors.

The dataset consists of 34 features and 1,718,983 samples, encompassing students' demographic information, assessment scores, and interactions with the VLE for a specific time (T), corresponding to each student (S) within a given course (C) and module (M). The target classes ('Withdrawn', 'Fail', 'Pass', and 'Distinction') were encoded as 0, 1, 2, and 3, respectively. Notably, the data exhibits a significant imbalance, with a substantial prevalence of records associated with students who passed the final examination. The class distribution is as follows: 'Pass' (1,022,760 samples), 'Distinction' (308,642 samples), 'Fail' (227,550 samples), and 'Withdrawn' (160,031 samples).
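The imbalance can be quantified directly from the counts quoted above. The sketch below computes each class's share of the samples and one common choice of inverse-frequency class weight; the weighting formula is an illustrative assumption and does not come from the dataset's authors.

```python
# Label encoding and per-class sample counts as stated in the description.
CLASS_CODES = {"Withdrawn": 0, "Fail": 1, "Pass": 2, "Distinction": 3}
CLASS_COUNTS = {
    "Pass": 1_022_760,
    "Distinction": 308_642,
    "Fail": 227_550,
    "Withdrawn": 160_031,
}

total = sum(CLASS_COUNTS.values())

# Fraction of all samples that belongs to each class.
shares = {label: count / total for label, count in CLASS_COUNTS.items()}

# Inverse-frequency weights: total / (n_classes * count). Rare classes get
# weights above 1, frequent ones below 1 (an assumed, commonly used heuristic).
weights = {
    label: total / (len(CLASS_COUNTS) * count)
    for label, count in CLASS_COUNTS.items()
}

for label, code in sorted(CLASS_CODES.items(), key=lambda item: item[1]):
    print(f"{code} {label:<12} share={shares[label]:.3f} weight={weights[label]:.3f}")
```

The four counts add up to the stated 1,718,983 samples, which is a quick consistency check on the quoted figures.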

For further details on the data, please refer to the manuscript: Gabriella Casalino, Giovanna Castellano, Gianluca Zaza, "Does Time Matter in Analyzing Educational Data? - A New Dataset for Streaming Learning Analytics.", CEUR Proceedings
Data and code for "An Open Dataset of Chinese Duration Expressions"
scidb.cn
Updated Aug 7, 2025
Cite
Zhang Si-Qi; Niu Jia-Wen; Liu Xiaoqian; Sui Xiao-Yang; Rao Li-Lin (2025). Data and code for "An Open Dataset of Chinese Duration Expressions" [Dataset]. http://doi.org/10.57760/sciencedb.28888
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.28888
Dataset updated
Aug 7, 2025
Dataset provided by
Science Data Bank
Authors
Zhang Si-Qi; Niu Jia-Wen; Liu Xiaoqian; Sui Xiao-Yang; Rao Li-Lin
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises the data and the code for the manuscript "An Open Dataset of Chinese Duration Expressions". Duration information is essential for understanding and analyzing our world. In textual contexts, duration information is typically conveyed in two formats: numeric (e.g., 1 hour) and verbal (e.g., shortly). To analyze duration information in text, it is crucial to understand how people map duration expressions to corresponding numerical duration. However, the literature has yet to provide lexicons supporting such conversion. Furthermore, existing databases of time-related expressions often lack information about word frequency – a robust predictor of information processing. Here, we report an open dataset of 2,101 Chinese duration expressions, each annotated with its corresponding numerical duration. To obtain high-quality data for word frequency, we obtained the frequency of each duration expression from a large-scale corpus of 10 billion Chinese characters (BLCU Corpus Center (BCC) Corpus) and computed an adjusted frequency for each expression. This dataset provides a valuable resource for research on temporal information in Chinese, facilitating studies in natural language processing, psychology, and linguistics.
Global - Roads Open Access Data Set - Dataset - ENERGYDATA.INFO
energydata.info
Updated Jul 25, 2018
+ more versions
Cite
(2018). Global - Roads Open Access Data Set - Dataset - ENERGYDATA.INFO [Dataset]. https://energydata.info/dataset/global-roads-open-access-data-set-2010
Explore at:
Dataset updated
Jul 25, 2018
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The Global Roads Open Access Data Set, Version 1 (gROADSv1) was developed under the auspices of the CODATA Global Roads Data Development Task Group. The data set combines the best available roads data by country into a global roads coverage, using the UN Spatial Data Infrastructure Transport (UNSDI-T) version 2 as a common data model. All country road networks have been joined topologically at the borders, and many countries have been edited for internal topology. Source data for each country are provided in the documentation, and users are encouraged to refer to the readme file for use constraints that apply to a small number of countries. Because the data are compiled from multiple sources, the date range for road network representations ranges from the 1980s to 2010 depending on the country (most countries have no confirmed date), and spatial accuracy varies. The baseline global data set was compiled by the Information Technology Outreach Services (ITOS) of the University of Georgia. Updated data for 27 countries and 6 smaller geographic entities were assembled by Columbia University's Center for International Earth Science Information Network (CIESIN), with a focus largely on developing countries with the poorest data coverage.
Data from: OPEN-KTH-3dMODELS: An Open Dataset of Building Models at KTH...
researchdata.se
explore.openaire.eu
Updated Dec 31, 2023
Cite
Naveen Mohan; Maxime Sainte Catherine; Lester Jose (2023). OPEN-KTH-3dMODELS: An Open Dataset of Building Models at KTH Campus Valhallavägen [Dataset]. http://doi.org/10.5281/ZENODO.10445868
Explore at:
Unique identifier
https://doi.org/10.5281/ZENODO.10445868
Dataset updated
Dec 31, 2023
Dataset provided by
KTH Royal Institute of Technology
Authors
Naveen Mohan; Maxime Sainte Catherine; Lester Jose
Area covered
Valhallavägen
Description
OPEN-KTH-3dMODELS: An open dataset of building models at KTH Campus Valhallavägen

Open-KTH-3dModels is a subproject of the AD-EYE testbed for Automated Driving and Intelligent Transportation Systems.

The dataset comprises a series of .blend files containing prominent buildings from KTH Campus Valhallavägen.

The dataset also contains PreScan-compatible models that can be used with AD-EYE (https://www.adeye.se/).

Visualisation video: https://www.youtube.com/watch?v=F6NfCiul3oE

Learn more at https://www.adeye.se/open-kth-3dmodels or contact adeye@md.kth.se

The AD-EYE testbed is based on the design presented in the work "AD-EYE: A Co-Simulation Platform for Early Verification of Functional Safety Concepts"

Original paper: https://doi.org/10.4271/2019-01-0126

Preprint available at: https://arxiv.org/abs/1912.00448

Citation:

Naveen Mohan, Martin Törngren, "AD-EYE: A Co-Simulation Platform for Early Verification of Functional Safety Concepts", SAE Technical Paper 19AE-0203/2019-01-0126, https://doi.org/10.4271/2019-01-0126

Notes:

Modelling work primarily performed by Lester Jose, during his internship with AD-EYE.
KU-HAR: An Open Dataset for Human Activity Recognition
data.mendeley.com
Updated Feb 16, 2021
+ more versions
Cite
Abdullah-Al Nahid (2021). KU-HAR: An Open Dataset for Human Activity Recognition [Dataset]. http://doi.org/10.17632/45f952y38r.5
Explore at:
Unique identifier
https://doi.org/10.17632/45f952y38r.5
Dataset updated
Feb 16, 2021
Authors
Abdullah-Al Nahid
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
(Always use the latest version of the dataset.)

Human Activity Recognition (HAR) refers to the capacity of machines to perceive human actions. This dataset contains information on 18 different activities collected from 90 participants (75 male and 15 female) using smartphone sensors (Accelerometer and Gyroscope). It has 1945 raw activity samples collected directly from the participants, and 20750 subsamples extracted from them. The activities are:

Stand➞ Standing still (1 min)
Sit➞ Sitting still (1 min)
Talk-sit➞ Talking with hand movements while sitting (1 min)
Talk-stand➞ Talking with hand movements while standing or walking (1 min)
Stand-sit➞ Repeatedly standing up and sitting down (5 times)
Lay➞ Laying still (1 min)
Lay-stand➞ Repeatedly standing up and laying down (5 times)
Pick➞ Picking up an object from the floor (10 times)
Jump➞ Jumping repeatedly (10 times)
Push-up➞ Performing full push-ups (5 times)
Sit-up➞ Performing sit-ups (5 times)
Walk➞ Walking 20 meters (≈12 s)
Walk-backward➞ Walking backward for 20 meters (≈20 s)
Walk-circle➞ Walking along a circular path (≈20 s)
Run➞ Running 20 meters (≈7 s)
Stair-up➞ Ascending on a set of stairs (≈1 min)
Stair-down➞ Descending from a set of stairs (≈50 s)
Table-tennis➞ Playing table tennis (1 min)

Contents of the attached .zip files are:

1. Raw_time_domian_data.zip➞ Originally collected 1945 time-domain samples in separate .csv files. The arrangement of information in each .csv file is:
Col. 1, 5➞ Exact time (elapsed since the start) when the Accelerometer & Gyro output was recorded (in ms)
Col. 2, 3, 4➞ Acceleration along X, Y, Z axes (in m/s^2)
Col. 6, 7, 8➞ Rate of rotation around X, Y, Z axes (in rad/s)

2. Trimmed_interpolated_raw_data.zip➞ Unnecessary parts of the samples were trimmed (only from the beginning and the end). The samples were interpolated to keep a constant sampling rate of 100 Hz. The arrangement of information is the same as above.
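
The resampling step described above can be illustrated with a minimal sketch. The description does not say which interpolation method the authors used, so linear interpolation (via NumPy's `np.interp`) is an assumption here, and the function name `resample_uniform` is hypothetical. It takes one timestamp column (in ms) and one sensor axis, and returns the signal on a uniform 100 Hz grid.

```python
import numpy as np


def resample_uniform(t_ms, values, rate_hz=100.0):
    """Resample an irregularly timed signal onto a uniform time grid.

    Assumption: linear interpolation. The dataset description only states
    that samples were interpolated to a constant 100 Hz rate.
    """
    t_ms = np.asarray(t_ms, dtype=float)
    values = np.asarray(values, dtype=float)
    if t_ms.ndim != 1 or t_ms.shape != values.shape:
        raise ValueError("t_ms and values must be 1-D arrays of equal length")
    if np.any(np.diff(t_ms) <= 0):
        raise ValueError("timestamps must be strictly increasing")

    step_ms = 1000.0 / rate_hz  # 10 ms between samples at 100 Hz
    n_steps = int(np.floor((t_ms[-1] - t_ms[0]) / step_ms)) + 1
    t_uniform = t_ms[0] + step_ms * np.arange(n_steps)

    # np.interp evaluates the piecewise-linear interpolant at t_uniform.
    return t_uniform, np.interp(t_uniform, t_ms, values)
```

For a raw .csv file laid out as described, one would pass column 1 as `t_ms` with each of columns 2 to 4 in turn (accelerometer), and column 5 with each of columns 6 to 8 (gyroscope), since the two sensors carry separate timestamps.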

3. Time_domain_subsamples.zip➞ 20750 subsamples extracted from the 1945 collected samples, provided in a single .csv file. Each of them contains 3 seconds of non-overlapping data of the corresponding activity. Arrangement of information:
Col. 1–300, 301–600, 601–900➞ Acc.meter X, Y, Z axes readings
Col. 901–1200, 1201–1500, 1501–1800➞ Gyro X, Y, Z axes readings
Col. 1801➞ Class ID (0 to 17, in the order mentioned above)
Col. 1802➞ Length of each channel's data in the subsample
Col. 1803➞ Serial no. of the subsample
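
As a rough illustration of the column layout above, the following sketch splits one 1803-value row into its six sensor channels and three metadata fields. The function and channel names (`split_subsample`, `acc_x`, and so on) are hypothetical and not part of the dataset. The sketch assumes the row has already been read as numbers, for example with `numpy.loadtxt(path, delimiter=",")` if the file has no header row, which is itself an assumption.

```python
import numpy as np

SAMPLES_PER_CHANNEL = 300  # columns per channel, per the layout above
CHANNELS = ("acc_x", "acc_y", "acc_z", "gyro_x", "gyro_y", "gyro_z")
ROW_LENGTH = SAMPLES_PER_CHANNEL * len(CHANNELS) + 3  # 1803 columns


def split_subsample(row):
    """Split one subsample row into channels and metadata.

    Returns (channels, class_id, channel_length, serial_no), where
    channels maps each hypothetical channel name to a 300-value array.
    """
    row = np.asarray(row, dtype=float)
    if row.shape != (ROW_LENGTH,):
        raise ValueError(f"expected {ROW_LENGTH} values, got shape {row.shape}")

    # Columns 1-1800 (0-based 0..1799) hold six consecutive 300-value blocks.
    signals = row[:1800].reshape(len(CHANNELS), SAMPLES_PER_CHANNEL)
    channels = {name: signals[i] for i, name in enumerate(CHANNELS)}

    class_id = int(row[1800])        # Col. 1801: class ID, 0 to 17
    channel_length = int(row[1801])  # Col. 1802: length of each channel
    serial_no = int(row[1802])       # Col. 1803: serial no. of the subsample
    return channels, class_id, channel_length, serial_no
```

Stacking the six channels as a (6, 300) array, rather than keeping the flat row, is a convenient starting shape for most time-series models, though the choice is left to the user.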

Gravity acceleration was omitted from the Acc.meter data, and no filter was applied to remove noise. The dataset is free to download, modify, and use.

More information is provided in the data paper which is currently under review: N. Sikder, A.-A. Nahid, KU-HAR: An open dataset for heterogeneous human activity recognition, Pattern Recognit. Lett. (submitted).

A preprint will be available soon.

Backup: drive.google.com/drive/folders/1yrG8pwq3XMlyEGYMnM-8xnrd6js0oXA7
Freedom of Information data and statistics - Datasets - Government of Jersey...
opendata.gov.je
Updated Feb 12, 2020
Cite
(2020). Freedom of Information data and statistics - Datasets - Government of Jersey Open Data [Dataset]. https://opendata.gov.je/dataset/freedom-of-information-data-and-statistics
Explore at:
Dataset updated
Feb 12, 2020
License
Description
This dataset includes the total valid Freedom of Information (FOI) requests received, the volume of Departments' FOI requests and responses, who's making the most FOI requests and common topics. The resources within this dataset are updated monthly. Read more at https://www.gov.je/Government/FreedomOfInformation/Pages/FOIStatistics.aspx
AI TOOLS - Open Dataset - 4000 tools / 50 categories
search.dataone.org
Updated Nov 8, 2023
Cite
BUREAU, Olivier (2023). AI TOOLS - Open Dataset - 4000 tools / 50 categories [Dataset]. http://doi.org/10.7910/DVN/QLSXZG
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/QLSXZG
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
BUREAU, Olivier
Description
Introducing a comprehensive and openly accessible dataset designed for researchers and data scientists in the field of artificial intelligence. This dataset encompasses a collection of over 4,000 AI tools, meticulously categorized into more than 50 distinct categories. This valuable resource has been generously shared by its owner, TasticAI, and is freely available for various purposes such as research, benchmarking, market surveys, and more.

Dataset Overview: The dataset provides an extensive repository of AI tools, each accompanied by a wealth of information to facilitate your research endeavors. Here is a brief overview of the key components:

AI Tool Name: Each AI tool is listed with its name, providing an easy reference point for users to identify specific tools within the dataset.
Description: A concise one-line description is provided for each AI tool. This description offers a quick glimpse into the tool's purpose and functionality.
AI Tool Category: The dataset is thoughtfully organized into more than 50 distinct categories, ensuring that you can easily locate AI tools that align with your research interests or project needs. Whether you are working on natural language processing, computer vision, machine learning, or other AI subfields, you will find a dedicated category.
Images: Visual representation is crucial for understanding and identifying AI tools. To aid your exploration, the dataset includes images associated with each tool, allowing for quick recognition and visual association.
Website Links: Accessing more detailed information about a specific AI tool is effortless, as direct links to the tool's respective website or documentation are provided. This feature enables researchers and data scientists to delve deeper into the tools that pique their interest.

Utilization and Benefits: This openly shared dataset serves as a valuable resource for various purposes:

Research: Researchers can use this dataset to identify AI tools relevant to their studies, facilitating faster literature reviews, comparative analyses, and the exploration of cutting-edge technologies.
Benchmarking: The extensive collection of AI tools allows for comprehensive benchmarking, enabling you to evaluate and compare tools within specific categories or across categories.
Market Surveys: Data scientists and market analysts can utilize this dataset to gain insights into the AI tool landscape, helping them identify emerging trends and opportunities within the AI market.
Educational Purposes: Educators and students can leverage this dataset for teaching and learning about AI tools, their applications, and the categorization of AI technologies.

Conclusion: In summary, this openly shared dataset from TasticAI, featuring over 4,000 AI tools categorized into more than 50 categories, represents a valuable asset for researchers, data scientists, and anyone interested in the field of artificial intelligence. Its easy accessibility, detailed information, and versatile applications make it an indispensable resource for advancing AI research, benchmarking, market analysis, and more. Explore the dataset at https://tasticai.com and unlock the potential of this rich collection of AI tools for your projects and studies.
Information and Computer Skilled Level - Dataset - Open Government Data
opendata.gov.jo
Updated Dec 27, 2024
Cite
(2024). Information and Computer Skilled Level - Dataset - Open Government Data [Dataset]. https://opendata.gov.jo/dataset/information-and-computer-skilled-level-3595-2022
Explore at:
Dataset updated
Dec 27, 2024
Description
Information and Computer Skilled Level [2022] Training Programs Related to Information and Computer within the Skilled Level

7 scholarly articles cite the Open Data Portal Catalogue dataset (Treasury Board of Canada Secretariat, 2025).
