A demo for saving data from a Space to a Dataset. The goal is to provide reusable code snippets.
Documentation: https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#scheduled-uploads
Space: https://huggingface.co/spaces/Wauplin/space_to_dataset_saver/
JSON dataset: https://huggingface.co/datasets/Wauplin/example-commit-scheduler-json
Image dataset: https://huggingface.co/datasets/Wauplin/example-commit-scheduler-image
Image (zipped) dataset: … See the full description on the dataset page: https://huggingface.co/datasets/Wauplin/example-space-to-dataset-image-zip.
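The demo relies on the scheduled-uploads mechanism from huggingface_hub described in the documentation above. Below is a minimal sketch of that pattern using CommitScheduler; the repo_id, folder, and file names are placeholders rather than the exact ones used by the Space.

import json
import uuid
from pathlib import Path

from huggingface_hub import CommitScheduler

# Local folder whose contents are pushed to the dataset repo on a schedule.
data_folder = Path("data")
data_folder.mkdir(exist_ok=True)
data_file = data_folder / "data.jsonl"

# Placeholder repo_id; the folder is committed to the dataset every 5 minutes.
scheduler = CommitScheduler(
    repo_id="your-username/example-commit-scheduler-json",
    repo_type="dataset",
    folder_path=data_folder,
    path_in_repo="data",
    every=5,
)

def save_record(record: dict) -> None:
    # The scheduler lock avoids writing while a scheduled commit is reading the folder.
    with scheduler.lock:
        with data_file.open("a") as f:
            f.write(json.dumps(record) + "\n")

save_record({"id": str(uuid.uuid4()), "text": "example entry"})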
mmwmm/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A geodata data package providing GeoJSON polygons for all the world's countries.
I created a dataset to help people create choropleth maps of United States states.
It contains one GeoJSON file to plot the countries' borders, and one CSV from the Census Bureau with the US population per state.
I think the best way to use this dataset is to join it with other data. For example, I used it to plot police killings using the data from https://www.kaggle.com/jpmiller/police-violence-in-the-us
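A minimal sketch of such a join using geopandas is shown below; the file names and the shared key column ("state") are assumptions for illustration, not confirmed field names in this dataset.

import geopandas as gpd
import pandas as pd

# Assumed file names and key column; adjust to the actual dataset layout.
states = gpd.read_file("us_states.geojson")
population = pd.read_csv("us_population_by_state.csv")

# Join the polygons with the tabular data on a shared state identifier.
merged = states.merge(population, on="state", how="left")

# Plot a simple choropleth, coloring each state by population.
merged.plot(column="population", legend=True, cmap="viridis")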
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
3DHD CityScenes is the most comprehensive, large-scale high-definition (HD) map dataset to date, annotated in the three spatial dimensions of globally referenced, high-density LiDAR point clouds collected in urban domains. Our HD map covers 127 km of road sections of the inner city of Hamburg, Germany including 467 km of individual lanes. In total, our map comprises 266,762 individual items.
Our corresponding paper (published at ITSC 2022) is available here. Further, we have applied 3DHD CityScenes to map deviation detection here.
Moreover, we release code to facilitate the application of our dataset and the reproducibility of our research. Specifically, our 3DHD_DevKit comprises:
Python tools to read, generate, and visualize the dataset,
3DHDNet deep learning pipeline (training, inference, evaluation) for map deviation detection and 3D object detection.
The DevKit is available here:
https://github.com/volkswagen/3DHD_devkit.
The dataset and DevKit have been created by Christopher Plachetka as project lead during his PhD period at Volkswagen Group, Germany.
When using our dataset, you are welcome to cite:
@INPROCEEDINGS{9921866,
  author={Plachetka, Christopher and Sertolli, Benjamin and Fricke, Jenny and Klingner, Marvin and Fingscheidt, Tim},
  booktitle={2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC)},
  title={3DHD CityScenes: High-Definition Maps in High-Density Point Clouds},
  year={2022},
  pages={627-634}}
Acknowledgements
We thank the following interns for their exceptional contributions to our work.
Benjamin Sertolli: Major contributions to our DevKit during his master's thesis
Niels Maier: Measurement campaign for data collection and data preparation
The European large-scale project Hi-Drive (www.Hi-Drive.eu) supports the publication of 3DHD CityScenes and encourages the general publication of information and databases facilitating the development of automated driving technologies.
The Dataset
After downloading, the 3DHD_CityScenes folder provides five subdirectories, which are explained briefly in the following.
This directory contains the training, validation, and test set definition (train.json, val.json, test.json) used in our publications. Respective files contain samples that define a geolocation and the orientation of the ego vehicle in global coordinates on the map.
During dataset generation (done by our DevKit), samples are used to take crops from the larger point cloud. Also, map elements in reach of a sample are collected. Both modalities can then be used, e.g., as input to a neural network such as our 3DHDNet.
To read any JSON-encoded data provided by 3DHD CityScenes in Python, you can use the following code snippet as an example.
import json

json_path = r"E:\3DHD_CityScenes\Dataset\train.json"
with open(json_path) as jf:
    data = json.load(jf)
print(data)
Map items are stored as lists of items in JSON format. In particular, we provide:
traffic signs,
traffic lights,
pole-like objects,
construction site locations,
construction site obstacles (point-like such as cones, and line-like such as fences),
line-shaped markings (solid, dashed, etc.),
polygon-shaped markings (arrows, stop lines, symbols, etc.),
lanes (ordinary and temporary),
relations between elements (only for construction sites, e.g., sign to lane association).
Our high-density point cloud used as basis for annotating the HD map is split into 648 tiles. This directory contains the geolocation of each tile as a polygon on the map. You can view the respective tile definition using QGIS. Alternatively, we also provide the respective polygons as lists of UTM coordinates in JSON.
Files with the extensions .dbf, .prj, .qpj, .shp, and .shx belong to the tile definition as a “shape file” (commonly used in geodesy) that can be viewed using QGIS. The JSON file contains the same information, provided in a different format used in our Python API.
The high-density point cloud tiles are provided in global UTM32N coordinates and are encoded in a proprietary binary format. The first 4 bytes (integer) encode the number of points contained in that file. Subsequently, all point cloud values are provided as arrays. First all x-values, then all y-values, and so on. Specifically, the arrays are encoded as follows.
x-coordinates: 4 byte integer
y-coordinates: 4 byte integer
z-coordinates: 4 byte integer
intensity of reflected beams: 2 byte unsigned integer
ground classification flag: 1 byte unsigned integer
After reading, the respective values have to be denormalized. As an example, you can use the following code snippet to read the point cloud data. For visualization, you can use the pptk package, for instance.
import numpy as np
import pptk

file_path = r"E:\3DHD_CityScenes\HD_PointCloud_Tiles\HH_001.bin"
pc_dict = {}
key_list = ['x', 'y', 'z', 'intensity', 'is_ground']
# dtypes inferred from the format description above (little-endian assumed)
type_list = [np.int32, np.int32, np.int32, np.uint16, np.uint8]

with open(file_path, "rb") as f:
    num_points = int(np.fromfile(f, dtype=np.int32, count=1)[0])
    for key, dtype in zip(key_list, type_list):
        pc_dict[key] = np.fromfile(f, dtype=dtype, count=num_points)
World country and state coordinates for plotting geospatial maps.
Files source:
https://data.gov.sg/open-data-licence
Dataset from Info-communications Media Development Authority. For more information, visit https://data.gov.sg/datasets/d_d8644084f8b54f851a1acbb2f04d5089/view
This dataset contains resources transformed from other datasets on HDX. They exist here only in a format modified to support visualization on HDX and may not be as up to date as the source datasets from which they are derived.
Source datasets: https://data.hdx.rwlabs.org/dataset/idps-data-by-region-in-mali
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Noura Aly
Released under Apache 2.0
CognitiveScience/example-space-to-dataset-json dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
The 802.11 standard includes several management features and corresponding frame types. One of them is the Probe Request (PR), which is sent by mobile devices in an unassociated state to scan the nearby area for existing wireless networks. The frame body of a PR consists of variable-length fields, called Information Elements (IEs), which represent the capabilities of a mobile device, such as supported data rates.
This dataset contains PRs collected over a seven-day period by four gateway devices in an uncontrolled urban environment in the city of Catania.
It can be used for various use cases, e.g., analyzing MAC randomization, determining the number of people in a given location at a given time or in different time periods, analyzing trends in population movement (streets, shopping malls, etc.) in different time periods, etc.
Related dataset
The same authors also produced the Labeled dataset of IEEE 802.11 probe requests, which uses the same data layout and recording equipment.
Measurement setup
The system for collecting PRs consists of a Raspberry Pi 4 (RPi) with an additional WiFi dongle to capture WiFi signal traffic in monitoring mode (gateway device). Passive PR monitoring is performed by listening to 802.11 traffic and filtering out PR packets on a single WiFi channel.
The following information about each received PR is collected:
- MAC address
- supported data rates
- extended supported rates
- HT capabilities
- extended capabilities
- data under the extended tag and vendor specific tag
- interworking
- VHT capabilities
- RSSI
- SSID
- timestamp when the PR was received
The collected data was forwarded to a remote database via a secure VPN connection. A Python script was written using the Pyshark package to collect, preprocess, and transmit the data.
Data preprocessing
The gateway collects PRs for each successive predefined scan interval (10 seconds). During this interval, the data is preprocessed before being transmitted to the database. For each detected PR in the scan interval, the IE fields are saved in the following JSON structure:
PR_IE_data = {
    'DATA_RTS': {'SUPP': DATA_supp, 'EXT': DATA_ext},
    'HT_CAP': DATA_htcap,
    'EXT_CAP': {'length': DATA_len, 'data': DATA_extcap},
    'VHT_CAP': DATA_vhtcap,
    'INTERWORKING': DATA_inter,
    'EXT_TAG': {'ID_1': DATA_1_ext, 'ID_2': DATA_2_ext ...},
    'VENDOR_SPEC': {
        VENDOR_1: {'ID_1': DATA_1_vendor1, 'ID_2': DATA_2_vendor1 ...},
        VENDOR_2: {'ID_1': DATA_1_vendor2, 'ID_2': DATA_2_vendor2 ...}
        ...
    }
}
Supported data rates and extended supported rates are represented as arrays of values that encode information about the rates supported by a mobile device. The rest of the IEs data is represented in hexadecimal format. Vendor Specific Tag is structured differently than the other IEs. This field can contain multiple vendor IDs with multiple data IDs with corresponding data. Similarly, the extended tag can contain multiple data IDs with corresponding data.
Missing IE fields in the captured PR are not included in PR_IE_data.
When a new MAC address is detected in the current scan time interval, the data from PR is stored in the following structure:
{'MAC': MAC_address, 'SSIDs': [ SSID ], 'PROBE_REQs': [PR_data] },
where PR_data is structured as follows:
{ 'TIME': [ DATA_time ], 'RSSI': [ DATA_rssi ], 'DATA': PR_IE_data }.
This data structure makes it possible to store only the 'TIME' (time of arrival) and 'RSSI' values for all PRs originating from the same MAC address and containing the same 'PR_IE_data'. All SSIDs from the same MAC address are also stored. The data of a newly detected PR is compared with the data already stored for the same MAC address in the current scan interval. If identical PR IE data from the same MAC address is already stored, only the values for the 'TIME' and 'RSSI' keys are appended. If identical PR IE data from the same MAC address has not yet been received, the PR_data structure of the new PR is appended to the 'PROBE_REQs' key for that MAC address. The preprocessing procedure is shown in Figure ./Figures/Preprocessing_procedure.png
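A minimal sketch of this comparison-and-append step, using the structures described above; the function and variable names are illustrative and not part of the released collection scripts.

# Hypothetical in-memory store for the current scan interval, keyed by MAC address.
interval_store = {}

def add_probe_request(mac, ssid, pr_ie_data, toa, rssi):
    entry = interval_store.setdefault(mac, {'MAC': mac, 'SSIDs': [], 'PROBE_REQs': []})
    if ssid and ssid not in entry['SSIDs']:
        entry['SSIDs'].append(ssid)
    # If an identical IE payload was already stored for this MAC, only append TIME and RSSI.
    for pr_data in entry['PROBE_REQs']:
        if pr_data['DATA'] == pr_ie_data:
            pr_data['TIME'].append(toa)
            pr_data['RSSI'].append(rssi)
            return
    # Otherwise append a new PR_data structure for this MAC address.
    entry['PROBE_REQs'].append({'TIME': [toa], 'RSSI': [rssi], 'DATA': pr_ie_data})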
At the end of each scan time interval, all processed data is sent to the database along with additional metadata about the collected data, such as the serial number of the wireless gateway and the timestamps for the start and end of the scan. For an example of a single PR capture, see the Single_PR_capture_example.json file.
Folder structure
For ease of processing, the dataset is divided into 7 folders, each covering a 24-hour period. Each folder contains four files, one per gateway device, each containing the samples from that device.
The folders are named after the start and end time (in UTC). For example, the folder 2022-09-22T22-00-00_2022-09-23T22-00-00 contains samples collected from 23 September 2022 00:00 local time until 24 September 2022 00:00 local time.
Files represent their location via the following mapping:
- 1.json -> location 1
- 2.json -> location 2
- 3.json -> location 3
- 4.json -> location 4
Environments description
The measurements were carried out in the city of Catania, in Piazza Università and Piazza del Duomo. The gateway devices (RPis with WiFi dongles) were set up and gathering data before the start time of this dataset. As of September 23, 2022, the devices were placed in their final configuration and personally checked for correctness of installation and data status of the entire data collection system. Devices were connected either to a nearby Ethernet outlet or via WiFi to the provided access point.
Four Raspberry Pis were used:
- location 1 -> Piazza del Duomo - Chierici building (balcony near Fontana dell’Amenano)
- location 2 -> southernmost window in the building of Via Etnea near Piazza del Duomo
- location 3 -> northernmost window in the building of Via Etnea near Piazza Università
- location 4 -> first window to the right of the entrance of the University of Catania
Locations were suggested by the authors and adjusted during deployment based on physical constraints (locations of electrical outlets or internet access). Under ideal circumstances, the locations of the devices and their coverage area would cover both squares and the part of Via Etnea between them, with a partial overlap of signal detection. The locations of the gateways are shown in Figure ./Figures/catania.png.
Known dataset shortcomings
Due to technical and physical limitations, the dataset contains some identified deficiencies.
PRs are collected and transmitted in 10-second chunks. Due to the limited capabilities of the recording devices, some time (in the range of seconds) may not be accounted for between chunks if the transmission of the previous packet took too long or an unexpected error occurred.
Every 20 minutes the service is restarted on the recording device. This is a workaround for undefined behavior of the USB WiFi dongle, which can no longer respond. For this reason, up to 20 seconds of data will not be recorded in each 20-minute period.
The devices had a scheduled reboot at 4:00 each day, which appears as missing data of up to a few minutes.
Location 1 - Piazza del Duomo - Chierici
The gateway device (RPi) is located on the second-floor balcony and is hardwired to the Ethernet port. This device appears to have functioned stably throughout the data collection period. Its location was constant and undisturbed, and the dataset appears to have complete coverage from this location.
Location 2 - Via Etnea - Piazza del Duomo
The device is located inside the building. During working hours (approximately 9:00-17:00), the device was placed on the windowsill. However, the exact movements of the device cannot be confirmed. As the device was moved back and forth, power outages and internet connection issues occurred. The last three days in the record contain no PRs from this location.
Location 3 - Via Etnea - Piazza Università
Similar to Location 2, the device is placed on the windowsill and moved around by people working in the building. Similar behavior is also observed, e.g., it is placed on the windowsill and moved inside, behind a thick wall, when no people are present. This device appears to have been collecting data throughout the whole dataset period.
Location 4 - Piazza Università
This location is wirelessly connected to the access point. The device was placed statically on a windowsill overlooking the square. Due to physical limitations, the device had lost power several times during the deployment. The internet connection was also interrupted sporadically.
Recognitions
The data was collected within the scope of the Resiloc project with the help of the City of Catania and project partners.
https://data.gov.sg/open-data-licence
Dataset from Urban Redevelopment Authority. For more information, visit https://data.gov.sg/datasets/d_d959102fa76d58f2de276bfbb7e8f68e/view
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains the Spider-Realistic dataset used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". The dataset is created based on the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove the explicit mention of column names while keeping the SQL queries unchanged to better evaluate the model's capability in aligning the NL utterance and the DB schema. For more details, please check our paper at https://arxiv.org/abs/2010.12773.
It contains the following files:
- spider-realistic.json
# The spider-realistic evaluation set
# Examples: 508
# Databases: 19
- dev.json
# The original dev split of Spider
# Examples: 1034
# Databases: 20
- tables.json
# The original DB schemas from Spider
# Databases: 166
- README.txt
- license
The Spider-Realistic dataset is created based on the dev split of the Spider dataset released by Yu, Tao, et al. "Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task." It is a subset of the original dataset with explicit mentions of the column names removed. The SQL queries and databases are kept unchanged.
For the format of each json file, please refer to the github page of Spider https://github.com/taoyds/spider.
For the database files please refer to the official Spider release https://yale-lily.github.io/spider.
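A minimal sketch for loading and inspecting the evaluation set, assuming only that spider-realistic.json is a JSON list of examples as in the original Spider release; field names are printed rather than assumed.

import json

with open("spider-realistic.json") as f:
    examples = json.load(f)

print(f"Number of examples: {len(examples)}")
# Inspect the fields of the first example instead of hard-coding their names.
print(sorted(examples[0].keys()))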
This dataset is distributed under the CC BY-SA 4.0 license.
If you use the dataset, please cite the following papers, including the original Spider dataset, Finegan-Dollak et al., 2018, and the original datasets for Restaurants, GeoQuery, Scholar, Academic, IMDB, and Yelp.
@article{deng2020structure,
title={Structure-Grounded Pretraining for Text-to-SQL},
author={Deng, Xiang and Awadallah, Ahmed Hassan and Meek, Christopher and Polozov, Oleksandr and Sun, Huan and Richardson, Matthew},
journal={arXiv preprint arXiv:2010.12773},
year={2020}
}
@inproceedings{Yu&al.18c,
year = 2018,
title = {Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task},
booktitle = {EMNLP},
author = {Tao Yu and Rui Zhang and Kai Yang and Michihiro Yasunaga and Dongxu Wang and Zifan Li and James Ma and Irene Li and Qingning Yao and Shanelle Roman and Zilin Zhang and Dragomir Radev }
}
@InProceedings{P18-1033,
author = "Finegan-Dollak, Catherine
and Kummerfeld, Jonathan K.
and Zhang, Li
and Ramanathan, Karthik
and Sadasivam, Sesh
and Zhang, Rui
and Radev, Dragomir",
title = "Improving Text-to-SQL Evaluation Methodology",
booktitle = "Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2018",
publisher = "Association for Computational Linguistics",
pages = "351--360",
location = "Melbourne, Australia",
url = "http://aclweb.org/anthology/P18-1033"
}
@InProceedings{data-sql-imdb-yelp,
dataset = {IMDB and Yelp},
author = {Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig},
title = {SQLizer: Query Synthesis from Natural Language},
booktitle = {International Conference on Object-Oriented Programming, Systems, Languages, and Applications, ACM},
month = {October},
year = {2017},
pages = {63:1--63:26},
url = {http://doi.org/10.1145/3133887},
}
@article{data-academic,
dataset = {Academic},
author = {Fei Li and H. V. Jagadish},
title = {Constructing an Interactive Natural Language Interface for Relational Databases},
journal = {Proceedings of the VLDB Endowment},
volume = {8},
number = {1},
month = {September},
year = {2014},
pages = {73--84},
url = {http://dx.doi.org/10.14778/2735461.2735468},
}
@InProceedings{data-atis-geography-scholar,
dataset = {Scholar, and Updated ATIS and Geography},
author = {Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer},
title = {Learning a Neural Semantic Parser from User Feedback},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
year = {2017},
pages = {963--973},
location = {Vancouver, Canada},
url = {http://www.aclweb.org/anthology/P17-1089},
}
@inproceedings{data-geography-original,
dataset = {Geography, original},
author = {John M. Zelle and Raymond J. Mooney},
title = {Learning to Parse Database Queries Using Inductive Logic Programming},
booktitle = {Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2},
year = {1996},
pages = {1050--1055},
location = {Portland, Oregon},
url = {http://dl.acm.org/citation.cfm?id=1864519.1864543},
}
@inproceedings{data-restaurants-logic,
author = {Lappoon R. Tang and Raymond J. Mooney},
title = {Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing},
booktitle = {2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora},
year = {2000},
pages = {133--141},
location = {Hong Kong, China},
url = {http://www.aclweb.org/anthology/W00-1317},
}
@inproceedings{data-restaurants-original,
author = {Ana-Maria Popescu, Oren Etzioni, and Henry Kautz},
title = {Towards a Theory of Natural Language Interfaces to Databases},
booktitle = {Proceedings of the 8th International Conference on Intelligent User Interfaces},
year = {2003},
location = {Miami, Florida, USA},
pages = {149--157},
url = {http://doi.acm.org/10.1145/604045.604070},
}
@inproceedings{data-restaurants,
author = {Alessandra Giordani and Alessandro Moschitti},
title = {Automatic Generation and Reranking of SQL-derived Answers to NL Questions},
booktitle = {Proceedings of the Second International Conference on Trustworthy Eternal Systems via Evolving Software, Data and Knowledge},
year = {2012},
location = {Montpellier, France},
pages = {59--76},
url = {https://doi.org/10.1007/978-3-642-45260-4_5},
}
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
User memories from Cultural Heritage Search (Kulturminnesøk) is a dataset (in the form of a file dump) consisting of the audience's own registrations in the online solution Kulturminnesøk. Cultural Heritage Search is an intermediary service from the Directorate for Cultural Heritage and is run by an editorial board. The information about most of the cultural monuments in Cultural Heritage Search comes from the cultural heritage database Askeladden, which is managed by the Directorate for Cultural Heritage, but users can also contribute their own user memories to the solution. The dataset follows the GeoJSON-LD standard. It can be read and used as regular GeoJSON, but it also has a semantic component that allows it to be processed as JSON-LD. For more information, see http://geojson.org/geojson-ld/. Each document can be linked with Cultural Heritage Search. For example, the document with ID http://kulturminnesok.no/fm/gilahytta-1 can be retrieved from Cultural Heritage Search as follows: https://kulturminnesok.no/minne/?queryString=http://kulturminnesok.no/fm/gilahytta-1
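Since the dump can be read as regular GeoJSON, a minimal sketch like the following can iterate over it; the file name is a placeholder and the structure is assumed to be a standard GeoJSON FeatureCollection.

import json

# Placeholder file name for the downloaded dump.
with open("kulturminnesok_user_memories.geojson") as f:
    collection = json.load(f)

# Each feature corresponds to one user memory.
for feature in collection["features"][:5]:
    print(feature.get("id"), feature["geometry"]["type"])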
This feature layer consists of paired GLOBE Observer Mosquito Habitat Mapper (MHM) and GLOBE Observer Land Cover (LC) observation data resulting from the following processing steps.

MHM
GeoJSON data was pulled from this GLOBE API URL: https://api.globe.gov/search/v1/measurement/protocol/measureddate/?protocols=mosquito_habitat_mapper&startdate=2017-05-01&enddate=2022-12-31&geojson=TRUE&sample=FALSE
Only device-reported measurements are kept ("DataSource" = "GLOBE Observer App"). As we are only interested in device measurements, latitude and longitude are determined from "MeasurementLatitude" and "MeasurementLongitude". All instances of duplicate photos have been removed from the dataset.

LC
GeoJSON data was pulled from this GLOBE API URL: https://api.globe.gov/search/v1/measurement/protocol/measureddate/?protocols=land_covers&startdate=2018-09-01&enddate=2022-12-31&geojson=TRUE&sample=FALSE
Only device-reported measurements are kept ("DataSource" = "GLOBE Observer App"). As we are only interested in device measurements, latitude and longitude are determined from "MeasurementLatitude" and "MeasurementLongitude".

Concurrence
These two layers were then combined using a spatiotemporal join with the following conditions:
- Tool: Geoanalytics Desktop Tools -> Join Features
- Target Layer: LC
- Join Type: one to many
- Join Layer: MHM
- Coordinate fields used: MeasurementLatitude, MeasurementLongitude
- Time fields used: MeasuredAt (UTC time)
- Spatial Proximity: 100 meters (NEAR_GEODESIC)
- Temporal Proximity: 60 minutes (NEAR)
- Attribute match: UserID

The result is a dataset consisting of all paired instances where the same observer (UserID) collected a Mosquito Habitat Mapper observation within 100 meters and 1 hour of collecting a Land Cover observation.

Additional fields include:
- 'lc_mhm_obsID_pair': A string representing the two paired observations - "{lc_LandCoverId}_{mhm_MosquitoHabitatMapperId}"
- 'lc_latlon': A string representing the coordinates of the LC observation - "({lc_MeasurementLatitude}, {lc_MeasurementLongitude})"
- 'mhm_latlon': A string representing the coordinates of the MHM observation - "({mhm_MeasurementLatitude}, {mhm_MeasurementLongitude})"
- 'spatialDistanceMeters': Numeric value representing the distance between the two paired observations in meters
- 'temporalDistanceMinutes': Numeric value representing the time delta between the two paired observations in minutes
- 'squareBuffer': A polygon string representing a 100 m square centered on the LC observation coordinates. This may be used in conjunction with additional map layers to evaluate the land cover types near the observation coordinates. (N.b. this is not the buffer used in calculating spatiotemporal concurrence.)

For the purposes of this visualization, the geometry is a 100 m x 100 m square centered on the Land Cover observation coordinates.
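A minimal sketch of the pairing logic under the conditions above, using pandas and a haversine distance check; this is an illustrative reimplementation, not the ArcGIS Join Features tool actually used to produce the layer.

import numpy as np
import pandas as pd

def geodesic_m(lat1, lon1, lat2, lon2):
    # Haversine approximation of geodesic distance in meters.
    r = 6371000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

# lc and mhm are DataFrames with MeasurementLatitude, MeasurementLongitude,
# MeasuredAt (parsed as UTC datetimes) and UserID columns, as described above.
def pair_observations(lc: pd.DataFrame, mhm: pd.DataFrame) -> pd.DataFrame:
    cand = lc.merge(mhm, on="UserID", suffixes=("_lc", "_mhm"))  # attribute match, one to many
    dist = geodesic_m(cand["MeasurementLatitude_lc"], cand["MeasurementLongitude_lc"],
                      cand["MeasurementLatitude_mhm"], cand["MeasurementLongitude_mhm"])
    dt = (cand["MeasuredAt_lc"] - cand["MeasuredAt_mhm"]).abs()
    keep = (dist <= 100) & (dt <= pd.Timedelta(minutes=60))
    out = cand[keep].copy()
    out["spatialDistanceMeters"] = dist[keep]
    out["temporalDistanceMinutes"] = dt[keep].dt.total_seconds() / 60
    return out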
This specialized location dataset delivers detailed information about marina establishments. Maritime industry professionals, coastal planners, and tourism researchers can leverage precise location insights to understand maritime infrastructure, analyze recreational boating landscapes, and develop targeted strategies.
How Do We Create Polygons?
- All our polygons are manually crafted using advanced GIS tools like QGIS, ArcGIS, and similar applications. This involves leveraging aerial imagery and street-level views to ensure precision.
- Beyond visual data, our expert GIS data engineers integrate venue layout/elevation plans sourced from official company websites to construct detailed indoor polygons. This meticulous process ensures higher accuracy and consistency.
- We verify our polygons through multiple quality checks, focusing on accuracy, relevance, and completeness.
What's More?
- Custom Polygon Creation: Our team can build polygons for any location or category based on your specific requirements. Whether it’s a new retail chain, transportation hub, or niche point of interest, we’ve got you covered.
- Enhanced Customization: In addition to polygons, we capture critical details such as entry and exit points, parking areas, and adjacent pathways, adding greater context to your geospatial data.
- Flexible Data Delivery Formats: We provide datasets in industry-standard formats like WKT, GeoJSON, Shapefile, and GDB, making them compatible with various systems and tools.
- Regular Data Updates: Stay ahead with our customizable refresh schedules, ensuring your polygon data is always up-to-date for evolving business needs.
Unlock the Power of POI and Geospatial Data
With our robust polygon datasets and point-of-interest data, you can:
- Perform detailed market analyses to identify growth opportunities.
- Pinpoint the ideal location for your next store or business expansion.
- Decode consumer behavior patterns using geospatial insights.
- Execute targeted, location-driven marketing campaigns for better ROI.
- Gain an edge over competitors by leveraging geofencing and spatial intelligence.
Why Choose LocationsXYZ?
LocationsXYZ is trusted by leading brands to unlock actionable business insights with our spatial data solutions. Join our growing network of successful clients who have scaled their operations with precise polygon and POI data. Request your free sample today and explore how we can help accelerate your business growth.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for Figure Atlas.2 from Atlas of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).
Figure Atlas.2 shows WGI reference regions used in the (a) AR5 and (b) AR6 reports.
How to cite this dataset
When citing this dataset, please include both the data citation below (under 'Citable as') and the following citations: For the report component from which the figure originates: Gutiérrez, J.M., R.G. Jones, G.T. Narisma, L.M. Alves, M. Amjad, I.V. Gorodetskaya, M. Grose, N.A.B. Klutse, S. Krakovska, J. Li, D. Martínez-Castro, L.O. Mearns, S.H. Mernild, T. Ngo-Duc, B. van den Hurk, and J.-H. Yoon, 2021: Atlas. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 1927–2058, doi:10.1017/9781009157896.021
Iturbide, M. et al., 2021: Repository supporting the implementation of FAIR principles in the IPCC-WG1 Interactive Atlas. Zenodo. Retrieved from: http://doi.org/10.5281/zenodo.5171760
Figure subpanels
The figure has two panels, with data provided for both panels in the master GitHub repository linked in the documentation.
Data provided in relation to figure
This dataset contains the corner coordinates defining each reference region for the second panel of the figure, with coordinate information at a 0.44º resolution. The repository directory 'reference-regions' contains the reference regions as polygons in different formats (CSV with coordinates, R data, shapefile and GeoJSON), together with R and Python notebooks illustrating the use of these regions with worked examples.
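A minimal sketch of using the reference-region polygons with geopandas, in the spirit of the worked examples in the notebooks; the GeoJSON file name below is a placeholder for the file provided in the 'reference-regions' directory.

import geopandas as gpd
from shapely.geometry import Point

# Placeholder file name for the reference-regions GeoJSON.
regions = gpd.read_file("reference-regions.geojson")

# Find which reference region contains a given point (lon, lat).
point = Point(5.0, 52.0)
print(regions[regions.contains(point)])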
Data for reference regions for AR5 can be found here: https://catalogue.ceda.ac.uk/uuid/a3b6d7f93e5c4ea986f3622eeee2b96f
CMIP5 is the fifth phase of the Coupled Model Intercomparison Project. CMIP6 is the sixth phase of the Coupled Model Intercomparison Project. CORDEX is the Coordinated Regional Downscaling Experiment from the WCRP. AR5 and AR6 refer to the 5th and 6th Assessment Reports of the IPCC. WGI stands for Working Group I.
Notes on reproducing the figure from the provided data
Data and figures produced by the Jupyter Notebooks live inside the notebooks directory. The notebooks describe step by step the basic process followed to generate some key figures of the AR6 WGI Atlas and some products underpinning the Interactive Atlas, such as reference regions, global warming levels, aggregated datasets. They include comments and hints to extend the analysis, thus promoting reusability of the results. These notebooks are provided as guidance for practitioners, more user friendly than the code provided as scripts in the reproducibility folder.
Some of the notebooks require access to large data volumes outside this repository. To speed up the execution of the notebooks, in addition to the full code to access the data, we provide a data loading shortcut by storing intermediate results in the auxiliary-material folder in this repository. To test other parameter settings, the full data access instructions should be followed, which can involve long waiting times.
Sources of additional information
The following weblinks are provided in the Related Documents section of this catalogue record:
- Link to the figure on the IPCC AR6 website
- Link to the report component containing the figure (Atlas)
- Link to the Supplementary Material for Atlas, which contains details on the input data used in Table Atlas.SM.15
- Link to the code for the figure, archived on Zenodo
- Link to the necessary notebooks for reproducing the figure from GitHub
- Link to IPCC AR5 reference regions dataset
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The DataCite Public Data File contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023.
This dataset represents a processed version of the Public Data File, where the data have been extracted and loaded into a Redivis dataset.
Records have descriptive metadata for research outputs and resources structured according to the DataCite Metadata Schema and include links to other persistent identifiers (PIDs) for works (DOIs), people (ORCID iDs), and organizations (ROR IDs).
Use of the DataCite Public Data File is subject to the DataCite Data File Use Policy.
This dataset is a processed version of the DataCite Public Data File, where the original file (a 23GB .tar.gz) has been extracted into 55,239 JSONL files, which were then concatenated into a single JSONL file.
This JSONL file has been imported into a Redivis table to facilitate further exploration and analysis.
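A minimal sketch of the extraction and concatenation steps described above; the archive, directory, and output file names are placeholders, not the exact ones used to build this dataset.

import tarfile
from pathlib import Path

# Extract the public data file archive (placeholder name).
with tarfile.open("DataCite_Public_Data_File_2023.tar.gz", "r:gz") as tar:
    tar.extractall("datacite_extracted")

# Concatenate every extracted JSONL file into a single JSONL file.
with open("datacite_all.jsonl", "w") as out:
    for part in sorted(Path("datacite_extracted").rglob("*.jsonl")):
        with part.open() as f:
            for line in f:
                out.write(line)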
A sample project demonstrating how to query the DataCite data file can be found here: https://redivis.com/projects/hx1e-a6w8vmwsx
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Dataset of Bot and Human Activities in GitHub
This repository provides an updated version of a dataset of GitHub contributor activities that is accompanied by a paper published at MSR 2023 in the Data and Tool Showcase Track. The paper is entitled A Dataset of Bot and Human Activities in GitHub and is co-authored by Natarajan Chidambaram, Alexandre Decan and Tom Mens (Software Engineering Lab, University of Mons, Belgium). DOI: https://www.doi.org/10.1109/MSR59073.2023.00070. This work is done as part of Natarajan Chidambaram's PhD research in the context of the DigitalWallonia4.AI research project ARIAC (grant number 2010235) and TRAIL.
The dataset contains 1,015,422 high-level activities made by 350 bots and 620 human contributors on GitHub between 25 November 2022 and 15 April 2023. The activities were generated from 1,221,907 low-level events obtained from the GitHub Events API and cover 24 distinct activity types. This dataset facilitates the characterisation of bot and human behaviour in GitHub repositories by enabling the analysis of activity sequences and activity patterns of bot and human contributors. This dataset could lead to better bot identification tools and empirical studies on how bots play a role in collaborative software development.
Files description
The following files are provided as part of the archive:
Example
Below is an example of a Closing pull request activity:
{
"date": "2022-11-25T18:49:09+00:00",
"activity": "Closing pull request",
"contributor": "typescript-bot",
"repository": "DefinitelyTyped/DefinitelyTyped",
"comment": {
"length": 249,
"GH_node": "IC_kwDOAFz6BM5PJG7l"
},
"pull_request": {
"id": 62328,
"title": "[qunit] Add `test.each()`",
"created_at": "2022-09-19T17:34:28+00:00",
"status": "closed",
"closed_at": "2022-11-25T18:49:08+00:00",
"merged": false,
"GH_node": "PR_kwDOAFz6BM4_N5ib"
},
"conversation": {
"comments": 19
},
"payload": {
"pr_commits": 1,
"pr_changed_files": 5
}
}
List of activity types
In total, we have identified 24 different high-level activity types from 15 different low-level event types. They are Creating repository, Creating branch, Creating tag, Deleting tag, Deleting repository, Publishing a release, Making repository public, Adding collaborator to repository, Forking repository, Starring repository, Editing wiki page, Opening issue, Closing issue, Reopening issue, Transferring issue, Commenting issue, Opening pull request, Closing pull request, Reopening pull request, Commenting pull request, Commenting pull request changes, Reviewing code, Commenting commits, Pushing commits.
List of fields
Not only does the dataset contain a list of activities made by bot and human contributors, but it also contains some details about these activities. For example, commenting issue activities provide details about the author of the comment, the repository and issue in which the comment was created, and so on.
For all activity types, we provide the date of the activity, the contributor that made the activity, and the repository in which the activity took place. Depending on the activity type, additional fields are provided. In this section, we describe for each activity type the different fields that are provided in the JSON file. It is worth mentioning that we also provide the corresponding JSON schema alongside the dataset.
Properties
The field names and types for each activity type are defined in the JSON schema provided alongside the dataset.