CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This database contains the full provenance data for the OpenCitations Meta database, provided as a dump of the Virtuoso database. It includes a complete history of the creation and modification of every entity.
The provenance information is modelled in RDF according to the OpenCitations Data Model. Each change to an entity (creation, deletion, modification, or merge) is recorded in a "snapshot", which includes metadata about the agent responsible for the change, the primary source, and the timestamp. Snapshots are linked sequentially to provide a full version history of each entity.
This dump contains:
- 7,544,653,255 quadruples
- 1,307,782,669 snapshots
- a full-text index for textual searches.
The data is provided as a multi-part 7zip archive. To extract it, please use the provided extraction scripts (extract_archive.sh for Linux/macOS and extract_archive.bat for Windows).
Usage example (Linux/macOS): bash extract_archive.sh oc_meta_prov_06_06.7z.001 ./extracted_data
Usage example (Windows): extract_archive.bat oc_meta_prov_06_06.7z.001 .\extracted_data
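Because each snapshot points back to the snapshot it supersedes, an entity's version history can be rebuilt by following those links forward from the creation snapshot. The sketch below shows that traversal over snapshots already loaded into memory. It assumes each snapshot carries a single "derived from" link to its predecessor (in the OpenCitations Data Model this is expressed with PROV-O properties such as prov:wasDerivedFrom; check the data model documentation before relying on this) and that the history is a linear chain, so merge snapshots with several predecessors are not handled. The `Snapshot` class and the example IRIs are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Snapshot:
    iri: str
    derived_from: Optional[str]  # IRI of the previous snapshot; None for the creation snapshot
    generated_at: datetime


def order_history(snapshots: Iterable[Snapshot]) -> List[Snapshot]:
    """Return one entity's snapshots from creation to the latest change."""
    successor = {}  # previous snapshot IRI -> snapshot that supersedes it
    first = None
    for snap in snapshots:
        if snap.derived_from is None:
            first = snap
        else:
            successor[snap.derived_from] = snap

    history, seen = [], set()
    current = first
    while current is not None and current.iri not in seen:  # guard against cycles
        history.append(current)
        seen.add(current.iri)
        current = successor.get(current.iri)
    return history


# Hypothetical snapshots for a single entity, deliberately out of order.
snaps = [
    Snapshot("https://example.org/br/1/prov/se/3", "https://example.org/br/1/prov/se/2", datetime(2023, 5, 1)),
    Snapshot("https://example.org/br/1/prov/se/1", None, datetime(2022, 1, 1)),
    Snapshot("https://example.org/br/1/prov/se/2", "https://example.org/br/1/prov/se/1", datetime(2022, 9, 1)),
]
for snap in order_history(snaps):
    print(snap.generated_at.date(), snap.iri)
```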
For more information, please refer to the following paper: Arcangelo Massari, Fabio Mariani, Ivan Heibi, Silvio Peroni, David Shotton; OpenCitations Meta. Quantitative Science Studies 2024; 5 (1): 50–75. doi: https://doi.org/10.1162/qss_a_00292
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Pedestrians using mobility aids are the most vulnerable group of traffic participants. While there has been significant progress in the robustness and reliability of camera-based general pedestrian detection systems, pedestrians reliant on mobility aids are highly underrepresented in common datasets for object detection and classification.
To bridge this gap and enable research towards robust and reliable detection systems which may be employed in traffic monitoring, scheduling, and planning, we present this dataset of a pedestrian crossing scenario taken from an elevated traffic monitoring perspective together with ground truth annotations (Yolo format [1]). Classes present in the dataset are pedestrian (without mobility aids), as well as pedestrians using wheelchairs, rollators/wheeled walkers, crutches, and walking canes. The dataset comes with official training, validation, and test splits.
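In the Yolo format, each image has a plain-text label file with one object per line: a class index followed by the box centre x, centre y, width and height, all normalised to the image size. The sketch below converts such lines into pixel-space corner coordinates, for example to draw or crop the annotated pedestrians. The `Box` class and `parse_yolo_labels` function are hypothetical helpers, and no particular mapping from class index to the five classes is assumed; take that from the dataset's own metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> List[Box]:
    """Convert Yolo label lines (class xc yc w h, normalised) to pixel corners."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:5])
        boxes.append(Box(
            class_id,
            (xc - w / 2) * img_w,
            (yc - h / 2) * img_h,
            (xc + w / 2) * img_w,
            (yc + h / 2) * img_h,
        ))
    return boxes


# Hypothetical label line for a 1920x1080 frame.
print(parse_yolo_labels("2 0.5 0.5 0.1 0.2\n", 1920, 1080))
```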
An in-depth description of the dataset can be found in [2]. If you make use of this dataset in your work, research or publication, please cite this work as:
@inproceedings{mohr2023mau,
author = {Mohr, Ludwig and Kirillova, Nadezda and Possegger, Horst and Bischof, Horst},
title = {{A Comprehensive Crossroad Camera Dataset of Mobility Aid Users}},
booktitle = {Proceedings of the 34th British Machine Vision Conference ({BMVC}2023)},
year = {2023}
}
The archive mobility.zip contains the full detection dataset in Yolo format, with images, ground truth labels and metadata. The archive mobility_class_hierarchy.zip contains labels and meta files (Yolo format) for training with a class hierarchy, e.g. using the modified version of Yolo v5/v8 available under [3].
To use this dataset with Yolo, download and extract the zip archive and change the path entry in dataset.yaml to the directory where you extracted the archive.
[1] https://github.com/ultralytics/ultralytics
[2] coming soon
[3] coming soon
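The dataset.yaml path edit described above can also be scripted. The sketch below rewrites the first top-level `path:` line with plain text substitution, so it needs no YAML library. It assumes the entry sits on a single line starting with `path:` (as in Ultralytics-style dataset files) and it drops any trailing comment on that line. The function names are hypothetical.

```python
import re
from pathlib import Path


def set_path_entry(yaml_text: str, new_root: str) -> str:
    """Return yaml_text with its first top-level 'path:' line pointing at new_root."""
    new_text, count = re.subn(
        r"(?m)^path:.*$",
        lambda _match: f"path: {new_root}",  # a function avoids backslash escaping in new_root
        yaml_text,
        count=1,
    )
    if count == 0:
        raise ValueError("no top-level 'path:' entry found")
    return new_text


def update_dataset_yaml(yaml_file: Path, extracted_dir: Path) -> None:
    """Point dataset.yaml at the directory the archive was extracted to."""
    text = yaml_file.read_text(encoding="utf-8")
    new_root = extracted_dir.resolve().as_posix()
    yaml_file.write_text(set_path_entry(text, new_root), encoding="utf-8")


# Example with hypothetical locations:
# update_dataset_yaml(Path("mobility/dataset.yaml"), Path("mobility"))
```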
https://brightdata.com/license
Unlock valuable biomedical knowledge with our comprehensive PubMed Dataset, designed for researchers, analysts, and healthcare professionals to track medical advancements, explore drug discoveries, and analyze scientific literature.
Dataset Features
- Scientific Articles & Abstracts: Access structured data from PubMed, including article titles, abstracts, authors, publication dates, and journal sources.
- Medical Research & Clinical Studies: Retrieve data on clinical trials, drug research, disease studies, and healthcare innovations.
- Keywords & MeSH Terms: Extract Medical Subject Headings (MeSH) and keywords to categorize and analyze research topics.
- Publication & Citation Data: Track citation counts, journal impact factors, and author affiliations for academic and industry research.
Customizable Subsets for Specific Needs
Our PubMed Dataset is fully customizable, allowing you to filter data based on publication date, research category, keywords, or specific journals. Whether you need broad coverage for medical research or focused data for pharmaceutical analysis, we tailor the dataset to your needs.
Popular Use Cases
- Pharmaceutical Research & Drug Development: Analyze clinical trial data, drug efficacy studies, and emerging treatments.
- Medical & Healthcare Intelligence: Track disease outbreaks, healthcare trends, and advancements in medical technology.
- AI & Machine Learning Applications: Use structured biomedical data to train AI models for predictive analytics, medical diagnosis, and literature summarization.
- Academic & Scientific Research: Access a vast collection of peer-reviewed studies for literature reviews, meta-analyses, and academic publishing.
- Regulatory & Compliance Monitoring: Stay updated on medical regulations, FDA approvals, and healthcare policy changes.
Whether you're conducting medical research, analyzing healthcare trends, or developing AI-driven solutions, our PubMed Dataset provides the structured data you need. Get started today and customize your dataset to fit your research objectives.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the input data for the panel regression and SEIR analyses for the paper "Weather drives variation in COVID-19 transmission and detection", as well as the results of the SEIR calibrations.
Under the `inputs` directory, the `panel_all.csv` data provides the compiled data for the core analyses. For information on COVID cases, we use daily reports from the Johns Hopkins University COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) (Dong et al. 2020). We complement these data with a variety of online sources on confirmed cases at the subnational level.
For information on weather, we use the ERA5 reanalysis (Hersbach et al. 2018). The data were downloaded from the Copernicus Climate Change Service (C3S) Climate Data Store in January 2021.
The governance data (in `inputs/governance`) is drawn from https://info.worldbank.org/governance/wgi/ for the year 2019.
The mapping is done using GADM 3.6, simplified to reduce its size, and stored in `inputs/gadm36_levels_simple`.
The result data in the `results` directory represents calibrations of the Bayesian SEIR model:
- Files of the form `epimodel-meta-0314...-nodel.csv` include parameter estimates both as initially calibrated (group = "Raw") and after the meta-analysis (group = "Combined"). Parameter estimates are represented by their mean, standard deviation, and five quantiles. Different files of this form reflect different assumptions: (1) `-noweather` exclude weather effects, `-noprior` exclude panel regression priors, and `-full3` include weather and priors; (2) `-all` were estimated using all observations, and `-mobile` were estimated using only regions/observations with mobility data; (3) `-nobs` include meta-analyzed data weighted by the number of observations, `-pop` are weighted by population, and `-region` are weighted by region.
- The `global-0314.RData` file contains calibrations treating the whole world as one region.
- The `pairwise.csv` and `pairwise-all.csv` compare data from regions that have valid no-prior (suffix 1) and no-weather (suffix 2) projections. `pairwise.csv` has summary statistics while `pairwise-all.csv` includes every region.
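The three groups of filename tokens listed above can be decoded mechanically when looping over the `results` directory. The sketch below splits a filename on hyphens and looks each token up in the three groups. It assumes the tokens appear as hyphen-separated parts of the name; the example filename is made up for illustration and is not one of the actual files.

```python
from typing import Dict, Optional

MODEL_SPEC = {"noweather": "no weather effects", "noprior": "no panel-regression priors",
              "full3": "weather effects and priors"}
SAMPLE = {"all": "all observations", "mobile": "only regions/observations with mobility data"}
WEIGHTING = {"nobs": "weighted by number of observations", "pop": "weighted by population",
             "region": "weighted by region"}


def describe_results_file(filename: str) -> Dict[str, Optional[str]]:
    """Map the hyphen-separated tokens of a results filename to the assumptions they encode."""
    stem = filename[:-4] if filename.endswith(".csv") else filename
    tokens = set(stem.split("-"))

    def lookup(group: Dict[str, str]) -> Optional[str]:
        matches = [desc for token, desc in group.items() if token in tokens]
        return matches[0] if matches else None

    return {"model": lookup(MODEL_SPEC), "sample": lookup(SAMPLE), "weighting": lookup(WEIGHTING)}


# Hypothetical filename, for illustration only.
print(describe_results_file("example-full3-mobile-pop.csv"))
```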
Acknowledgements:
The results contain modified Copernicus Climate Change Service information 2020. Neither the European Commission nor ECMWF is responsible for any use that may be made of the Copernicus information or data it contains.
Bibliography:
Dong E, Du H, Gardner L (2020): An interactive web-based dashboard to track COVID-19 in real time. Lancet Inf Dis. 20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1
Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2018): ERA5 hourly data on single levels from 1959 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). (Accessed in January 2021), 10.24381/cds.adbb2d47
1. The use of biologging devices continues to increase, with technological advances yielding remarkable ecological insights and generating new research questions. However, as devices develop and are deployed more widely, there is a need to update our knowledge of the potential ethical impacts to allow scientists to balance these against the knowledge gained.
2. We employed a suite of phylogenetically controlled meta-analyses on a dataset comprising more than 450 published effect sizes across 214 different studies to examine the effects of biologger tagging on five key traits in birds.
3. Overall, we found small but significant negative effects of tagging on survival, reproduction and parental care. In addition, tagging was positively associated with foraging trip duration, but had no effect on body mass. Meta-regressions revealed that flying style, migration distance and proportional tag mass were significant influences producing these deleterious effects, with attachment type and posit...
This data set, released under a CC-BY license, contains time series of total abundance and/or biomass of insect, arachnid and Entognatha assemblages (grouped at the family level or higher taxonomic resolution), monitored by standardized means for ten or more years. The data were derived from 166 data sources, representing a total of 1676 sites from 41 countries. The time series for abundance and biomass represent the aggregated number of all individuals of all taxa monitored at each site. The data set consists of four linked tables, representing information on the study level, the plot level, sampling, and the measured assemblage sizes. All references to the original data sources can be found in the PDF with references, and a Google Earth (KML) file presents the locations (including metadata) of all datasets. When using (parts of) this data set, please respect the original open access licenses.
This data set underlies all analyses performed in the paper 'Meta-analysis reveals declines in terrestrial, but increases in freshwater insect abundances', a meta-analysis of changes in insect assemblage sizes, and is accompanied by a data paper entitled 'InsectChange – a global database of temporal changes in insect and arachnid assemblages'. Consulting the data paper before use is recommended. Tables that can be used to calculate trends of specific taxa and species richness will be added as they become available.
The four tables are linked by the columns 'DataSource_ID' and 'Plot_ID', and a separate table lists references to the original research. In the table 'DataSources', descriptive data are provided at the dataset level: it gives links to online repositories where the original data can be found, states whether the dataset provides data on biomass, abundance or both, and records the invertebrate group under study, the realm, and the location of sampling at different geographic scales (continent to state).
This table also contains a reference column. The full reference to the original data is found in the file 'References_to_original_data_sources.pdf'. In the table 'PlotData', more details on each site within each dataset are provided: the exact location of each plot, whether the plots were experimentally manipulated, and whether there was any spatial grouping of sites (column 'Location'). Additionally, this table contains all explanatory variables used for analysis, e.g. climate change variables, land-use variables and protection status. The table 'SampleData' describes the exact source of the data (table X, figure X, etc.), the extraction methods, and the sampling methods (derived from the original publications). This includes the sampling method, sampling area, sample size, and how the aggregation of samples was done, if reported. Any calculations we did on the original data (e.g. reverse log transformations) are also detailed here, but more details are provided in the data paper. This table links to the table 'DataSources' by the column 'DataSource_ID'. Note that each data source may contain multiple entries in the 'SampleData' table if the data were presented in different figures or tables, or if there was any other need to split information on sampling details. The table 'InsectAbundanceBiomassData' provides the insect abundance or biomass numbers as analysed in the paper. It contains columns matching the tables 'DataSources' and 'PlotData', as well as the year of sampling, a descriptor of the period within the year of sampling (this was used as a random effect), the unit in which the number is reported (abundance or biomass), and the estimated abundance or biomass. In the Number column, missing data are included (NA). The years with missing data were added because this was essential for the analysis performed, and are retained here because they are easier to remove than to add.
Linking the table 'InsectAbundanceBiomassData.csv' with 'PlotData.csv' by column 'Plot_ID', and with 'DataSources.csv' by column 'DataSource_ID' will provide the full dataframe used for all analyses. Detailed explanations of all column headers and terms are available in the ReadMe file, and more details will be available in the forthcoming data paper. WARNING: Because of the disparate sampling methods and various spatial and temporal scales used to collect the original data, this dataset should never be used to test for differences in insect abundance/biomass among locations (i.e. differences in intercept). The data can only be used to study temporal trends, by testing for differences in slopes. The data are standardized within plots to allow the temporal comparison, but not necessarily among plots (even within one dataset).
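The join described above maps onto two pandas merges, and the warning about intercepts suggests working with within-plot slopes. The sketch below does both on toy data: it joins the abundance/biomass table to PlotData on 'Plot_ID' and to DataSources on 'DataSource_ID', then fits an ordinary least-squares slope of the reported number against year within each plot. The column names 'Year' and 'Number' and the toy columns 'Location' and 'Realm' are assumptions to check against the ReadMe, and a plain per-plot OLS slope is a simplification, not the random-effects model used in the paper's analysis.

```python
import numpy as np
import pandas as pd


def build_full_frame(abundance: pd.DataFrame, plots: pd.DataFrame, sources: pd.DataFrame) -> pd.DataFrame:
    """Join abundance/biomass rows to plot-level and data-source-level information."""
    # Keep the join key plus only the columns not already on the left side,
    # so shared columns (e.g. DataSource_ID in PlotData) do not become *_x / *_y pairs.
    plot_cols = ["Plot_ID"] + [c for c in plots.columns if c not in abundance.columns]
    full = abundance.merge(plots[plot_cols], on="Plot_ID", how="left", validate="many_to_one")
    source_cols = ["DataSource_ID"] + [c for c in sources.columns if c not in full.columns]
    return full.merge(sources[source_cols], on="DataSource_ID", how="left", validate="many_to_one")


def per_plot_slopes(full: pd.DataFrame, year_col: str = "Year", value_col: str = "Number") -> pd.Series:
    """Ordinary least-squares slope of value_col against year_col within each plot."""
    slopes = {}
    for plot_id, grp in full.dropna(subset=[value_col]).groupby("Plot_ID"):
        if grp[year_col].nunique() < 2:
            continue  # a slope needs at least two distinct years
        slope, _intercept = np.polyfit(grp[year_col], grp[value_col], 1)
        slopes[plot_id] = slope
    return pd.Series(slopes, name="slope", dtype=float)


# Toy tables; in practice, load the three CSV files with pd.read_csv.
abundance = pd.DataFrame({
    "DataSource_ID": [1, 1, 1, 1],
    "Plot_ID": [10, 10, 10, 10],
    "Year": [2000, 2001, 2002, 2003],
    "Number": [5.0, 7.0, np.nan, 11.0],  # NA years are kept in the source table
})
plots = pd.DataFrame({"Plot_ID": [10], "DataSource_ID": [1], "Location": ["A"]})
sources = pd.DataFrame({"DataSource_ID": [1], "Realm": ["Terrestrial"]})

full = build_full_frame(abundance, plots, sources)
print(per_plot_slopes(full))
```

Comparing slopes rather than levels follows the warning above: plots are standardized internally, so only the change over time is comparable across them.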
Randomised and quasi-randomised controlled trials of brief lifestyle interventions delivered at any stage during pregnancy, and across the BMI spectrum, were included. Studies that included pregnant women diagnosed with any complications that might affect diet or physical activity behaviours were excluded. Eligible interventions had to be ‘brief’, where the intervention could be delivered during a routine point of contact (face to face or via telephone) (Werch et al., 2006). An inclusive approach to study selection was taken. Interventions could be delivered over more than one point of contact if the duration was kept intentionally brief and could realistically be delivered within a national healthcare system, without requiring significant expansion of workforce or training. For one intervention where the duration of contact between participants and the healthcare practitioner was unclear, the study was retained for the purpose of the review (Jeffries, Shub, Walker, Hiscock, & Permezel, 2009). Comparator groups in the eligible trials needed to be a standard care control group. Interventions had to report on the effectiveness of changing energy balance behaviours (diet, physical activity and/or weight monitoring behaviours) in pregnant women. The primary outcome of interest from the brief interventions was total GWG in kilograms, reported as the change in weight from first point of entry into the antenatal care pathway (i.e. baseline) to just before delivery (at variable time points in the third trimester). Meta-analyses were conducted on GWG as a continuous outcome (in kg) and as a binary outcome (proportion of pregnant women exceeding IOM GWG guidelines). Mean differences in total GWG in kilograms between the intervention and control groups were calculated for studies that reported continuous outcomes.
In studies that compared the brief intervention to a more intense intervention group, only the comparison against standard care was taken forward for quantitative pooling. For all dichotomous outcomes, odds ratios for the likelihood of exceeding IOM-recommended GWG were calculated. Intention-to-treat data were used where reported by the individual studies. To estimate the overall pooled weighted mean effect size of the interventions, random effects models were chosen to allow for anticipated between-study variance (DerSimonian & Laird, 1986). Subgroup analyses were conducted, comparing interventions for women who entered pregnancy with overweight or obesity (BMI >25 kg/m²) to interventions delivered to women across the BMI spectrum. Further subgroup analyses by risk of bias and by brief intervention delivery strategy were also undertaken. For meta-analysis, between-study heterogeneity was judged by the p-value for heterogeneity and calculation of the I² value. Significance of subgroup and sensitivity analyses was judged by the p-value for heterogeneity (Higgins & Green, 2008). P-values <0.05 were considered statistically significant. All statistical analyses were undertaken in Stata 15/SE (StataCorp, 2017). These are the datasets used for the meta-analysis.
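The random-effects pooling described above follows DerSimonian and Laird (1986). As an illustration of what that estimator computes (the review itself used Stata 15/SE), the Python sketch below pools per-study effect sizes, such as mean differences in GWG in kg or log odds ratios, with inverse-variance weights inflated by the between-study variance tau², and reports I². The log odds ratio helper for 2x2 counts and all function names are illustrative, not taken from the review.

```python
import math
from typing import Dict, Sequence, Tuple


def log_odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Log OR and its variance from a 2x2 table.

    a/b: intervention group exceeding / not exceeding the guideline,
    c/d: control group exceeding / not exceeding the guideline.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]) -> Dict[str, float]:
    """Random-effects pooled estimate with DerSimonian-Laird tau^2 and I^2."""
    w = [1.0 / v for v in variances]                                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))    # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0                 # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]                     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "ci_low": pooled - 1.96 * se,
            "ci_high": pooled + 1.96 * se, "tau2": tau2, "i2": i2, "q": q}


# Hypothetical mean differences in total GWG (kg) and their variances.
print(dersimonian_laird([-1.2, -0.4, -0.9], [0.25, 0.16, 0.36]))
```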
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
AeroSonicDB (YPAD-0523): Labelled audio dataset for acoustic detection and classification of aircraft
Version 1.1.2 (November 2023)
[UPDATE: June 2024]
Version 2.0 is currently in beta and can be found at https://zenodo.org/records/12775560. The repository is currently restricted; however, you can gain access by emailing Blake Downward at aerosonicdb@gmail.com, or by submitting the following Google Form.
Version 2 vastly extends the number of aircraft audio samples to over 3,000 (V1 contains 625 aircraft samples), for more than 38 hours of strongly annotated aircraft audio (V1 contains 8.9 hours of aircraft audio).
Publication
When using this data in an academic work, please reference the dataset DOI and version. Please also reference the following paper which describes the methodology for collecting the dataset and presents baseline model results.
Downward, B., & Nordby, J. (2023). The AeroSonicDB (YPAD-0523) Dataset for Acoustic Detection and Classification of Aircraft. ArXiv, abs/2311.06368.
Description
AeroSonicDB (YPAD-0523) is a specialised dataset of ADS-B labelled audio clips for research in the fields of environmental noise attribution and machine listening, particularly acoustic detection and classification of low-flying aircraft. Audio files in this dataset were recorded at locations in close proximity to a flight path approaching or departing Adelaide International Airport's (ICAO code: YPAD) primary runway, 05/23. Recordings are initially labelled from radio (ADS-B) messages received from the aircraft overhead, then human verified and annotated with the first and final moments at which the target aircraft is audible.
A total of 1,895 audio clips are distributed across two top-level classes, "Aircraft" (8.87 hours) and "Silence" (3.52 hours). The aircraft class is further broken down into four subclasses, which broadly describe the structure of the aircraft and its propulsion mechanism. A variety of additional "airframe" features are provided to give researchers finer control of the dataset, and the opportunity to develop ontologies specific to their own use case.
For convenience, the dataset has been split into training (10.04 hours) and testing (2.35 hours) subsets, with the training set further split into 5 distinct folds for cross-validation. These splits are performed to prevent data-leakage between folds and the test set, ensuring samples collected in the same recording session (distinct in time, location and microphone) are assigned to the same fold.
Researchers may find applications for this dataset in a number of fields, particularly aircraft noise isolation and noise monitoring in an urban environment, development of passive acoustic systems to assist radar technology, and understanding the sources of aircraft noise to help manufacturers design quieter aircraft.
Audio data
ADS-B (Automatic Dependent Surveillance–Broadcast) messages transmitted directly from aircraft are used to automatically trigger, capture and label audio samples. A 60-second recording is triggered when an aircraft transmits a message indicating it is within a specified distance of the recording device (see "Location data" below for specifics). The resulting audio file is labelled with the unique ICAO identifier code for the aircraft, as well as its last reported altitude, date, time, location and microphone. The recording is then human verified and annotated with timestamps for the first and last moments the aircraft is audible. In total, AeroSonicDB contains 625 recordings of low-altitude aircraft, varying in length from 18 to 60 seconds, for a total of 8.87 hours of aircraft audio.
A collection of urban background noise without aircraft ("silence") is included with the dataset as a means of distinguishing location-specific environmental noises from aircraft noises. 10-second background noise ("silence") recordings are triggered only when no aircraft is broadcasting that it is within a specified distance of the recording device (see "Location data" below). These "silence" recordings are also human verified to ensure no aircraft noise is present. The dataset contains 1,270 clips of silence/urban background noise.
Location data
Recordings have been collected from three (3) locations. GPS coordinates for each location are provided in the "locations.json" file. To protect privacy, coordinates have been provided for a road or public space near the recording device instead of its exact location.
Location: 0
Situated in a suburban environment approximately 15.5 km north-east of the start/end of the runway. For Adelaide, typical south-westerly winds bring most arriving aircraft past this location on approach. Winds from the north or east will cause aircraft to take off to the north-east; however, not all departing aircraft will maintain a course that triggers a recording at this location. The "trigger distance" for this location is set to 3 km to ensure that both small/slower and large/faster aircraft are captured within a sixty-second recording.
"Silence" or ambient background noises at this location include: cars, motorbikes, light trucks, garbage trucks, power tools, lawn mowers, construction sounds, sirens, people talking, dogs barking and a wide range of Australian native birds (New Holland Honeyeaters, Wattlebirds, Australian Magpies, Australian Ravens, Spotted Doves, Rainbow Lorikeets and others).
Location: 1
Situated approximately 500 m south-east of the south-eastern end of the runway, this location is near recreational areas (golf course, skate park and parklands), with a busy road/highway in between the location and the runway. This location features heavy winds and road traffic, as well as people talking, walking and riding, and birds such as the Australian Magpie and Noisy Miner. The trigger distance for this location is set to 1 km. Due to their low altitude, aircraft are louder here, but audible for a shorter time than at "Location 0".
Location: 2
As an alternative to "Location 1", this location is situated approximately 950 m south-east of the end of the runway. It has a wastewater facility to the north, a residential area to the south and a popular beach to the west. This location offers greater wind protection and greater distance from airport and highway noises. Ambient background sounds feature close-proximity cars and motorbikes, cyclists, people walking, nail guns and other construction sounds, as well as the local birds mentioned above.
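The trigger rule described under "Audio data", together with the per-location distances above, amounts to a radius check on each reported ADS-B position. The sketch below uses the haversine great-circle distance. The distance formula, the coordinates and the function names are assumptions, since the actual recording software is not described here, and the trigger distance for Location 2 is not stated, so it is left out.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0

# Trigger distances stated for Locations 0 and 1; Location 2's is not given here.
TRIGGER_KM = {0: 3.0, 1: 1.0}


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def should_trigger(location_id: int, device: Tuple[float, float], aircraft: Tuple[float, float]) -> bool:
    """True if a reported aircraft position lies inside the location's trigger radius."""
    return haversine_km(device, aircraft) <= TRIGGER_KM[location_id]


# Hypothetical coordinates (the real, privacy-shifted ones are in locations.json).
device = (-34.90, 138.60)
print(should_trigger(0, device, (-34.91, 138.61)))
```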
Aircraft metadata
Supplementary "airframe" metadata for all aircraft has been gathered to help broaden the research possibilities of this dataset. Airframe information was collected and cross-checked from a number of open-source databases. The author has no reason to believe any significant errors exist in the "aircraft_meta" files; however, future versions of this dataset plan to obtain aircraft information directly from ICAO (International Civil Aviation Organization) to ensure a single, verifiable source of information.
Class/subclass ontology (minutes of recordings)
no aircraft (211)
- 0: no aircraft (211)
aircraft (533)
- 1: piston-propeller aeroplane (30)
- 2: turbine-propeller aeroplane (90)
- 3: turbine-fan aeroplane (409)
- 4: rotorcraft (4)
The subclasses are a combination of the "airframe" and "engtype" features. Piston and turboshaft rotorcraft/helicopters have been combined into a single subclass due to the small number of samples.
Data splits
Audio recordings have been split into training (81%) and test (19%) sets. The training set has further been split into 5 folds, giving researchers a common split for 5-fold cross-validation to ensure reproducibility and comparable results. Data leakage into the test set has been avoided by ensuring recordings are disjoint from the training set in time and location, meaning samples in the test set for a particular location were recorded after any samples included in the training set for that location.
Labelled data
The entire dataset (training and test) is referenced and labelled in the "sample_meta.csv" file. Each row contains a reference to a unique recording, its meta information, annotations and airframe features.
Alternatively, these labels can be derived directly from the filename of the sample (see below). The "aircraft_meta.csv" and "aircraft_meta.json" files can be used to reference aircraft-specific features, such as manufacturer, engine type, ICAO type designator, etc. (see "Columns/Labels" below for all features).
File naming convention
Audio samples are in WAV format, with some metadata stored in the filename.
Basic Convention
"Aircraft ID + Date + Time + Location ID + Microphone ID"
"XXXXXX_YYYY-MM-DD_hh-mm-ss_X_X"
Sample with aircraft
{hex_id}_{date}_{time}_{location_id}_{microphone_id}.{file_ext}
7C7CD0_2023-05-09_12-42-55_2_1.wav
Sample without aircraft
"Silence" files are denoted with six (6) leading zeros rather than an aircraft hex code. All relevant metadata for "silence" samples are contained in the audio filename, and again in the accompanying "sample_meta.csv".
000000_{date}_{time}_{location_id}_{microphone_id}.{file_ext}
000000_2023-05-09_12-30-55_2_1.wav
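The naming convention above can be parsed with one regular expression. The sketch below recovers the metadata from a filename and flags "silence" clips by their 000000 prefix. The `SampleName` class and the function names are hypothetical, and the timestamp is parsed as a naive time because the time zone is not stated.

```python
import re
from dataclasses import dataclass
from datetime import datetime

FILENAME_RE = re.compile(
    r"^(?P<hex_id>[0-9A-Fa-f]{6})_(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})"
    r"_(?P<location_id>\d+)_(?P<microphone_id>\d+)\.(?P<file_ext>\w+)$"
)


@dataclass
class SampleName:
    hex_id: str
    recorded_at: datetime  # naive: the time zone is not part of the naming convention
    location_id: int
    microphone_id: int
    file_ext: str

    @property
    def has_aircraft(self) -> bool:
        return self.hex_id != "000000"  # "silence" clips use six zeros


def parse_sample_name(name: str) -> SampleName:
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not an AeroSonicDB-style filename: {name!r}")
    return SampleName(
        hex_id=m["hex_id"].upper(),
        recorded_at=datetime.strptime(f"{m['date']} {m['time']}", "%Y-%m-%d %H-%M-%S"),
        location_id=int(m["location_id"]),
        microphone_id=int(m["microphone_id"]),
        file_ext=m["file_ext"],
    )


print(parse_sample_name("7C7CD0_2023-05-09_12-42-55_2_1.wav"))
```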
Columns/Labels
(found in sample_meta.csv, aircraft_meta.csv/json files)
train-test: Train-test split (train, test)
fold: Digit from 1 to 5 splitting the training data 5 ways (else test)
filename: The filename of the audio recording
date: Date of the recording
time: Time of the recording
location: ID for the location of the recording
mic: ID of the microphone used
class: Top-level label for the recording (e.g. 0 = No aircraft, 1 = Aircraft audible)
subclass: Subclass label for the recording (e.g. 0 = No aircraft, 3 = Turbine-fan aeroplane)
altitude: Approximate altitude of the aircraft (in feet) at the start of the recording
hex_id: Unique ICAO 24-bit address for the aircraft recorded
session: Unique recording
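The train-test and fold columns are enough to reproduce the shared 5-fold protocol. The sketch below yields one training/validation pair per fold and keeps the test rows aside for final evaluation. It assumes the column values are spelled as listed above ("train"/"test", folds 1 to 5) and uses a small toy table in place of sample_meta.csv; the function name is hypothetical.

```python
from typing import Iterator, Tuple

import pandas as pd


def cross_validation_splits(meta: pd.DataFrame) -> Iterator[Tuple[int, pd.DataFrame, pd.DataFrame]]:
    """Yield (fold, train_rows, validation_rows) for the 5 predefined folds."""
    train = meta[meta["train-test"] == "train"]
    fold = train["fold"].astype(int)  # test rows (fold == "test") are already excluded
    for k in range(1, 6):
        yield k, train[fold != k], train[fold == k]


# Toy stand-in for sample_meta.csv; in practice: meta = pd.read_csv("sample_meta.csv")
meta = pd.DataFrame({
    "filename": [f"clip_{i}.wav" for i in range(7)],
    "train-test": ["train"] * 6 + ["test"],
    "fold": ["1", "2", "3", "4", "5", "1", "test"],
})
for k, tr, va in cross_validation_splits(meta):
    print(k, len(tr), len(va))
test_rows = meta[meta["train-test"] == "test"]  # held out for final evaluation only
```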
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Recording setup
Recordings took place on 12 March 2019. Gaze data were recorded with a Tobii 4C eye tracker with Pro license at 90 Hz. The resolution of the viewport was set to 1024x768. The display had a size of 24 inches and a resolution of 1680x1050 pixels. We polled the DOM tree every 50 milliseconds for fixed elements. We recorded the Web browsing of four participants, who followed the protocol stored under "Dataset_visual_change/Instructions.doc".
Description of the dataset
The dataset consists of following three subsets.
1. Dataset_visual_change
The recordings of each participant p1-p4 on twelve Web sites are in the corresponding directories. For each Web site, there are nine to ten files:
2. Dataset_stimuli
Stimulus shots and visual stimuli computed with the framework. Value-based, edge-based, signal-based, and SIFT-based features have been used. The labels of the first participant's session were used to train a random forest classifier with 100 trees for visual change classification, using the named features. The discovery has been performed on each Web site from the dataset, and the results are placed in the respective directories. Inside each directory, there is one directory for the detected shots and one for the discovered stimuli. In the shots directory, there is one overview as
The shots have been merged to stimuli, which are placed in the stimuli directory. The stimuli are grouped per layer (scrollable, fixed elements, etc.) and meta information is available in
3. Dataset_evaluation
We have performed two evaluations of the visual stimuli discovery: a computational evaluation estimating the quality of the stimuli, and a case study of an expert's task. There are two respective directories with the annotation data.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This comprehensive dataset contains detailed information about Magic: The Gathering cards, compiled from the Scryfall API. It includes over 90,000 card entries with complete metadata, making it perfect for data analysis, machine learning projects, and MTG-related research.
mtg_cards_complete.csv
- Complete dataset including reprints and alternate versions
mtg_cards_unique.csv
- Unique cards only (removes duplicates by name)
Column | Description |
---|---|
NAME | Card name |
MANA_COST | Mana cost symbols (e.g., "{3}{R}{R}") |
CMC | Converted Mana Cost (numeric) |
TYPE | Full type line (e.g., "Creature — Dragon") |
RARITY | Card rarity (Common, Uncommon, Rare, Mythic) |
CARD_TEXT | Complete card text/rules |
POWER_TOUGHNESS | Power/Toughness for creatures (e.g., "4/4") |
FIRST_EDITION | Release date of first printing |
NUMBER_OF_EDITIONS | Total number of sets where this card appears |
PRICES | Current market prices (USD/EUR/TIX) |
LEGALITIES | Legal formats (Standard, Modern, Legacy, etc.) |
COLOR_PIE | Color identity (W/U/B/R/G combinations) |
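Because CMC is the numeric counterpart of MANA_COST, the two columns can be cross-checked. The sketch below recomputes CMC from the brace-delimited symbols using the usual rules: generic numbers count at face value, X/Y/Z count as 0, hybrid and Phyrexian symbols count as their largest component, and every other symbol counts as 1. It ignores rarer cases such as half-mana symbols, multi-faced cards may need per-face handling, and the function names are hypothetical.

```python
import re

VARIABLE = {"X", "Y", "Z"}


def symbol_value(symbol: str) -> int:
    """Mana value of one symbol such as '3', 'R', 'X', '2/W' or 'W/P'."""
    if symbol.isdigit():
        return int(symbol)          # generic mana, e.g. {3}
    if symbol in VARIABLE:
        return 0                    # X, Y and Z count as 0 outside the stack
    if "/" in symbol:
        # Hybrid or Phyrexian symbols count as their largest component, e.g. {2/W} -> 2.
        return max(int(p) if p.isdigit() else 1 for p in symbol.split("/"))
    return 1                        # colored, colorless {C}, snow {S}, ...


def cmc_from_mana_cost(mana_cost: str) -> int:
    """Recompute CMC from a MANA_COST string like '{3}{R}{R}'."""
    return sum(symbol_value(s) for s in re.findall(r"\{([^}]+)\}", mana_cost))


print(cmc_from_mana_cost("{3}{R}{R}"))  # 5
```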
✅ Complete Coverage: All cards from Magic's history
✅ Clean Data: Processed text, standardized formats
✅ Prices: Market data as of the snapshot date
✅ Rich Metadata: Comprehensive card information
✅ Multiple Formats: Both complete and unique versions
NAME: Lightning Bolt
MANA_COST: {R}
CMC: 1
TYPE: Instant
RARITY: Common
CARD_TEXT: Lightning Bolt deals 3 damage to any target.
POWER_TOUGHNESS:
FIRST_EDITION: 1993-08-05
NUMBER_OF_EDITIONS: 25+
COLOR_PIE: R
import pandas as pd
# Load the data
df = pd.read_csv('mtg_cards_unique.csv')
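# Most expensive cards. PRICES lists several currencies (USD/EUR/TIX), so parse it to a
# numeric column first; this assumes the USD price is the first number in each entry.
df['PRICE_USD'] = pd.to_numeric(df['PRICES'].astype(str).str.extract(r'(\d+(?:\.\d+)?)', expand=False), errors='coerce')
expensive_cards = df.nlargest(10, 'PRICE_USD')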
# Color distribution
color_stats = df['COLOR_PIE'].value_counts()
# Rarity breakdown
rarity_dist = df['RARITY'].value_counts()
This is a snapshot dataset. For real-time data, consider using the Scryfall API directly. The collection methodology is included in the notebook for easy reproduction and updates.
Perfect for: Data scientists, Magic players, game designers, researchers, and anyone interested in trading card game analytics!
Keywords: #mtg #magic #trading-cards #games #collectibles #data-analysis #machine-learning
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Water scarcity is a challenge in arid regions across the world (Dolan et al., 2021), and is managed by a wide range of governance and institutional approaches (Olmstead, 2014; Berbel and Esteban, 2019). As climate change and competition for water between uses continue to add pressure to already water-stressed regions (Garrick et al., 2019; Caretta et al., 2022), managers, policy makers, and scientists are seeking alternatives to the insufficient management policies currently in place (e.g., Berbel and Esteban, 2019). One such region is the western U.S., where water stress has increased due to several factors, including long-term drought (Williams et al., 2022), increasing competition between agricultural and urban water users (Garrick et al., 2019), and new valuation of in-stream flows (Lane and Rosenberg, 2019).
The arid western U.S. began regulating water allocations during the gold rush period of the mid-1800s (Irwin v. Phillips, California 1855). During this time, water was essential for mining, and so the Prior Appropriation Doctrine for water allocation, which is largely still in use today, grew out of gold mining's system of prioritizing resource allocation based on the date when an individual or organization first laid a claim. This is known as "first in time, first in right", and establishes a system of seniority for water users. Following this 1855 ruling in California, all other western U.S. states (except Alaska) established their own forms of water regulation based in part or in whole on the Prior Appropriation Doctrine. We refer readers to Gopalakrishnan (1973) for a thorough history of the Prior Appropriation Doctrine in the U.S. West.
Here we present a new database of western U.S. water rights records. We produced the water rights database presented here in 4 main steps: (1) data collection, (2) data quality control, (3) data harmonization, and (4) generation of cumulative water rights curves.
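Under the Prior Appropriation Doctrine, a cumulative water rights curve is essentially the running total of allocated water when rights are ordered by priority date. The sketch below builds such a curve for one WMA and reads off the most junior right that a given supply would fully satisfy. It is an illustration under stated assumptions: the (priority date, amount) record layout, the units and the function names are hypothetical, and the actual harmonization and curve-generation procedure is the one described in Lisk et al. (in review).

```python
from datetime import date
from itertools import accumulate
from typing import Iterable, List, Optional, Tuple

Right = Tuple[date, float]  # (priority date, allocated amount) for one WMA


def cumulative_rights_curve(rights: Iterable[Right]) -> List[Right]:
    """Cumulative allocated amount, ordered from most senior to most junior right."""
    ordered = sorted(rights, key=lambda r: r[0])  # "first in time, first in right"
    totals = accumulate(amount for _, amount in ordered)
    return [(priority, total) for (priority, _), total in zip(ordered, totals)]


def junior_most_served(curve: List[Right], available: float) -> Optional[date]:
    """Latest priority date whose cumulative demand is fully met by `available`."""
    served = None
    for priority, total in curve:
        if total > available:
            break
        served = priority
    return served


rights = [(date(1900, 1, 1), 5.0), (date(1855, 6, 1), 2.0), (date(1950, 3, 1), 3.0)]
curve = cumulative_rights_curve(rights)
print(curve)
print(junior_most_served(curve, available=7.0))
```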
Each of steps (1) to (3) had to be completed to produce (4), the final product used in the modeling exercise of Grogan et al. (in review). All data in each step are associated with a spatial unit called a Water Management Area (WMA), which is the unit of water right administration. Steps (2) and (3) required us to make assumptions and interpretations, and to remove records from the raw data collection. We describe each of these assumptions and interpretations, and give further methodological detail, in Lisk et al. (in review).
This meta-record for the HarDWR database links to the original meta-record, which in turn links to the four distinct datasets that make up the whole database, the Harmonized Database of Western U.S. Water Rights (HarDWR). The four datasets are:
HarDWR - Raw Water Rights Records: the collection of raw water right records downloaded from each state; step (1) above.
HarDWR - Harmonized Water Rights Records: the harmonized water right records, by state; steps (2) and (3) above.
HarDWR - Cumulative Water Rights Curves: the calculated cumulative water rights curves, by state and by WMA; step (4) above.
HarDWR - Water Management Area (WMA) Shapefiles: the spatial boundaries of the WMAs, the administrative unit of water rights in each state.
Citations
Anderson, M. T. & Woosley, L. H. Water availability for the western United States: key scientific challenges. (U.S. Dept. of the Interior, U.S. Geological Survey; for sale by U.S. Geological Survey, Information Services, 2005). https://pubs.usgs.gov/circ/2005/circ1261/pdf/C1261.pdf
Berbel, J. & Esteban, E. Droughts as a catalyst for water policy change. Analysis of Spain, Australia (MDB), and California. Glob. Environ. Change 58, 101969 (2019). https://doi.org/10.1016/j.gloenvcha.2019.101969
Caretta, M. A. et al. Water. In Climate Change 2022: Impacts, Adaptation, and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the IPCC (Cambridge University Press, 2022). https://doi.org/10.1017/9781009325844.006
Carney, C. P., Endter‐Wada, J. & Welsh, L. W. The Accumulating Interest in Water Banks: Assessing Their Role in Mitigating Water Insecurities. JAWRA J. Am. Water Resour. Assoc. 57, 552–571 (2021). https://doi.org/10.1111/1752-1688.12940
Dolan, F. et al. Evaluating the economic impact of water scarcity in a changing world. Nat. Commun. 12, 1915 (2021). https://doi.org/10.1038/s41467-021-22194-0
Garrick, D. et al. Rural water for thirsty cities: a systematic review of water reallocation from rural to urban regions. Environ. Res. Lett. 14, 043003 (2019). https://doi.org/10.1088/1748-9326/ab0db7
Gopalakrishnan, C. The Doctrine of Prior Appropriation and Its Impact on Water Development: A Critical Survey. Am. J. Econ. Sociol. 32, 61–72 (1973). https://doi.org/10.1111/j.1536-7150.1973.tb02180.x
Grogan, D. S. et al. Water balance model (WBM) v.1.0.0: a scalable gridded global hydrologic model with water-tracking functionality. Geosci. Model Dev. 15, 7287–7323 (2022). https://doi.org/10.5194/gmd-15-7287-2022
Grogan, D. et al. Bringing hydrologic realism to water markets. (in review).
Irwin v. Phillips, 5 Cal. 140 (1855). https://casetext.com/case/irwin-v-phillips
Lane, B. A. & Rosenberg, D. E. Promoting In-Stream Flows in the Changing Western US. J. Water Resour. Plan. Manag. 146, 02519003 (2020). https://doi.org/10.1061/(ASCE)WR.1943-5452.0001145
Lisk, M. et al. Harmonized Database of Western U.S. Water Rights (HarDWR) v.1. (in review; paper for this database).
Null, S. E. & Prudencio, L. Climate change effects on water allocations with season dependent water rights. Sci. Total Environ. 571, 943–954 (2016). https://doi.org/10.1016/j.scitotenv.2016.07.081
Olmstead, S. M. Climate change adaptation and water resource management: A review of the literature. Energy Econ. 46, 500–509 (2014). https://doi.org/10.1016/j.eneco.2013.09.005
Tidwell, V. C. et al. Mapping water availability, projected use and cost in the western United States. Environ. Res. Lett. 9, 064009 (2014). https://doi.org/10.1088/1748-9326/9/6/064009
Williams, A. P., Cook, B. I. & Smerdon, J. E. Rapid intensification of the emerging southwestern North American megadrought in 2020–2021. Nat. Clim. Change 12, 232–234 (2022). https://doi.org/10.1038/s41558-022-01290-z
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The cante100 dataset contains 100 tracks taken from the COFLA corpus. We defined 10 style families, each represented by 10 tracks. In addition to the style family, we manually annotated the sections of each track in which vocals are present. We also provide a number of low-level descriptors and the fundamental frequency of the predominant melody for each track. The meta-information includes editorial meta-data and the MusicBrainz ID.
Content:
README (5KB): Text file containing detailed descriptions of manual and automatic annotations.
meta-data (59KB): XML file containing meta-information: Source (anthology name, CD no. and track no.), editorial meta-data (artist name, title, style, musicBrainzID) and the manually annotated style family.
vocal sections (8.9MB): Text file (.csv) containing frame-wise vocal section annotations.
automatic transcriptions (375KB): Text files (.notes) and MIDI files (.mid) containing automatic note-level transcriptions of the singing voice.
Bark band energies (216.6MB): Text files (.csv) containing the frame-wise extracted Bark band energies.
predominant melody (33.5MB): Text files (.csv) containing the frame-wise extracted predominant melody.
low-level descriptors (42.9MB): Text files (.csv) containing a set of frame-wise extracted low-level features.
MFCCs (97.1MB): Text files (.csv) containing the frame-wise extracted mel-frequency cepstral coefficients (MFCCs).
Magnitude spectrum (3.85GB): Text files (.csv) containing the frame-wise extracted magnitudes of the discrete Fourier transform (DFT).
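The frame-wise vocal annotations can be collapsed into contiguous vocal sections. This is a minimal sketch that assumes each row of the vocal sections file holds a frame timestamp in seconds and a 0/1 label, with no header row; check the README for the actual column layout before relying on it. The function names are hypothetical.

```python
import csv
import io


def read_frames(fh):
    """Parse an assumed two-column CSV: frame time in seconds, 0/1 vocal label."""
    return [(float(row[0]), int(float(row[1]))) for row in csv.reader(fh) if row]


def frames_to_sections(frames):
    """Merge runs of consecutive vocal frames (label 1) into (start, end) pairs.

    `frames` must be in time order. Each section ends at the timestamp of its
    last vocal frame; the frame duration is not added.
    """
    sections, start, last = [], None, None
    for time_s, label in frames:
        if label == 1:
            if start is None:
                start = time_s
            last = time_s
        elif start is not None:
            sections.append((start, last))
            start = None
    if start is not None:
        sections.append((start, last))
    return sections


demo = io.StringIO("0.00,0\n0.01,1\n0.02,1\n0.03,0\n0.04,1\n")
print(frames_to_sections(read_frames(demo)))  # → [(0.01, 0.02), (0.04, 0.04)]
```

If the files include a header row or extra columns, only `read_frames` needs to change; the run-merging logic is independent of the file layout.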
Publications
This work has been accepted for publication in the ACM Journal on Computing and Cultural Heritage and is currently available on arXiv.
N. Kroher, J. M. Díaz-Báñez, J. Mora and E. Gómez (2015): Corpus COFLA: A research corpus for the computational study of flamenco music. arXiv:1510.04029 [cs.SD cs.IR].
https://doi.org/10.1145/2875428
Conditions of use
The provided datasets are offered free of charge for internal non-commercial use. We do not grant any rights for redistribution or modification. All data collections were gathered by the COFLA team.
© COFLA 2015. All rights reserved.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The cante2midi dataset contains 20 tracks taken from the COFLA corpus, covering a large variety of styles and degrees of complexity with respect to melodic ornamentation. We provide note-level transcriptions of the singing voice melody in a MIDI-like format, where each note is defined by its onset time, duration and quantized MIDI pitch. In addition, we provide a number of low-level descriptors and the fundamental frequency of the predominant melody for each track. The meta-information includes editorial meta-data and the MusicBrainz IDs.
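The link between the frame-wise predominant melody (f0 in Hz) and the quantized MIDI pitch of each note follows the standard equal-temperament mapping m = 69 + 12 log2(f / 440). The sketch below assumes an A4 = 440 Hz reference and summarizes a note by the median pitch of its voiced frames, rounded to the nearest semitone. Neither choice is confirmed as the method used to produce the cante2midi transcriptions, and `Note` and `note_from_f0` are hypothetical names.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    onset: float     # seconds
    duration: float  # seconds
    pitch: int       # quantized MIDI note number


def hz_to_midi(f0_hz, a4_hz=440.0):
    """Continuous MIDI pitch of a frequency (equal temperament, A4 = MIDI 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / a4_hz)


def midi_to_hz(pitch, a4_hz=440.0):
    """Frequency in Hz of a (possibly fractional) MIDI pitch."""
    return a4_hz * 2.0 ** ((pitch - 69) / 12.0)


def note_from_f0(onset, hop_s, f0_frames):
    """Hypothetical helper: summarize one note's f0 frames as a Note.

    Unvoiced frames (f0 <= 0) are ignored for pitch; the duration counts all
    frames. The pitch is the median frame pitch rounded to the nearest semitone.
    """
    pitches = [hz_to_midi(f) for f in f0_frames if f > 0]
    if not pitches:
        raise ValueError("note has no voiced frames")
    return Note(onset, hop_s * len(f0_frames), round(statistics.median(pitches)))


print(note_from_f0(1.5, 0.01, [0.0, 438.0, 440.0, 445.0]))
```

Using the median rather than the mean keeps short pitch glides and ornaments at note boundaries from pulling the quantized pitch toward a neighbouring semitone.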
Content:
Publications
This work has been accepted for publication in the ACM Journal on Computing and Cultural Heritage and is currently available on arXiv.
N. Kroher, J. M. Díaz-Báñez, J. Mora and E. Gómez (2015): Corpus COFLA: A research corpus for the computational study of flamenco music. arXiv:1510.04029 [cs.SD cs.IR].
https://doi.org/10.1145/2875428
Conditions of use
The provided datasets are offered free of charge for internal non-commercial use. We do not grant any rights for redistribution or modification. All data collections were gathered by the COFLA team.
© COFLA 2015. All rights reserved.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Observations of large mammals recorded by camera traps. The monitoring array covers 11 monitoring units, some of the larger of which are further divided into subunits (different geographical regions) or habitats (different ecosystems). In some units, the effect of proximity to man-made elements (e.g. settlements) was also evaluated; in such cases there are distinct sample sites near and far from the studied element. Each unit × subunit × habitat × proximity combination usually contains 5 monitoring sites. At each site, we positioned a 900 m transect of 9 camera traps (with 100 m gaps) for about 10 days. All mammal photos were identified and grouped into observation events (represented by rows in the data file); an observation event is a set of adjacent photos of the same species. Occupancy and activity-level estimates derived from HAMAARAG's large mammals monitoring program data play an important role as indicators of trends in biodiversity, habitat change and climate change. New collaborations are extremely valuable for making the most of the data, and researchers are welcome to contact the dataset creator to collaborate on comparative analyses and meta-analyses.
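The grouping of photos into observation events can be sketched as follows. The description only states that an event is a set of adjacent photos of the same species, so this sketch assumes adjacency means consecutive photos from the same camera trap separated by at most a fixed time gap. The 5-minute threshold, the `Photo` record, and the example camera IDs and species are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby


@dataclass(frozen=True)
class Photo:
    camera_id: str
    taken_at: datetime
    species: str


def group_events(photos, max_gap=timedelta(minutes=5)):
    """Group identified photos into observation events.

    Assumed rule: per camera, in time order, consecutive photos belong to the
    same event while the species is unchanged and the gap between them is at
    most `max_gap` (a hypothetical threshold).
    """
    events = []
    ordered = sorted(photos, key=lambda p: (p.camera_id, p.taken_at))
    for _, camera_photos in groupby(ordered, key=lambda p: p.camera_id):
        current = []
        for photo in camera_photos:
            if current and (photo.species != current[-1].species
                            or photo.taken_at - current[-1].taken_at > max_gap):
                events.append(current)
                current = []
            current.append(photo)
        if current:
            events.append(current)
    return events


t0 = datetime(2020, 3, 1, 10, 0)
photos = [
    Photo("T1-C1", t0, "gazelle"),
    Photo("T1-C1", t0 + timedelta(minutes=1), "gazelle"),
    Photo("T1-C1", t0 + timedelta(minutes=2), "jackal"),
    Photo("T1-C1", t0 + timedelta(hours=1), "jackal"),
    Photo("T1-C2", t0, "gazelle"),
]
print([len(event) for event in group_events(photos)])  # → [2, 1, 1, 1]
```

Sorting by camera before calling `groupby` matters: `itertools.groupby` only merges consecutive items with equal keys, so unsorted input would split one camera's photos across several groups.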
CC0 1.0 Universal Public Domain Dedication https://spdx.org/licenses/CC0-1.0.html
Heritable variation is a prerequisite for evolutionary change. Yet, whether genetic potential for microevolution is relevant on macroevolutionary timescales is debated. Here we show that evolutionary divergence among populations, and to a lesser extent among species, increases with microevolutionary evolvability in both extant and extinct taxa. We evaluate and reject a number of hypotheses put forward to explain this relationship, and propose that an effect of evolvability on population and species divergence can be explained by the influence of genetic constraints on populations' ability to track rapid, stationary environmental fluctuations.
Methods
The data were collected from the primary scientific literature and consist of two independent meta-datasets: one contains contemporary populations and species, and the other consists of fossil time series. The contemporary data consist of traits measured on a ratio scale, with the requirement of at least two population (or species) means and one estimate of genetic variation. The fossil data were retrieved from the database curated by Kjetil L. Voje: K. L. Voje, Phenotypic Evolution Time Series (PETS) Database, version 1.0 (2023). https://pets.nhm.uio.no
The fossil data consist of time series that each follow one lineage through time, so the samples can be considered populations sampled from the same lineage at different times. We required one or more traits to be measured, with a minimum of two time steps, and each trait had to be on a ratio scale.
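The ratio-scale requirement matters if evolvability is mean-scaled, because dividing by a trait mean is only meaningful when the trait has a natural zero. The sketch below assumes two definitions that the description does not spell out: mean-scaled evolvability as additive genetic variance divided by the squared trait mean, and divergence as the among-population (or among-species) variance of log-transformed means. Treat both as assumptions rather than the authors' exact estimators; the function names are hypothetical.

```python
import math
import statistics


def mean_scaled_evolvability(additive_variance, trait_mean):
    """Additive genetic variance divided by the squared trait mean.

    Only meaningful for ratio-scale traits, hence the positive-mean check.
    """
    if trait_mean <= 0:
        raise ValueError("mean scaling requires a positive, ratio-scale mean")
    return additive_variance / trait_mean ** 2


def log_divergence(population_means):
    """Sample variance of log-transformed means across populations or species."""
    if len(population_means) < 2:
        raise ValueError("at least two population (or species) means are needed")
    return statistics.variance([math.log(m) for m in population_means])


e_mu = mean_scaled_evolvability(additive_variance=4.0, trait_mean=20.0)
divergence = log_divergence([18.0, 20.0, 23.0])
print(f"e_mu = {e_mu:.4f}, divergence = {divergence:.5f}")
```

Both quantities are unitless on the log scale, which is what allows traits measured in different units (lengths, masses, counts) to be pooled in a single meta-dataset.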
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
AirBase is the European air quality database maintained by the EEA through its European Topic Centre on Air Pollution and Climate Change Mitigation. It contains air quality monitoring data and information submitted by participating countries throughout Europe. The database consists of multi-annual time series of air quality measurement data and statistics for a number of air pollutants, together with meta-information on the monitoring networks involved, their stations and their measurements. Geographically, it covers all EU Member States, the EEA member countries and some EEA collaborating countries. The EU Member States are bound under Decision 97/101/EC to engage in a reciprocal exchange of information (EoI) on ambient air quality. The EEA engages with its member and collaborating countries to collect the information foreseen by the EoI Decision because air pollution is a pan-European issue and the EEA is the European body that produces assessments of air quality covering the whole geographical area of Europe.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Updated cadastral information useful for planning, programming and monitoring the development of the regional territory and related economic activities. The data come from the Territory Agency through the exchange system, integrated with other regional data related to the cadastral database, which is owned by the Regional Administration. This layer contains the fiducial (trust) points of the map sheet, both as geometry (symbol) and as associated text (progressive number of the point), for use as references in the topographic surveys carried out for each cadastral update.
The rate of decline in the global burden of avoidable maternal deaths has stagnated and remains an issue of concern in many sub-Saharan African countries. According to the most recent evidence, the global average maternal mortality ratio (MMR) is estimated at 223 deaths per 100,000 live births, while sub-Saharan Africa's average MMR is 536 per 100,000 live births, more than twice the global average. Beyond this high average, MMR varies between and within sub-Saharan African countries. Differences in the behaviour of those accessing and/or delivering maternal healthcare may explain variations in outcomes and provide a basis for quality improvement in health systems. However, the landscape of interventions aimed at modifying the behaviours of those accessing and delivering maternal healthcare to improve maternal health outcomes in sub-Saharan Africa remains poorly described. Our objective was to extract and synthesise the target behaviours, component behaviour change strategies and outcomes of behaviour change interventions for improving maternal health outcomes in sub-Saharan Africa. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Our protocol was published a priori on PROSPERO (registration number CRD42022315130). We searched ten electronic databases (PsycINFO, Cochrane Database of Systematic Reviews, International Bibliography of Social Sciences, EMBASE, MEDLINE, Scopus, CINAHL PLUS, African Index Medicus, African Journals Online, and Web of Science) and included randomised trials and quasi-experimental studies. We extracted target behaviours and specified the behavioural interventions using the Action, Actor, Context, Time, and Target (AACTT) framework. We categorised the behaviour change strategies using the intervention functions described in the Behaviour Change Wheel (BCW). We reviewed 52 articles (26 randomised trials and 26 quasi-experimental studies), with a mixed risk of bias.
Of these, 41 studies (78.8%) targeted the behaviour of those accessing maternal healthcare services, seven (13.5%) focused on those delivering maternal healthcare, and four (7.7%) targeted mixed stakeholder groups. The studies employed a range of behaviour change strategies, counted as 111 strategy uses because some studies combined more than one: education 37 (33.3%), persuasion 20 (18%), training 19 (17.1%), enablement 16 (14.4%), environmental restructuring 8 (7.2%), modelling 6 (5.4%) and incentivisation 5 (4.5%). No studies used restriction or coercion strategies. Education was the most common strategy for changing the behaviour of those accessing maternal healthcare, while training was the most common in studies targeting those delivering maternal healthcare. Of the 52 studies, 40 reported effective interventions, 7 reported ineffective interventions, and 5 were equivocal. A meta-analysis was not feasible due to methodological and clinical heterogeneity across the studies. In conclusion, there is evidence of effective behaviour change interventions targeted at those accessing and/or delivering maternal healthcare in sub-Saharan Africa. However, more focus should be placed on the behaviour of those delivering maternal healthcare within health facilities to fast-track the reduction of the huge burden of avoidable maternal deaths in sub-Saharan Africa.
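The reported percentages can be reproduced from the counts. This quick check uses the 52 studies as the denominator for target groups and the sum of the strategy counts as the denominator for strategies; the dictionary keys are shorthand labels, not category names taken from the review.

```python
def shares(counts):
    """Percentage share of each count in the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


target_groups = {"accessing care": 41, "delivering care": 7, "mixed": 4}
strategies = {
    "education": 37, "persuasion": 20, "training": 19, "enablement": 16,
    "environmental restructuring": 8, "modelling": 6, "incentivisation": 5,
}

print(sum(target_groups.values()), shares(target_groups))
print(sum(strategies.values()), shares(strategies))
```

Dividing the strategy counts by 52 instead would give shares summing to well over 100%, which is why the strategy percentages only make sense against the total number of strategy uses.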
Animal movement is a fundamental process shaping ecosystems at multiple levels, from the fate of individuals to global patterns of biodiversity. The spatio-temporal dynamics of food resources are a major driver of animal movement and generate patterns ranging from range residency to migration and nomadism. Arctic tundra predators face a strongly fluctuating environment marked by cyclic microtine populations, high seasonality, and the potential availability of sea ice, which gives access to marine resources in winter. This type of relatively poor and highly variable environment can promote long-distance movements and resource tracking in mobile species. Here, we investigated the winter movements of the arctic fox, a major tundra predator often described as a seasonal migrant or nomad. We used six years of Argos satellite telemetry data collected on 66 adults from Bylot Island (Nunavut, Canada) tracked during the sea ice period. We hypothesized that long-distance movements would be influenced by spatio-temporal changes in resource availability and by individual characteristics. Despite strong annual and seasonal changes in resource abundance and distribution, we found that a majority of individuals remained resident, especially those located in an area characterized by highly predictable pulse resources (goose nesting colony) and abundant cached food items (eggs). Foxes compensated for terrestrial food shortages by commuting to the sea ice rather than by tracking resources over long distances or moving completely onto the sea ice for winter. Individual characteristics also influenced movement patterns: age positively influenced the propensity to engage in nomadism, suggesting that older foxes may be driven out of their territories. Our results show how these mammalian predators can adjust their movement patterns to favor range residency despite strong spatio-temporal fluctuations in food resources.
Understanding the movement responses of predators to prey dynamics helps identify the scales at which they operate, a critical aspect of the functioning of, and connectivity among, meta-ecosystems.