Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the accompanying dataset to the following paper https://www.nature.com/articles/s41597-023-01975-w
Caravan is an open community dataset of meteorological forcing data, catchment attributes, and discharge data for catchments around the world. Additionally, Caravan provides code to derive meteorological forcing data and catchment attributes from the same data sources in the cloud, making it easy for anyone to extend Caravan to new catchments. The vision of Caravan is to provide the foundation for a truly global open source community resource that will grow over time.
If you use Caravan in your research, please cite not only Caravan itself but also the source datasets, in recognition of the work that went into creating them and that made Caravan possible in the first place.
All current development and additional community extensions can be found at https://github.com/kratzert/Caravan
IMPORTANT: Due to size limitations for individual repositories, the netCDF version and the CSV version of Caravan (since Version 1.6) are split into two different repositories. You can find the netCDF version at https://zenodo.org/records/14673536
Change Log:
As discussed in http://bit.ly/wardpost, the City of Chicago changed to a new ward map on 5/18/2015, affecting some datasets. This ZIP file contains CSV exports from 5/15/2015 of all datasets except Crimes - 2001 to present. Due to size limitations, that CSV is at https://data.cityofchicago.org/d/5wdx-rdkp. These CSV files contain the final or close-to-final versions of the datasets with the previous ("2003") ward values.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Contains CSV data of cell features used for the analysis in the publication: "A novel MYH9 variant leads to atypical Epstein-Fechtner syndrome by altering non-muscle myosin IIA mediated contractile processes". These CSV files contain all relevant cell features per patient and cell type. Files should be titled: for controls, + + .csv; for patients, + + + + .csv. Metadata containing sex and age is also available in the files "controls_metadata.csv" and "patients_metadata.csv". Summary statistics are also included in this public dataset: for controls, "controls_summary_statistics.csv"; for patients, "patients_summary_statistics.csv". The summary statistics files were created using publicly available code: https://github.com/SaraKaliman/dc-data-novel-MYH9-variant/blob/main/Step1_summary_statistics.ipynb. The group analysis included the t-test, the U-test and the effect size for the t-test, and can be found in the file "summary_statistical_group_analysis.csv". The main figure in the article and the statistical analysis were produced using publicly available code: https://github.com/SaraKaliman/dc-data-novel-MYH9-variant/blob/main/Step2_group_comparison.ipynb. Single scalar .rtdc files are included only due to the limitation of DCOR datasets to .rtdc files.
We provide MATLAB binary files (.mat) and comma separated values files of data collected from a pilot study of a plug load management system that allows for the metering and control of individual electrical plug loads. The study included 15 power strips, each containing 4 channels (receptacles), which wirelessly transmitted power consumption data approximately once per second to 3 bridges. The bridges were connected to a building local area network which relayed data to a cloud-based service. Data were archived once per minute with the minimum, mean, and maximum power draw over each one minute interval recorded. The uncontrolled portion of the testing spanned approximately five weeks and established a baseline energy consumption. The controlled portion of the testing employed schedule-based rules for turning off selected loads during non-business hours; it also modified the energy saver policies for certain devices. Three folders are provided: “matFilesAllChOneDate” provides a MAT-file for each date, each file has all channels; “matFilesOneChAllDates” provides a MAT-file for each channel, each file has all dates; “csvFiles” provides comma separated values files for each date (note that because of data export size limitations, there are 10 csv files for each date). Each folder has the same data; there is no practical difference in content, only the way in which it is organized.
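The one-minute archiving step described above can be sketched with pandas (a hypothetical reconstruction using synthetic readings, not the pilot system's actual code):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one channel's ~1 Hz power readings (watts)
rng = np.random.default_rng(0)
seconds = pd.date_range("2024-01-01 09:00", periods=180, freq="s")
power = pd.Series(40 + rng.normal(0, 2, len(seconds)), index=seconds)

# Archive once per minute, recording the minimum, mean, and maximum
# power draw over each one-minute interval, as in the dataset
archived = power.resample("1min").agg(["min", "mean", "max"])
print(archived.shape)  # (3, 3): three minutes, three statistics
```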
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset used for the publication "Coddora: CO2-based Occupancy Detection model
trained via DOmain RAndomization". The goal is to provide training data for occupancy detection.
The dataset contains one million days of data, including 10 occupied days for each of 100,000 randomized room models (50,000 rooms with office activity and 50,000 with meeting room activity). Data were generated in EnergyPlus simulations according to the methodology described in the paper.
When using the dataset, please cite:
Manuel Weber, Farzan Banihashemi, Davor Stjelja, Peter Mandl, Ruben Mayer, and Hans-Arno Jacobsen. 2024. Coddora: CO2-Based Occupancy Detection Model Trained via Domain Randomization. In International Joint Conference on Neural Networks (IJCNN). June 30 - July 5, 2024, Yokohama, Japan.
The following files are provided:
1. dataset_office_rooms.h5 (provided as zip file)
2. dataset_meeting_rooms.h5 (provided as zip file)
3. simulated_occupancy_office_rooms.csv
4. simulated_occupancy_meeting_rooms.csv
Please use an archiving tool such as 7-Zip to unzip the HDF5 files.
Both HDF5 files contain two datasets with the following keys:
1. "data": contains the simulated indoor climate and occupancy data
2. "metadata": contains the metadata that were used for each simulation
The csv files contain the time series of occupancy that were used for the simulations.
Data includes the following fields:
Datetime: day of the year (may be relevant due to seasonal differences) and time of the day
Zone Air CO2 Concentration: CO2 level in ppm
Zone Mean Air Temperature: temperature in °C
Zone Air Relative Humidity: relative humidity in %
Occupancy: level of occupancy relative to the maximum capacity of the room (in the range [0-1])
Ventilation: fraction of window opening in the range [0.01, 1]
SimID: foreign key to reference the room properties the simulation was based on
BinaryOccupancy: 0 or 1 denoting absence or presence (for binary classification)
Example row:
Datetime | Zone Air CO2 Concentration | Zone Mean Air Temperature | Zone Air Relative Humidity | Occupancy | Ventilation | simID | BinaryOccupancy |
---|---|---|---|---|---|---|---|
10/09 11:21:00 | 1084.5624647371608 | 24.545635909907148 | 41.18393114737054 | 0.7 | 0.0 | 99 | 1 |
Metadata includes the following fields.
Underscores denote that the field was not selected during randomization but calculated from the other values.
width: room width in m
length: room length in m
height: room height in m
infiltration: infiltration per exterior area in m³/m²s
outdoor_co2: co2 concentration in the outdoor air in ppm (set to a random value between [300, 500])
orientation: angle between the room's facade orientation and the north direction in degrees
maxOccupants: room occupation limit, i.e. the maximum number of occupants
_floorArea: floor area in m² (calculated from room dimensions)
_volume: room volume in m³ (calculated from room dimensions)
_exteriorSurfaceArea: surface area of the facade wall (calculated from room dimensions)
_winToFloorRatio: ratio between total window area and floor area (calculated from room model)
firstDayUsedOfOccupancySequence: selected starting day in the sequence of occupancy data for rooms with the respective maxOccupants value
simID: unique identifier of the simulation to relate between simulation metadata and resulting simulated data
Example row:
width | length | height | infiltration | outdoor_co2 | orientation | maxOccupants | _floorArea | _volume | _exteriorSurfaceArea | _winToFloorRatio | firstDayOfUsedOccupancySequence | simID |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5.481 | 5.190 | 3.264 | 0.000214 | 438.0 | 316.0 | 4.0 | 28.446 | 92.849 | 16.940 | 0.216 | 192 | 0 |
The occupancy data provided through the separate csv files contain the data from the upfront occupancy simulations that the climate simulation was based on. For each level of considered room occupancy limit (maxOccupants), the datasets provide minute values of occupancy throughout 1000 days.
Datetime, Date, Timestamp: fictive time of simulated occupancy record (sequences are in 1-minute resolution)
Occupants: number of present occupants
Occupancy: binary occupancy state (0=unoccupied, 1=occupied)
WindowState: binary state of ventilation (0=windows closed, 1=room is ventilated)
maxOccupants: maximum number of occupants considered for the simulated sequence
WindowOpeningFraction: fractional extent to which windows are opened, within the interval [0.01, 1]
Example row:
Datetime | Date | Timestamp | Occupants | Occupancy | WindowState | maxOccupants | WindowOpeningFraction |
---|---|---|---|---|---|---|---|
2023-01-01 00:00:00 | 2023-01-01 | 1.672531e+09 | 0 | 0 | 0 | 1 | 0.0 |
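Judging from the example row, the Timestamp column appears to be the Unix epoch time (seconds, UTC) of the Datetime; this is an inference from the values, not stated in the description:

```python
from datetime import datetime, timezone

# 2023-01-01 00:00:00 UTC as Unix epoch seconds
ts = datetime(2023, 1, 1, 0, 0, 0, tzinfo=timezone.utc).timestamp()
print(ts)           # 1672531200.0
print(f"{ts:.6e}")  # 1.672531e+09, as in the example row
```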
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data presented here were used to produce the following paper:
Archibald, Twine, Mthabini, Stevens (2021) Browsing is a strong filter for savanna tree seedlings in their first growing season. J. Ecology.
The project under which these data were collected is: Mechanisms Controlling Species Limits in a Changing World. NRF/SASSCAL Grant number 118588
For information on the data or analysis please contact Sally Archibald: sally.archibald@wits.ac.za
Description of file(s):
File 1: cleanedData_forAnalysis.csv (required to run the R code: "finalAnalysis_PostClipResponses_Feb2021_requires_cleanData_forAnalysis_.R")
The data represent monthly survival and growth data for ~740 seedlings from 10 species under various levels of clipping.
The data consist of one .csv file with the following column names:
treatment Clipping treatment (1 - 5 months clip plus control unclipped)
plot_rep One of three randomised plots per treatment
matrix_no Where in the plot the individual was placed
species_code First three letters of the genus name and first three letters of the species name; uniquely identifies the species
species Full species name
sample_period Classification of sampling period into time since clip
status Alive or Dead
standing.height Vertical height above ground (in mm)
height.mm Length of the longest branch (in mm)
total.branch.length Total length of all the branches (in mm)
stemdiam.mm Basal stem diameter (in mm)
maxSpineLength.mm Length of the longest spine
postclipStemNo Number of resprouting stems (only recorded AFTER clipping)
date.clipped Date clipped
date.measured Date measured
date.germinated Date germinated
Age.of.plant Date measured - Date germinated
newtreat Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
File 2: Herbivory_SurvivalEndofSeason_march2017.csv (required to run the R code: "FinalAnalysisResultsSurvival_requires_Herbivory_SurvivalEndofSeason_march2017.R")
The data consist of one .csv file with the following column names:
treatment Clipping treatment (1 - 5 months clip plus control unclipped)
plot_rep One of three randomised plots per treatment
matrix_no Where in the plot the individual was placed
species_code First three letters of the genus name and first three letters of the species name; uniquely identifies the species
species Full species name
sample_period Classification of sampling period into time since clip
status Alive or Dead
standing.height Vertical height above ground (in mm)
height.mm Length of the longest branch (in mm)
total.branch.length Total length of all the branches (in mm)
stemdiam.mm Basal stem diameter (in mm)
maxSpineLength.mm Length of the longest spine
postclipStemNo Number of resprouting stems (only recorded AFTER clipping)
date.clipped Date clipped
date.measured Date measured
date.germinated Date germinated
Age.of.plant Date measured - Date germinated
newtreat Treatment as a numeric variable, with 8 being the control plot (for plotting purposes)
genus Genus
MAR Mean Annual Rainfall for that species distribution (mm)
rainclass High/medium/low
File 3: allModelParameters_byAge.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")
Consists of a .csv file with the following column headings
Age.of.plant Age in days
species_code Species
pred_SD_mm Predicted stem diameter in mm
pred_SD_up Top 75th quantile of stem diameter in mm
pred_SD_low Bottom 25th quantile of stem diameter in mm
treatdate Date when clipped
pred_surv Predicted survival probability
pred_surv_low Predicted 25th quantile survival probability
pred_surv_high Predicted 75th quantile survival probability
species_code Species code
Bite.probability Daily probability of being eaten
max_bite_diam_duiker_mm Maximum bite diameter of a duiker for this species
duiker_sd Standard deviation of bite diameter for a duiker for this species
max_bite_diameter_kudu_mm Maximum bite diameter of a kudu for this species
kudu_sd Standard deviation of bite diameter for a kudu for this species
mean_bite_diam_duiker_mm Mean bite diameter of a duiker for this species
duiker_mean_sd Standard deviation of mean bite diameter for a duiker
mean_bite_diameter_kudu_mm Mean bite diameter of a kudu for this species
kudu_mean_sd Standard deviation of mean bite diameter for a kudu
genus Genus
rainclass Low/med/high
File 4: EatProbParameters_June2020.csv (required to run the R code: "FinalModelSeedlingSurvival_June2021_.R")
Consists of a .csv file with the following column headings
shtspec species name
species_code species code
genus genus
rainclass low/medium/high
seed mass Mass of seed (g per 1000 seeds)
Surv_intercept coefficient of the model predicting survival from age of clip for this species
Surv_slope coefficient of the model predicting survival from age of clip for this species
GR_intercept coefficient of the model predicting stem diameter from seedling age for this species
GR_slope coefficient of the model predicting stem diameter from seedling age for this species
species_code species code
max_bite_diam_duiker_mm Maximum bite diameter of a duiker for this species
duiker_sd standard deviation of bite diameter for a duiker for this species
max_bite_diameter_kudu_mm Maximum bite diameter of a kudu for this species
kudu_sd standard deviation of bite diameter for a kudu for this species
mean_bite_diam_duiker_mm mean etc
duiker_mean_sd standard deviation etc
mean_bite_diameter_kudu_mm mean etc
kudu_mean_sd standard deviation etc
AgeAtEscape_duiker[t] age of plant when its stem diameter is larger than a mean duiker bite
AgeAtEscape_duiker_min[t] age of plant when its stem diameter is larger than a min duiker bite
AgeAtEscape_duiker_max[t] age of plant when its stem diameter is larger than a max duiker bite
AgeAtEscape_kudu[t] age of plant when its stem diameter is larger than a mean kudu bite
AgeAtEscape_kudu_min[t] age of plant when its stem diameter is larger than a min kudu bite
AgeAtEscape_kudu_max[t] age of plant when its stem diameter is larger than a max kudu bite
As part of a review of the Solar Planning Exemptions set out in the Planning and Development Regulations 2001, the Department, in conjunction with relevant statutory stakeholders (namely the Irish Aviation Authority (IAA), the Department of Defence and the HSE), considered the impact of glint and/or glare from solar panels on aviation receptors. Having regard to this potential impact, Solar Safeguarding Zones (SSZs) were designated around certain airports (5 km zone), aerodromes/military barracks (3 km zone) and emergency helipads (3 km zone) in order to provide appropriate safeguards in close proximity to aviation sites.
43 SSZs were introduced, within which a rooftop limit on solar panels continues to apply.
The geographical areas of the Solar Safeguarding Zones are delineated and defined by statute in Schedule 1 (a map or maps of the areas) and Schedule 2 (a list of the townlands in question / a description of the areas) of the Planning and Development (Solar Safeguarding Zone) Regulations 2022 (S.I. No. 492 of 2022).
The maps are also available to view in more detail on a non-statutory basis on myplan.ie
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
City population size is a crucial measure when trying to understand urban life. Many socio-economic indicators scale superlinearly with city size, whilst some infrastructure indicators scale sublinearly with city size. However, the impact of size also extends beyond the city’s limits. Here, we analyse the scaling behaviour of cities beyond their boundaries by considering the emergence and growth of nearby cities. Based on an urban network from African continental cities, we construct an algorithm to create the region of influence of cities. The number of cities and the population within a region of influence are then analysed in the context of urban scaling. Our results are compared against a random permutation of the network, showing that the observed scaling power of cities to enhance the emergence and growth of cities is not the result of randomness. By altering the radius of influence of cities, we observe three regimes. Large cities tend to be surrounded by many small towns for small distances. For medium distances (above 114 km), large cities are surrounded by many other cities containing large populations. Large cities boost urban emergence and growth (even more than 190 km away), but their scaling power decays with distance.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
****** UPDATE 05/15/2025
We have increased the number of basins with observational data from 3,166 to 5,188.
In addition, we have added water level data alongside streamflow measurements.
The hourly streamflow and water level data for a total of 5,188 USGS gauges are stored in individual NetCDF files and packaged together in the Hourly.7z archive. The dataset covers the period from 1980-01-01 00:00:00 to 2024-12-31 23:00:00. Missing values are indicated by NaN.
DOI: https://doi.org/10.5281/zenodo.15413207
****** UPDATE 05/01/2025
ERA5-Land forcings can be downloaded here: https://doi.org/10.5281/zenodo.15264814
******
The current version of the CAMELSH dataset contains data for 9,008 basins. Because the total data volume in the repository is approximately 57 GB, which exceeds Zenodo's size limit, we split it across two links. The first link (https://doi.org/10.5281/zenodo.15066778) contains data on attributes, shapefiles, and time series data for the first set of basins. The second link (https://doi.org/10.5281/zenodo.14889025) contains forcing (time series) data for the remaining basins. All data are compressed in 7zip format. After extraction, the dataset is organized into the following subfolders:
• The attributes folder contains 28 CSV (comma-separated values) files that store basin attributes, all with names beginning with "attributes_", plus one Excel file. Of these, the 'attributes_nldas2_climate.csv' file contains nine climate attributes (Table 2) derived from NLDAS-2 data. The 'attributes_hydroATLAS.csv' file includes 195 basin attributes derived from the HydroATLAS dataset. 26 files with names starting with 'attributes_gageii_' contain a total of 439 basin attributes extracted from the GAGES-II dataset; the name of each file represents a distinct group of attributes, as described in Table S.1. The remaining file, named 'Var_description_gageii.xlsx', provides explanatory details regarding the variable names included in the 26 CSV files, with information similar to that presented in Table S.1. The first column in all CSV files, labeled 'STAID', contains the identification (ID) names of the stream gauges. These IDs are assigned by the USGS and are sourced from the original GAGES-II dataset.
• The shapefiles folder contains two sets of shapefiles for the catchment boundary. The first set, CAMELSH_shapefile.shp, is derived from the original GAGES-II dataset and is used to obtain the corresponding climate forcing data for each catchment. The second set, CAMELSH_shapefile_hydroATLAS.shp, includes catchment boundaries derived from the HydroATLAS dataset. Each polygon in both shapefiles contains a field named GAGE_ID, which represents the ID of the stream gauges.
• The timeseries (7zip) file contains a compressed archive (7zip) that includes time series data for 3,166 basins with observed streamflow data. Within this 7zip file, there are a total of 3,166 NetCDF files, each corresponding to a specific basin. The name of each NetCDF file matches the stream gauge ID. Each file contains an hourly time series from 1980-01-01 00:00:00 to 2024-12-31 23:00:00 for streamflow (denoted as "Streamflow" in the NetCDF file) and 11 climate variables (see Table 1). The streamflow data series includes missing values, which are represented as "NaN". All meteorological forcing data and streamflow records have been standardized to the +0 UTC time zone.
• The timeseries_nonobs (7zip) file contains time series data for the remaining 5,842 basins. The structure of each NetCDF file is similar to the one described above.
• The info.csv file, located in the main directory of the dataset, contains basic information for 9,008 stream stations. This includes the stream gauge ID, the total number of observed hourly data points over 45 years (from 1980 to 2024), and the number of observed hourly data points for each year from 1980 to 2024. Stations with and without observed data are distinguished by the value in the second column, where stations without observed streamflow data have a corresponding value of 0.
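The per-year observed-hour counts tabulated in info.csv could be reproduced along the following lines (a sketch with synthetic data standing in for a real basin's series, which would normally be read from its NetCDF file with xarray or netCDF4):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one gauge's hourly streamflow (NaN = missing),
# covering two years of the 1980-2024 record
idx = pd.date_range("1980-01-01 00:00", "1981-12-31 23:00", freq="h")
rng = np.random.default_rng(1)
flow = pd.Series(rng.gamma(2.0, 5.0, len(idx)), index=idx)
flow[flow < 2.0] = np.nan  # mark some hours as missing

# Per-year counts of observed (non-NaN) hourly data points, as in info.csv
per_year = flow.notna().groupby(flow.index.year).sum()
total_observed = int(per_year.sum())
print(per_year)
```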
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset consists of three files: a file with behaviour data (events.csv), a file with item properties (item_properties.csv) and a file describing the category tree (category_tree.csv). The data has been collected from a real-world e-commerce website. It is raw data, i.e. without any content transformations; however, all values are hashed due to confidentiality concerns. The purpose of publishing is to motivate research in the field of recommender systems with implicit feedback.
The behaviour data, i.e. events like clicks, add-to-carts and transactions, represent interactions collected over a period of 4.5 months. A visitor can perform three types of events, namely “view”, “addtocart” or “transaction”. In total there are 2 756 101 events, including 2 664 312 views, 69 332 add-to-carts and 22 457 transactions, produced by 1 407 580 unique visitors. For about 90% of events, corresponding properties can be found in the “item_properties.csv” file.
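As a quick arithmetic check, the per-type counts above sum exactly to the stated total:

```python
# Event counts from the description above
views, addtocarts, transactions = 2_664_312, 69_332, 22_457
total_events = views + addtocarts + transactions
print(total_events)  # 2756101, the stated total
```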
For example:
The file with item properties (item_properties.csv) includes 20 275 902 rows, i.e. different properties, describing 417 053 unique items. The file is divided into two parts due to file size limitations. Since the property of an item can vary in time (e.g., price changes over time), every row in the file has a corresponding timestamp. In other words, the file consists of concatenated snapshots for every week in the file with the behaviour data. However, if a property of an item is constant over the observed period, only a single snapshot value will be present in the file. For example, suppose we have three properties for a single item and 4 weekly snapshots, like below:
timestamp,itemid,property,value
1439694000000,1,100,1000
1439695000000,1,100,1000
1439696000000,1,100,1000
1439697000000,1,100,1000
1439694000000,1,200,1000
1439695000000,1,200,1100
1439696000000,1,200,1200
1439697000000,1,200,1300
1439694000000,1,300,1000
1439695000000,1,300,1000
1439696000000,1,300,1100
1439697000000,1,300,1100
After the snapshot merge it would look like:
1439694000000,1,100,1000
1439694000000,1,200,1000
1439695000000,1,200,1100
1439696000000,1,200,1200
1439697000000,1,200,1300
1439694000000,1,300,1000
1439696000000,1,300,1100
Here property=100 is constant over time, property=200 has a different value in every snapshot, and property=300 changed once.
The item properties file contains a timestamp column because all properties are time dependent and may change over time, e.g. price, category, etc. Initially, this file consisted of snapshots for every week in the events file and contained over 200 million rows. We have merged consecutive constant property values, changing it from snapshot form to change-log form; constant values therefore appear only once in the file. This reduced the number of rows roughly tenfold.
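The snapshot-to-change-log merge illustrated above can be sketched as follows (a minimal reimplementation for the example rows, not the code actually used to produce the file):

```python
from itertools import groupby

# The weekly snapshot rows from the example: (timestamp, itemid, property, value)
rows = [
    (1439694000000, 1, 100, 1000), (1439695000000, 1, 100, 1000),
    (1439696000000, 1, 100, 1000), (1439697000000, 1, 100, 1000),
    (1439694000000, 1, 200, 1000), (1439695000000, 1, 200, 1100),
    (1439696000000, 1, 200, 1200), (1439697000000, 1, 200, 1300),
    (1439694000000, 1, 300, 1000), (1439695000000, 1, 300, 1000),
    (1439696000000, 1, 300, 1100), (1439697000000, 1, 300, 1100),
]

def to_change_log(snapshots):
    """Keep a row only when its value differs from the previous snapshot
    of the same (itemid, property) pair."""
    merged = []
    for _, grp in groupby(sorted(snapshots, key=lambda r: (r[1], r[2], r[0])),
                          key=lambda r: (r[1], r[2])):
        last = object()  # sentinel that never equals a real value
        for ts, item, prop, val in grp:
            if val != last:
                merged.append((ts, item, prop, val))
                last = val
    return merged

merged = to_change_log(rows)
print(len(merged))  # 7 rows, matching the merged example above
```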
All values in the “item_properties.csv” file excluding the "categoryid" and "available" properties were hashed. The value of the "categoryid" property contains the item category identifier. The value of the "available" property contains the availability of the item, i.e. 1 means the item was available, otherwise 0. All numerical values were marked with an "n" char at the beginning and have three digits of precision after the decimal point, e.g., "5" will become "n5.000" and "-3.67584" will become "n-3.675". All words in text values were normalized (stemming procedure: https://en.wikipedia.org/wiki/Stemming) and hashed, and numbers were processed as above, e.g. the text "Hello world 2017!" will become "24214 44214 n2017.000".
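The numeric encoding can be sketched as below; note that the "-3.67584" to "n-3.675" example suggests truncation rather than rounding, which is an inference from the example, not documented behaviour:

```python
import math

def encode_number(s):
    """Prefix with 'n' and keep three digits after the decimal point.
    Truncation (not rounding) is inferred from "-3.67584" -> "n-3.675"."""
    truncated = math.trunc(float(s) * 1000) / 1000
    return f"n{truncated:.3f}"

print(encode_number("5"))         # n5.000
print(encode_number("-3.67584"))  # n-3.675
print(encode_number("2017"))      # n2017.000
```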
The category tree file has 1669 rows. Every row in the file specifies a child categoryId and the corresponding parent. For example:
Retail Rocket (retailrocket.io) helps web shoppers make better shopping decisions by providing personalized real-time recommendations through multiple channels, with over 100MM unique monthly users and 1000+ retail partners around the world.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PatagoniaMet v1.0 (PMET from here on) is a new dataset for Western Patagonia that consists of two datasets: i) PMET-obs, a compilation of quality-controlled ground-based hydrometeorological data, and ii) PMET-sim, a daily gridded product of precipitation, and maximum and minimum temperature. PMET-obs was developed using a 4-step quality control process applied to 523 hydro-meteorological time series (precipitation, air temperature, potential evaporation, streamflow and lake level stations) obtained from eight institutions in Chile and Argentina. Based on this dataset and currently available uncorrected gridded products (in this case ERA5), PMET-sim was developed using statistical bias correction procedures (i.e. quantile mapping), spatial regression models (random forest) and hydrological methods (Budyko framework). Details are given below.
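Empirical quantile mapping, one of the bias-correction procedures mentioned above, can be sketched as follows (a generic illustration with synthetic data, not the actual PMET-sim implementation):

```python
import numpy as np

def quantile_map(model_clim, obs_clim, model_values):
    """Map each model value to the observed value with the same
    non-exceedance probability (empirical quantile mapping)."""
    m_sorted, o_sorted = np.sort(model_clim), np.sort(obs_clim)
    m_probs = np.linspace(0.0, 1.0, m_sorted.size)
    o_probs = np.linspace(0.0, 1.0, o_sorted.size)
    p = np.interp(model_values, m_sorted, m_probs)  # model CDF
    return np.interp(p, o_probs, o_sorted)          # inverse observed CDF

# A model climatology with a constant +2 bias is pulled back onto the
# observed distribution
obs = np.arange(100.0)
model = obs + 2.0
corrected = quantile_map(model, obs, np.array([52.0]))
print(corrected)  # ~[50.]
```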
The streamflow metadata file (Q_PMETobs_version_metadata.csv) contains more than just the location data. Following current guidelines for hydrological datasets, the upstream area corresponding to each stream gauge was delimited (.shp file in Basins_PMETobs_version.zip), and several climatic and geographic attributes were derived. The details of the attributes can be found in the README file. For the basins that were part of the hydrological modelling (and that achieved a Kling-Gupta efficiency greater than 0.5), the file Q_PMETobs_version_water_balance.csv is attached, which contains the water balance for each basin estimated for the period 1985-2019.
Citation: Aguayo, R., León-Muñoz, J., Aguayo, M., Baez-Villanueva, O., Fernandez, A. Zambrano-Bigiarini, M., and Jacques-Coper, M. (2023) PatagoniaMet: A multi-source hydrometeorological dataset for Western Patagonia. Sci Data 11, 6 (2024). https://doi.org/10.1038/s41597-023-02828-2
Code repository: https://github.com/rodaguayo/PatagoniaMet
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/48QC7B
This replication dataset includes code and data to replicate the paper "Communication networks do not predict success in attempts at peer production". The data included are of three types:

1. A zipped tar file of compressed XML files of edits made to wikis. This includes the full text of every revision made to the 1430 wikis that were part of our analysis as of early 2010 (different wikis were collected at different times). Note: Due to the Dataverse's file size limit, this file is in two parts - wiki_com_networks-wiki_dump.tar.xz.partaa and wiki_com_networks-wiki_dump.tar.xz.partab. To combine them run: cat wiki_com_networks-wiki_dump.tar.xz.part* > wiki_com_networks-wiki_dump.tar.xz

2. A zipped tar file of the wikiq TSV files with metadata about each edit, created using the wikiq parser (https://code.communitydata.science/mediawiki_dump_tools.git). Those wishing to convert the XML files into TSV files can use the wikiq parser.

3. Summary CSV files with data about the communication network and activity levels for each wiki, in other words, the data used for the analyses in the paper. Code for converting the TSV files into these summary CSV files is included.

A more detailed description of how to replicate the figures and analyses from the paper is given in the README file included with the code.
Temperature data in degrees Celsius, collected approximately every 30 minutes since 2017 by 19 sensors distributed on Garibaldi Street in Lyon and elsewhere in the Metropolis (see the sensor location data). These sensors were installed by the Métropole de Lyon as part of the European Biotope project (https://www.grandlyon.com/metropole/affaires-europeennes/biotope). They run on battery and some no longer transmit data (see inactive sensors in the sensor location data). Note: data transmission is expected to stop by the end of 2024, depending on the battery level of each sensor and the scheduled shutdown of the LoRa data transmission network.

Downloading this large dataset (more than 1,500,000 records) in CSV format requires a specific query to limit the file size so that the result can be opened. For example:
— data for sensor N°70b3d580a0100648 can be downloaded via: https://download.data.grandlyon.com/ws/timeseries/biotope.temperature/all.csv?field=deveui&value=70b3d580a0100648&maxfeatures=-1
— 1,000,000 records starting from the 700,000th element can be downloaded in CSV via: https://download.data.grandlyon.com/ws/timeseries/biotope.temperature/all.csv?maxfeatures=1000000&start=700000

Refer to the platform documentation for more details (filtering by sensor number, date range, etc.): https://rdata-grandlyon.readthedocs.io/fr/latest/
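The example download URLs above can be built programmatically; this small helper sketch uses only the query parameters shown in the description:

```python
from urllib.parse import urlencode

BASE = "https://download.data.grandlyon.com/ws/timeseries/biotope.temperature/all.csv"

def sensor_url(deveui):
    """All readings for one sensor; maxfeatures=-1 removes the row limit."""
    return f"{BASE}?{urlencode({'field': 'deveui', 'value': deveui, 'maxfeatures': -1})}"

def page_url(start, count):
    """One page of `count` records starting at offset `start`."""
    return f"{BASE}?{urlencode({'maxfeatures': count, 'start': start})}"

print(sensor_url("70b3d580a0100648"))
print(page_url(700_000, 1_000_000))
```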
This data provides results from the California Environmental Data Exchange Network (CEDEN) for field and lab chemistry analyses. The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result.
Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data.
Users who want to manually download more specific subsets of the data can also use the CEDEN Query Tool, which provides access to the same data presented here, but allows for interactive data filtering.
NOTE: Some of the field and lab chemistry data that has been submitted to CEDEN since 2020 has not been loaded into the CEDEN database. That data is not included in this data set (and is also not available via the CEDEN query tool described above), but is available as a supplemental data set available here: Surface Water - Chemistry Results - CEDEN Augmentation. For consistency, many of the conditions applied to the data in this dataset and in the CEDEN query tool are also applied to that supplemental dataset (e.g., no rejected data or replicates are included), but that supplemental data is provisional and may not reflect all of the QA/QC controls applied to the regular CEDEN data available here.
Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
PURPOSE: To provide a permanent repository of key data series necessary to build a range-wide American eel stock assessment. DESCRIPTION: This collection presents data associated with the following report: Cairns, D.K. 2020. Landings, abundance indicators, and biological data for a potential range-wide American eel stock assessment. Canadian Data Report of Fisheries and Aquatic Science. No. 1311: v + 180 pp. Much of the data collection is from the Atlantic Provinces of Canada, particularly the Southern Gulf of St. Lawrence. The collection also includes data from elsewhere in the American eel's range in Canada, and also the United States and the Caribbean Basin. Files in the collection are as follows. Cairns2020_AnnexA_ReportTables.xlsx: This Excel file (file size 756 kb) contains all 37 tables in Cairns (2020) exactly as they appear in the report. Cairns2020_AnnexB_EelLengthsAgesEfishingRecords.xlsx: This Excel file (file size 3.1 mb) contains 20,047 records of American eel lengths and other biological data from the Canadian Atlantic Provinces, 1983-2017. Records include weights of 8,915 eels and ages of 2,212 eels. Records of 3,224 electrofishing sessions in the Miramichi River, New Brunswick, 1952-2019, and records of 2,590 electrofishing sessions in the Restigouche River, New Brunswick, 1972-2019 are included. Cairns2020_AnnexC_EelLengthsAgesDataDefinitions.csv: This .csv file (file size 4 kb) gives data definitions in English and French for the table of eel lengths and other biological data that is contained in Cairns2020_AnnexB_EelLengthsAgesEfishingRecords.xlsx and in Cairns2020_AnnexD_EelLengthsAges.csv. Cairns2020_AnnexD_EelLengthsAges.csv: This file (file size 2.0 mb) presents in .csv format the table of eel lengths and other biological data that is also presented in Cairns2020_AnnexB_EelLengthsAgesEfishingRecords.xlsx. 
Cairns2020_AnnexE_EelEFishingDataDefinitions.csv: This .csv file (file size 2 kb) gives data definitions in English and French for the table of eel electrofishing data that is contained in Cairns2020_AnnexB_EelLengthsAgesEfishingRecords.xlsx and in Cairns2020_AnnexD_EelLengthsAges.csv. Cairns2020_AnnexF_EelEFishing.csv: This file (file size 314 kb) presents in .csv format the table of eel electrofishing data that is also presented in Cairns2020_AnnexB_EelLengthsAgesEfishingRecords.xlsx. Cairns2020_AnnexG_OtolithImageMetadata.csv: This .csv file (file size 2 kb) provides metadata for the collection of eel otolith images. Files with names starting with EelOtos . . . . : These .tif, .jpg, and .bmp image files are in zipped format with a summed size of 5.3 gb. The files give magnified photos of 1,838 eel otoliths that have been prepared for age reading. Samples are from the Atlantic Provinces of Canada. Individual otolith codes in Cairns2020_AnnexB_EelLengthsAgesEfishingRecords.xlsx and in Cairns2020_AnnexC_EelLengthsAgesDataDefinitions.csv match the codes embedded in otolith image filenames. PARAMETERS COLLECTED: American eel landings, number caught, and effort of commercial and research fishing gear. American eel lengths, ages, sex and other biological data and sampling locations. NOTES ON QUALITY CONTROL: All keypunched records of landings, densities, and other data were verified against original sources. Landings and abundance indices were reviewed in a Department of Fisheries and Oceans scientific workshop and corrected as necessary. Length and age data were examined by length-weight and length age plots and implausible records were discarded. PHYSICAL SAMPLE DETAILS: No physical samples SAMPLING METHODS: Landings are from government fisheries agencies. Abundance indices are from commercial fyke, spear, and trap catch per unit effort, and from research ladder counts and electrofishing records. 
Mean elver lengths are compiled from published literature. Sex ratios are compiled from published literature. Locations of biological and genetic sampling are compiled from published literature. American eel lengths are total length of live specimens. Ages are from otolith annulus readings. Electrofishing records are from backpack electrofishing surveys in wadeable waters. USE LIMITATION: To ensure scientific integrity and appropriate use of the data, we encourage you to contact the data custodian.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For further information, see article.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Based on length frequency data of miter squid (Uroteuthis chinensis) collected in the northeastern South China Sea in 1975–1977, 1997–1999, and 2018–2019, asymptotic length, optimal length at first capture, relative mortality, and relative biomass of the stock were estimated using length-based Bayesian biomass estimation (LBB). The LBB-estimated asymptotic length for 2018–2019 was smaller than for the earlier periods. Optimal lengths at first capture for the latter two periods far exceeded average lengths in catches because of a major increase in fishing intensity. Between 1975 and 1977, relative total mortality (Z/K) was low, but it increased in the latter two periods, while relative natural mortality (M/K) showed a downward trend. Relative biomasses (B/B0 and B/Bmsy) indicated that the stock was close to unexploited between 1975 and 1977, but they declined to 6% and 4% in the later periods, corresponding to the growth in fishing horsepower. Indeed, by 2018, fishing horsepower had increased to nearly four times the optimal level. The analysis suggests that the stock of miter squid has been overfished since the mid-1980s and is now under heavy fishing pressure. To recover the stock, it is imperative to reduce fishing intensity and enforce size-at-first-capture regulations.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Oceanic diel vertical migration (DVM) constitutes the daily movement of various mesopelagic organisms migrating vertically from depth to feed in shallower waters and return to deeper water during the day. Accurate classification of taxa that participate in DVM remains non-trivial, and there can be discrepancies between methods. DEEPEND consortium (www.deependconsortium.org) scientists have been characterizing the diversity and trophic structure of pelagic communities in the northern Gulf of Mexico (nGoM). Profiling has included scientific echosounders to provide accurate and quantitative estimates of organismal density and timing as well as quantitative net sampling of micronekton. The use of environmental DNA (eDNA) can detect uncultured microbial taxa and the remnants that larger organisms leave behind in the environment. eDNA offers the potential to increase understanding of the DVM and the organisms that participate. Here we used real-time shipboard echosounder data to direct the sampling of eDNA in seawater at various time-points during the ascending and descending DVM. This approach allowed the observation of shifts in eDNA profiles concurrent with the movement of organisms in the DVM as measured by acoustic sensors. Seawater eDNA was sequenced using a high-throughput metabarcoding approach. Additionally, fine-scale acoustic data using an autonomous multifrequency echosounder was collected simultaneously with the eDNA samples and changes in organism density in the water column were compared with changes in eDNA profiles. Our results show distinct shifts in eukaryotic taxa such as copepods, cnidarians, and tunicates, over short timeframes during the DVM. These shifts in eDNA track changes in the depth of sound scattering layers (SSLs) of organisms and the density of organisms around the CTD during eDNA sampling. 
Dominant taxa in eDNA samples were mostly smaller organisms that may be below the size limit for acoustic detection, while taxa such as teleost fish were much less abundant in eDNA data than in acoustic data. Overall, these data suggest that eDNA may be a powerful new tool for understanding the dynamics and composition of the DVM, yet challenges remain in reconciling differences among sampling methodologies.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Body size is a trait of fundamental ecological and evolutionary importance that often differs between males and females (sexual size dimorphism; SSD). The island rule predicts that small-bodied species tend to evolve larger following a release from interspecific competition and predation in insular environments. According to Rensch’s rule, male body size relative to female body size increases with increasing mean body size. This allometric body size – SSD scaling is explained by male-driven body size evolution. These ecogeographical rules are rarely tested within species, and have not been addressed in a cave–surface context, even though caves represent insular environments (small and isolated, with simple communities). By analyzing six cave and nine surface populations of the widespread, primarily surface-dwelling freshwater isopod Asellus aquaticus with male-biased SSD, we tested whether cave populations evolved larger body size and higher SSD than the surface populations. We found extensive between-population variation in body size (maximum divergence being 74%) and SSD (males being 15%–50% larger than females). However, habitat type did not explain the body size and SSD variation, and we could not reject isometry in the male–female body size relationship. Hence, we found no support for the island rule or Rensch’s rule. We conclude that local selective forces stemming from environmental factors other than island vs. mainland or the general surface vs. cave characteristics are responsible for the reported population variation.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Correction - 8 October 2020 An error has been found involving Figure 2.9: Projected population change, council area, mid-2018 to mid-2028. The ‘percentage change’ value for Scotland was entered in error and has now been corrected (from 4.4% to 1.8%). The council figures are unaffected. Corrections have been made to the ‘All Sections’ and ‘Population’ data tables (Excel and CSV) files. Maximum file size: 3 MB