License: Database Contents License (DbCL) v1.0 - http://opendatacommons.org/licenses/dbcl/1.0/
This is a condensed version of the raw data obtained through the Google Data Analytics Course, made available by Lyft and the City of Chicago under this license (https://ride.divvybikes.com/data-license-agreement).
I originally did my study on another platform, and the original files were too large to upload to Posit Cloud in full. Each of the 12 monthly files contained anywhere from 100k to 800k rows. Therefore, I decided to reduce the number of rows drastically by performing grouping, summaries, and thoughtful omissions in Excel for each CSV file. What I have uploaded here is the result of that process.
Data is grouped by: month, day, rider_type, bike_type, and time_of_day. total_rides represents the count of rides in each grouping, which is also the number of original rows that were combined to make the new summarized row; avg_ride_length is the calculated average of all ride lengths in each grouping.
Be sure to use weighted averages if you want to calculate the mean of avg_ride_length for different subgroups, as the values in this file are already averages of the summarized groups. You can use the total_rides value as the weight in your weighted average calculation.
date - year, month, and day in date format; includes all days in 2022.
day_of_week - actual day of week as a character value. Set up a new sort order if needed.
rider_type - values are either 'casual' (riders who pay per ride) or 'member' (riders who have annual memberships).
bike_type - values are 'classic' (non-electric, traditional bikes) or 'electric' (e-bikes).
time_of_day - divides the day into 6 equal time frames, 4 hours each, starting at 12AM. Each individual ride was placed into one of these time frames using the time it STARTED, even if the ride was long enough to end in a later time frame. This column was added to help summarize the original dataset.
total_rides - count of all individual rides in each grouping (row). This column was added to help summarize the original dataset.
avg_ride_length - the calculated average of all rides in each grouping (row). Look to total_rides to know how many original ride length values were included in this average. This column was added to help summarize the original dataset.
min_ride_length - minimum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.
max_ride_length - maximum ride length of all rides in each grouping (row). This column was added to help summarize the original dataset.
Please note: the time_of_day column has inconsistent spacing. Use mutate(time_of_day = gsub(" ", "", time_of_day)) to remove all spaces.
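For example, in R (a minimal sketch assuming the dplyr package; the input file name below is hypothetical), the space cleanup and a properly weighted mean of avg_ride_length by subgroup could look like this:

library(dplyr)

rides <- read.csv("divvy_2022_summary.csv")   # hypothetical name for the uploaded CSV

rides %>%
  mutate(time_of_day = gsub(" ", "", time_of_day)) %>%
  group_by(rider_type, time_of_day) %>%
  summarise(mean_ride_length = weighted.mean(avg_ride_length, w = total_rides),  # weight by ride counts
            rides = sum(total_rides),
            .groups = "drop")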
Below is the list of revisions I made in Excel before uploading the final csv files to the R environment:
Deleted station location columns and lat/long as much of this data was already missing.
Deleted ride id column since each observation was unique and I would not be joining with another table on this variable.
Deleted rows pertaining to "docked bikes" since there were no member entries for this type and I could not compare member vs casual rider data. I also received no information in the project details about what constitutes a "docked" bike.
Used ride start time and end time to calculate a new column called ride_length (by subtracting), and deleted all rows with 0 and 1 minute results, which were explained in the project outline as being related to staff tasks rather than users. An example would be taking a bike out of rotation for maintenance.
Placed start time into a range of times (time_of_day) in order to group more observations while maintaining general time data. time_of_day now represents a time frame when the bike ride BEGAN. I created six 4-hour time frames, beginning at 12AM.
Added a Day of Week column, with Sunday = 1 and Saturday = 7, then changed from numbers to the actual day names.
Used pivot tables to group total_rides, avg_ride_length, min_ride_length, and max_ride_length by date, rider_type, bike_type, and time_of_day.
Combined everything into one CSV file with all months, containing fewer than 9,000 rows (instead of several million).
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and scripts used for manuscript: High consistency and repeatability in the breeding migrations of a benthic shark.
Project title: High consistency and repeatability in the breeding migrations of a benthic shark
Date: 23/04/2024
Folders:
- 1_Raw_data
- Perpendicular_Point_068151, Sanctuary_Point_068088, SST raw data, sst_nc_files, IMOS_animal_measurements, IMOS_detections, PS&Syd&JB tags, rainfall_raw, sample_size, Point_Perpendicular_2013_2019, Sanctuary_Point_2013_2019, EAC_transport
- 2_Processed_data
- SST (anomaly, historic_sst, mean_sst_31_years, week_1992_sst:week_2022_sst including week_2019_complete_sst)
- Rain (weekly_rain, weekly_rainfall_completed)
- Clean (clean, cleaned_data, cleaned_gam, cleaned_pj_data)
- 3_Script_processing_data
- Plots (dual_axis_plot (Fig. 1 & Fig. 4).R, period_plot (Fig. 2).R, sd_plot (Fig. 5).R, sex_plot (Fig. 3).R)
- cleaned_data.R, cleaned_data_gam.R, weekly_rainfall_completed.R, descriptive_stats.R, sst.R, sst_2019b.R, sst_anomaly.R
- 4_Script_analyses
- gam.R, gam_eac.R, glm.R, lme.R, Repeatability.R
- 5_Output_doc
- Plots (arrival_dual_plot_with_anomaly (Fig. 1).png, period_plot (Fig.2).png, sex_arrival_departure (Fig. 3).png, departure_dual_plot_with_anomaly (Fig. 4).png, standard deviation plot (Fig. 5).png)
- Tables (gam_arrival_eac_selection_table.csv (Table S2), gam_departure_eac_selection_table (Table S5), gam_arrival_selection_table (Table. S3), gam_departure_selection_table (Table. S6), glm_arrival_selection_table, glm_departure_selection_table, lme_arrival_anova_table, lme_arrival_selection_table (Table S4), lme_departure_anova_table, lme_departure_selection_table (Table. S8))
Descriptions of scripts and files used:
- cleaned_data.R: script to extract detections of sharks at Jervis Bay. Calculate arrival and departure dates over the seven breeding seasons. Add sex and length for each individual. Extract moon phase (numerical value) and period of the day from arrival and departure times.
- IMOS_detections.csv: raw data file with detections of Port Jackson sharks over different sites in Australia.
- IMOS_animal_measurements.csv: raw data file with morphological data of Port Jackson sharks
- PS&Syd&JB tags: file with measurements and sex identification of sharks (different from IMOS, it was used to complete missing sex and length).
- cleaned_data.csv: file with arrival and departure dates of the final sample size of sharks (N=49) with missing sex and length for some individuals.
- clean.csv: completed file using PS&Syd&JB tags. Note: tag ID 117393679 was wrongly identified as a male in IMOS and correctly identified as a female in the PS&Syd&JB tags file, as indicated by its large size.
- cleaned_pj_data: Final data file with arrival and departure dates, sex, length, moon phase (numerical) and period of the day.
- weekly_rainfall_completed.R: script to calculate average weekly rainfall and the correlation between the two weather stations used (Point Perpendicular and Sanctuary Point).
- weekly_rain.csv: file with the corresponding week number (1-28) for each date (01-06-2013 to 13-12-2019)
- weekly_rainfall_completed.csv: file with week number (1-28), year (2013-2019) and weekly rainfall average completed with Sanctuary Point for week 2 of 2017
- Point_Perpendicular_2013_2019: Rainfall (mm) from 01-01-2013 to 31-12-2020 at the Point Perpendicular weather station
- Sanctuary_Point_2013_2019: Rainfall (mm) from 01-01-2013 to 31-12-2020 at the Sanctuary Point weather station
- IDCJAC0009_068088_2017_Data.csv: Rainfall (mm) from 01-01-2017 to 31-12-2017 at the Sanctuary Point weather station (to fill in missing value for average rainfall of week 2 of 2017)
- cleaned_data_gam.R: script to calculate weekly counts of sharks to run gam models and add weekly averages of rainfall and sst anomaly
- cleaned_pj_data.csv
- anomaly.csv: weekly (1-28) average sst anomalies for Jervis Bay (2013-2019)
- weekly_rainfall_completed.csv: weekly (1-28) average rainfall for Jervis Bay (2013-2019)
- sample_size.csv: file with the number of sharks tagged (13-49) for each year (2013-2019)
- sst.R: script to extract daily and weekly sst from IMOS nc files from 01-05 until 31-12 for the following years: 1992:2022 for Jervis Bay
- sst_raw_data: folder with all the raw weekly (1:28) csv files for each year (1992:2022) to fill in with sst data using the sst script
- sst_nc_files: folder with all the nc files downloaded from IMOS from the last 31 years (1992-2022) at the sensor (IMOS - SRS - SST - L3S-Single Sensor - 1 day - night time – Australia).
- SST: folder with the average weekly (1-28) sst data extracted from the nc files using the sst script for each of the 31 years (to calculate temperature anomaly).
- sst_2019b.R: script to extract daily and weekly sst from IMOS nc file for 2019 (missing value for week 19) for Jervis Bay
- week_2019_sst: weekly average sst 2019 with a missing value for week 19
- week_2019b_sst: sst data from 2019 with another sensor (IMOS – SRS – MODIS - 01 day - Ocean Colour-SST) to fill in the gap of week 19
- week_2019_complete_sst: completed average weekly sst data from the year 2019 for weeks 1-28.
- sst_anomaly.R: script to calculate mean weekly sst anomaly for the study period (2013-2019) using mean historic weekly sst (1992-2022)
- historic_sst.csv: mean weekly (1-28) and yearly (1992-2022) sst for Jervis Bay
- mean_sst_31_years.csv: mean weekly (1-28) sst across all years (1992-2022) for Jervis Bay
- anomaly.csv: mean weekly and yearly sst anomalies for the study period (2013-2019)
- Descriptive_stats.R: script to calculate minimum and maximum length of sharks, mean Julian arrival and departure dates per individual per year, mean Julian arrival and departure dates per year for all sharks (Table S10), and a summary of the standard deviation of Julian arrival dates (Table S9)
- cleaned_pj_data.csv
- gam.R: script used to run the Generalized additive model for rainfall and sea surface temperature
- cleaned_gam.csv
- glm.R: script used to run the Generalized linear mixed models for the period of the day and moon phase
- cleaned_pj_data.csv
- sample_size.csv
- lme.R: script used to run the Linear mixed model for sex and size
- cleaned_pj_data.csv
- Repeatability.R: script used to run the repeatability analysis for Julian arrival and Julian departure dates (a minimal, hypothetical sketch of such a call appears after this list)
- cleaned_pj_data.csv
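As an illustration only (not one of the scripts above), a repeatability call of the kind described could be sketched in R with the rptR package; the column names julian_arrival and shark_id are assumptions and may differ from those in cleaned_pj_data.csv:

library(rptR)

pj <- read.csv("cleaned_pj_data.csv")

# Repeatability of Julian arrival date across individuals (Gaussian trait)
rpt(julian_arrival ~ (1 | shark_id), grname = "shark_id",
    data = pj, datatype = "Gaussian", nboot = 1000, npermut = 0)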
License: GNU Lesser General Public License v3.0 (LGPL-3.0) - http://www.gnu.org/licenses/lgpl-3.0.html
The Traveling Salesperson Problem (TSP) is a classic problem in computer science that seeks to find the shortest route through a group of cities. It is an NP-hard problem in combinatorial optimization, important in theoretical computer science and operations research.
World Map: https://data.heatonresearch.com/images/wustl/kaggle/tsp/world-tsp.png
In this Kaggle competition, your goal is not to find the shortest route among the cities. Rather, you must attempt to determine the length of the route drawn on each map.
The data for this competition is not made up of real-world maps, but rather randomly generated maps that vary in size, city count, and optimality of the routes. The following image shows a relatively small map, with few cities and an optimal route.
Small Map: https://data.heatonresearch.com/images/wustl/kaggle/tsp/1.jpg
Not all maps are this small, nor do they all contain such an optimal route. Consider the following map, which is much larger.
Larger Map: https://data.heatonresearch.com/images/wustl/kaggle/tsp/6.jpg
Attributes such as map size, city count, and route optimality were randomly selected to generate each image.
The path distance is based on the sum of the Euclidean distance of all segments in the path. The distance units are in pixels.
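As a sketch of that definition (written in R, with made-up coordinates), the labelled distance is simply the sum of Euclidean segment lengths in pixel units:

# Total path length = sum of Euclidean distances between consecutive cities (pixels)
path_length <- function(x, y) sum(sqrt(diff(x)^2 + diff(y)^2))
path_length(c(0, 3, 3), c(0, 0, 4))  # segments of 3 and 4 pixels -> 7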
This is a regression problem: you are to estimate the total path length. There are several challenges to consider.
The following picture shows a section from one map zoomed to the pixel-level:
TSP Zoom: https://data.heatonresearch.com/images/wustl/kaggle/tsp/tsp_zoom.jpg
The following CSV files are provided, in addition to the images.
The tsp-all.csv file contains the following data.
id,filename,distance,key
0,0.jpg,83110,503x673-270-83110.jpg
1,1.jpg,1035,906x222-10-1035.jpg
2,2.jpg,20756,810x999-299-20756.jpg
3,3.jpg,13286,781x717-272-13286.jpg
4,4.jpg,13924,609x884-312-13924.jpg
The columns are id, filename, distance, and key.
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1. Sample Dataset for Application of Proposed Methodology (data.csv). To protect patient confidentiality, the hospitals providing the example data used in this paper have not given permission for the data to be made publicly available. We have, however, included a limited “fake” version of the dataset. This dataset contains 3 variables - dlp.over indicates whether an exam is “high dose,” sizeC is an ID indicating the combination of anatomic area examined and patient size category, while fac is an ID indicating the hospital the exam was performed in. Information on which ID values are associated with which anatomic areas, patient sizes, and hospital will not be provided, as they are not necessary for the illustration of statistical methods described in the paper. Note that since the dataset made available is different from the dataset used in the paper, the results should be expected to be comparable, but not identical. The software implementing the methods described in this article is available on request from the author.
- ClutchSize_heritability_data.csv: phenotypes (clutch size) used to calculate heritability
- Pedigree.csv: pedigree associated with the phenotype file
- Manipulated_Weights.csv: manipulated clutch sizes with fledgling weights
- Survival_Weights.csv: fledgling weights and survival of chicks
- MaternalEffect.csv: data for calculating the maternal effect
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) - https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Technical Info: Data used for training and validating the MieAI model:
- mie_x1.csv and mie_x2.csv are outputs from the Mie calculation for size parameter 0.5, respectively. In both files, the columns are aerosol optical properties and physical characteristics of the aerosol; the rows represent different samples from the Mie calculations.
- icon-art-LAM_DOM01_ML_0023.nc is the ICON-ART output for the biomass case study.
- Soufriere-April-2021-fplume-aerodyn-forecast_mode_DOM03_ML_0162.nc is the ICON-ART output for the volcano case study.
- icon-art-aging-aero_DOM01_ML_0022.nc is the ICON-ART output for the dust case study.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) - https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset is derived from the ISIC Archive with the following changes:
If the "benign_malignant" column is null and the "diagnosis" column is "vascular lesion", the target is set to null.
DISCLAIMER: I'm not a dermatologist and I'm not affiliated with ISIC in any way. I don't know whether my approach to setting the target value is acceptable for the ISIC competition. Use at your own risk. The images were resized to 256x256 using the following code:
import os
import multiprocessing as mp
from PIL import Image, ImageOps
import glob
from functools import partial

def list_jpg_files(folder_path):
    # Ensure the folder path ends with a slash
    if not folder_path.endswith('/'):
        folder_path += '/'
    # Use glob to find all .jpg files in the specified folder (non-recursive)
    jpg_files = glob.glob(folder_path + '*.jpg')
    return jpg_files

def resize_image(image_path, destination_folder):
    # Open the image file
    with Image.open(image_path) as img:
        # Get the original dimensions
        original_width, original_height = img.size
        # Calculate the aspect ratio
        aspect_ratio = original_width / original_height
        # Determine the new dimensions based on the aspect ratio
        if aspect_ratio > 1:
            # Width is larger, so we will crop the width
            new_width = int(256 * aspect_ratio)
            new_height = 256
        else:
            # Height is larger, so we will crop the height
            new_width = 256
            new_height = int(256 / aspect_ratio)
        # Resize the image while maintaining the aspect ratio
        img = img.resize((new_width, new_height))
        # Calculate the crop box to center the image
        left = (new_width - 256) / 2
        top = (new_height - 256) / 2
        right = (new_width + 256) / 2
        bottom = (new_height + 256) / 2
        # Crop the image if it results in shrinking
        if new_width > 256 or new_height > 256:
            img = img.crop((left, top, right, bottom))
        else:
            # Add black edges if it results in scaling up
            img = ImageOps.expand(img, border=(int(left), int(top), int(left), int(top)), fill='black')
        # Resize the image to the final dimensions
        img = img.resize((256, 256))
        img.save(os.path.join(destination_folder, os.path.basename(image_path)))

source_folder = ""
destination_folder = ""
images = list_jpg_files(source_folder)
with mp.Pool(processes=12) as pool:
    images = pool.map(partial(resize_image, destination_folder=destination_folder), images)
print("All images resized")
This code will shrink (down-sample) the image if it is larger than 256x256. But if the image is smaller than 256x256, it will add either vertical or horizontal black edges after scaling up the image. In both scenarios, it will keep the center of the input image in the center of the output image.
The HDF5 file is created using the following code:
import os
import pandas as pd
from PIL import Image
import h5py
import io
import numpy as np

# File paths
base_folder = "./isic-2020-256x256"
csv_file_path = 'train-metadata.csv'
image_folder_path = 'train-image/image'
hdf5_file_path = 'train-image.hdf5'

# Read the CSV file
df = pd.read_csv(os.path.join(base_folder, csv_file_path))

# Open an HDF5 file
with h5py.File(os.path.join(base_folder, hdf5_file_path), 'w') as hdf5_file:
    for index, row in df.iterrows():
        isic_id = row['isic_id']
        image_file_path = os.path.join(base_folder, image_folder_path, f'{isic_id}.jpg')
        if os.path.exists(image_file_path):
            # Open the image file
            with Image.open(image_file_path) as img:
                # Convert the image to a byte buffer
                img_byte_arr = io.BytesIO()
                img.save(img_byte_arr, format=img.format)
                img_byte_arr = img_byte_arr.getvalue()
                hdf5_file.create_dataset(isic_id, data=np.void(img_byte_arr))
        else:
            print(f"Image file for {isic_id} not found.")

print("HDF5 file created successfully.")
To read the hdf5 file, use the following code:
import h5py
from PIL import Image
import io
# The original snippet is truncated here; a minimal reconstruction, assuming the layout created above:
with h5py.File('train-image.hdf5', 'r') as f:
    for isic_id in f.keys():
        img = Image.open(io.BytesIO(f[isic_id][()].tobytes()))  # decode the stored JPEG bytes
License: CC0 1.0 Public Domain Dedication - https://creativecommons.org/publicdomain/zero/1.0/
The data have been synthetically generated, with values calculated from weightings taken from various studies and sites that currently compute the dependent variable (carbon emissions), in an attempt to keep values close to reality. This dataset was used in our Bootcamp final project: https://carbonfootprintcalculator.streamlit.app/
Features:
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data files
- Cat_data_NZTM.csv: GPS data for each of the cats (labeled by id)
- Catdata.csv: home range sizes for each cat; Cat = cat id, sex is male or female, weight is in kg, 95KDE is the 95% kernel density estimate, 50KDE is the 50% kernel density estimate, and rabbit level (low, medium, or high) is based on the Modified McLean's scale
- run1.land.csv: landscape-level characteristics calculated by FRAGSTATS
- run2.class.csv: class-level characteristics calculated by FRAGSTATS
Code - all code is for R
Home range.R: calculates home ranges and compares the 95% KDE and the 50% KDE between males and females. The home range calculation requires the file CleanGPSdata; the comparison of home range size between males and females uses Catdata.csv. (An illustrative sketch of the KDE calculation appears after this list of scripts.)
Factors that affect habitat selection.R: requires data from the Simple_landuse file and Cat_data_NZTM.csv for habitat selection. Calculates habitat selection for all cats and for each individual cat. Following the habitat selection calculation, the coefficients from the individual models are used with Catdata.csv, run1.land.csv, and run2.class.csv to examine factors that affect habitat selection.
All cats model validation.R: provides code for model validation of the habitat selection models for all cats.
Individual cat model validation.R: provides code for model validation of the habitat selection models for each of the individual cats.
Tortuosity veg vs pasture.R: provides code for calculating tortuosities of paths in pasture and in woody vegetation, and code to determine whether these differed.
Step lengths veg vs pasture.R: provides code to determine whether step lengths differed between pasture and woody vegetation.
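For illustration only (this is not the repository's Home range.R), a 95% and 50% kernel density home range calculation of the kind described could be sketched in R with the adehabitatHR package; the column names id, x, and y are assumptions about Cat_data_NZTM.csv:

library(adehabitatHR)
library(sp)

gps <- read.csv("Cat_data_NZTM.csv")                        # assumed columns: id, x, y (NZTM metres)
pts <- SpatialPointsDataFrame(gps[, c("x", "y")], data = gps["id"])
kud <- kernelUD(pts, h = "href")                            # one utilisation distribution per cat
kernel.area(kud, percent = c(50, 95), unin = "m", unout = "ha")  # 50% and 95% KDE areas (ha)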
Competition for mates can drive the evolution of exaggerated weaponry and male dimorphism associated with alternative reproductive tactics. In terrestrial arthropods, male dimorphism is often detected as non-linear allometries, where the scaling relationship between weapon size and body size differs in intercept and/or slope between morphs. Understanding the patterns of non-linear allometries is important as it can provide insights into threshold evolution and the strength of selection experienced by each morph. Numerous studies in male-dimorphic arthropods have reported that allometric slopes of weapons are shallower in large "major" males compared to small "minor" males. Because this pattern is common among beetles that undergo complete metamorphosis (holometabolous), researchers have hypothesized that the slope change reflects resource depletion during pupal development. However, no comprehensive survey has examined the generality of this trend. We systematically searched the literat...
Are weapon allometries steeper in major or minor males? A meta-analysis
The dataset contains two Excel files and two R files.
meta_data.csv is the complete dataset used in the meta-regression analyses
meta_code.R includes all R code used for the meta-regression analyses and production of figures
Compiling_data.xlsx reports the raw data extracted from each study used to calculate effect sizes (Hedges' g)
Calculating Hedges g.R reports the R code used for calculating effect sizes based on the compiled data
Citation: Author and year of the study included in the meta-analysis
Metamorphosis: Categorical variable indicating the type of metamorphosis of the study organism (holometabolous = complete metamorphosis; hemimetabolous = incomplete or no metamorphosis)
Order: Taxonomic rank of the study organism
Family: Taxonomic rank of the study organism
rotl_spp: Species nam...
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Sugarcane mill ash has been suggested as having high potential for carbon dioxide removal (CDR) via enhanced weathering (EW), but this had not been quantitatively assessed. The aims of this study were to 1) assess the CDR potential of various sugarcane mill ashes via EW, and 2) investigate the impact of soil conditions and mill ash properties on the CDR. This was done by characterising the physical and chemical properties of five mill ashes from Australia and simulating CDR via EW using a one-dimensional reactive transport model.
This data record contains:
This data includes measurements of the physical and chemical properties of mill ash samples including particle size distribution and morphology, surface elemental composition, total elemental composition, and mineralogy. This data was used to parameterise a modified version of the 1-D reactive transport model (RTM) developed by Kelland et al. (2020) in the geochemical modelling software PHREEQC. Therefore the data also includes R files and PDFs of calculations used to determine model parameters from the measurements of the mill ashes, input files for PHREEQC models in plain text and Microsoft Word format, database files in plain text format, and output files from PHREEQC in CSV and PQO formats. The output CSV files were then analysed to calculate carbon dioxide removal so the data includes those R scripts and derived PDF, CSV, and JPG files of final results.
Data is grouped into the following:
--//--
Software/equipment used to create/collect the data:
BET: Quantachrome ASiQwin - Automated Gas Sorption Data Acquisition and Reduction version 5.21
ICP-AES: Varian Liberty Series II
SEM-EDS: Jeol JXA8200 “Superprobe” with EDS and backscatter electron imaging (BEI)
TGA: TA STD650 Discovery Series- TGA-DSC (192.168.1.110), TA Instruments Trios v4.2.1.36612
XRD: Siemens D5000 Diffractometer (XRD) theta-2 theta goniometer with a copper anode x-ray tube
PHREEQC version 3.7.3 for Linux (https://www.usgs.gov/software/phreeqc-version-3) installed on the JCU High Performance Computer
Microsoft Word from Microsoft 365
Notepad++ version 8.1.9.2
Software/equipment used to manipulate/analyse the data:
R version 4.3.1 for Windows 10
R Studio version 2023.06.1 Build 524 for Windows 10
Microsoft Excel from Microsoft 365
The aim of the study was to obtain light and turbidity vertical profile data through the water column in Cleveland Bay over a four-day period and during maintenance dredging, for use in developing an empirical spectral solar irradiance model. This dataset consists of 95 data files (spreadsheets). One file (.xlsx format) contains the spectral attenuation coefficient for downwelling light, Kd(lambda), for the 94 different vertical profiles taken within Cleveland Bay. In addition, individual raw data files from the USSIMO hyperspectral irradiance sensor for each profile are provided (.csv format).
Methods: Over a 4-d period (12-15 September 2016), and during a period of routine maintenance dredging, 94 light and turbidity vertical profiles were measured through the water column in Cleveland Bay using a USSIMO multispectral radiometer (In Situ Marine Optics, Perth, Australia) and an IMO-NTU turbidity sensor. The USSIMO incorporates a Carl Zeiss UV/VIS miniature monolithic spectrometer module as the internal light recording device, providing irradiance measurement values at nanometer spectral spacing. For each vertical profile, the just-below-surface incident irradiance Ed(0-, lambda) and the light attenuation coefficient Kd(lambda) were determined using the Beer-Lambert law. These data were used to calculate the wavelength-specific light attenuation coefficients for downwelling light, Kd(lambda), using the relationship E(lambda, z) = E(lambda, 0) exp(-Kd(lambda) z), where E(lambda, z) is the spectral downwelling irradiance at depth z and E(lambda, 0) is the spectral downwelling irradiance just below the ocean's surface. The accelerometer in the USSIMO was used to assess whether the instrument was vertical and stable, and the first 0.2 m of all deployments was discarded prior to calculating Kd. Kd values, along with the latitude and longitude for each station and the date and time of day the vertical profiling occurred, can be found in the spreadsheet (.xlsx format). Each individual raw data file (.csv format) from the USSIMO sensor contains normalised Ed values (e.g. in-water Ed divided by above-water reference Ed), as well as the latitude and longitude for each station and the date and time of day the vertical profiling occurred.
Limitations of the data: The value NaN appears in the spreadsheet NewDataset1_ALL_Kd_DATA.xlsx for the spectral attenuation coefficient for downwelling light, Kd(lambda). It indicates light levels that were too low under highly turbid conditions to reliably calculate Kd.
Format: This dataset contains a single Excel file with a size of 46 KB and 94 csv files ranging in size from 11 to 202 KB.
Data Dictionary: Spreadsheet NewDataset1_ALL_Kd_DATA.xlsx. Row headings include: Date, Local Time, Latitude, Longitude, Station ID, vertically averaged Kd (550 nm). The heading lambda(nm) [in cell A7] indicates that the values 400 to 700 below it are wavelengths in nanometres (nm).
Data Location: This dataset is filed in the eAtlas enduring data repository at: data esp2\2.1.9-Dredging-marine-response
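For illustration, the Beer-Lambert calculation above amounts to fitting the slope of log irradiance against depth for each wavelength. A minimal sketch in R, with hypothetical file and column names (depth_m, ed for a single wavelength), might look like this:

profile <- read.csv("ussimo_profile_example.csv")   # hypothetical single-profile, single-wavelength file
profile <- subset(profile, depth_m >= 0.2)          # discard the first 0.2 m, as described above
fit <- lm(log(ed) ~ depth_m, data = profile)
kd <- -coef(fit)[["depth_m"]]                       # attenuation coefficient Kd (m^-1)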
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes reproducible code for the two applications related to the 'MethylDetectR' software. These .R files are included as 'MethylDetectR - Calculate Your Scores.R' and 'MethylDetectR.R'. An example DNAm file and SexAgeinfo file for upload to 'MethylDetectR - Calculate Your Scores' are included. These are 'DNAm_File_Example.rds' and 'SexAgeinfo_example.csv' respectively. An example output file from this application/for upload to 'MethylDetectR' is included as 'MethylDetectR - Test For Upload.csv'. An example and optional input file for case/control data is also available as 'MethylDetectR_Case_Control_Example.csv'.
Furthermore, a script for the user to generate their own DNAm-based estimated values for human traits is included as 'Script_For_User_To_Generate_Scores.R'. A necessary associated file as 'Predictors_Shiny_By_Groups.csv' is also present for the script to run.
Lastly, an additional file called 'Truncate_to_these_CpGs.csv' is available which allows users to subset their methylation file to those CpG sites used in the 'MethylDetectR - Calculate Your Scores' application. This may substantially reduce the size of the methylation file for upload as well as its upload time.
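For readers unfamiliar with how such scores are formed, the general idea is a weighted sum of methylation beta values over the predictor CpGs. The R sketch below is a generic illustration only, not the MethylDetectR code: the orientation of the DNAm matrix and the column names of 'Predictors_Shiny_By_Groups.csv' are assumptions.

dnam <- readRDS("DNAm_File_Example.rds")               # assumed: matrix of beta values, CpGs as rows, samples as columns
weights <- read.csv("Predictors_Shiny_By_Groups.csv")  # assumed columns: CpG, Coefficient, Predictor
w <- subset(weights, Predictor == unique(weights$Predictor)[1])  # pick one trait's weights
cpgs <- intersect(rownames(dnam), w$CpG)
scores <- colSums(dnam[cpgs, , drop = FALSE] * w$Coefficient[match(cpgs, w$CpG)])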
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) - https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This study used the literature on adverse drug reactions of metformin published from 1991 to 2020 as the data source, divided into time segments of 3 years each, and obtained the title information (see the collection of bibliographies included in the study, .zip). Subject words were then extracted with the Bicomb 2021 software to construct co-occurrence matrices, yielding a total of 10 co-occurrence matrices (see the subject word co-occurrence matrix collection included in the study, .zip). The 10 co-occurrence matrices were imported into self-designed Python and R code (see Opportunity Code.zip, Trust Code.zip, and the open triangle and closed triangle code, .zip) to obtain the opportunity value, trust value, and number of edge triangles of each node pair in the 10 networks. Gephi 0.9.7 was used to calculate the motivation value of each node pair, and Excel was used to calculate the global clustering coefficient of each network and the edge clustering coefficient of each node pair. Excel was also used to construct scatter diagrams of node-pair opportunity, trust, and motivation values against the edge clustering coefficient, and SPSS was used to calculate the correlations between node-pair opportunity, trust, and motivation values and the edge clustering coefficient and the number of closed triangles (see the code operation and software calculation result set, .zip).
The bibliographic data from 2000 to 2009 were selected to build panel data (see the collection of bibliographies included in the study, .zip). The same self-designed Python and R code was used to obtain the opportunity value, trust value, and number of edge triangles of each node pair in the 10 networks, and Gephi 0.9.7 was used to calculate the motivation value of node pairs. Closeness centrality, betweenness centrality, eigenvector centrality, and average path length of node pairs were imported into Stata/MP 17.0 to obtain the correlations between node attributes and network characteristics (see the code operation and software calculation result set, .zip).
The contents of each data file are described in detail below (an illustrative R sketch of the clustering-coefficient and triangle calculation appears after this list):
1. Collection of bibliographies included in the study: contains two folders, the literature collection from 1991 to 2020 and the literature collection from 2000 to 2009. The 1991-2020 collection stores the bibliographic data for 10 time segments from 1991 to 2020, and the 2000-2009 collection stores the bibliographic data for 10 overlapping windows from 2000 to 2009.
2. Co-occurrence matrix set of subject words included in the study: contains two folders, the 1991-2020 subject word co-occurrence matrix set and the 2000-2009 subject word co-occurrence matrix set. The 1991-2020 set contains the subject word co-occurrence matrices for the 10 time segments from 1991 to 2020; the first row and first column of each co-occurrence matrix are subject words, and the numbers represent how many times each subject word pair co-occurs. The 2000-2009 set stores the subject word co-occurrence matrices for the 10 time windows from 2000 to 2009.
3. Opportunity Code.zip: code used to calculate the opportunity value of each node pair. The input data is a co-occurrence matrix in .csv format.
4. Trust Code.zip: code used to calculate the trust value of each node pair. The input data is a co-occurrence matrix in .csv format.
5. Code of open triangle and closed triangle.zip: code used to calculate the number of closed and open triangles on the edge of each node pair. The input data is a co-occurrence matrix in .csv format.
6. Code run and software calculation result set.zip: contains two folders, the 1991-2020 calculation results and the 2000-2009 calculation results. The 1991-2020 folder stores the calculation results and scatter diagrams for the 10 time segments from 1991 to 2020. Taking 1991-1993 as an example, the first row of each table lists the opportunity, comprehensive trust, motivation, edge clustering coefficient, and number of closed triangles; at the end of each table, the mean values of opportunity, trust, and motivation and their Pearson correlation coefficients with the edge clustering coefficient and the number of closed triangles are calculated. The 2000-2009 folder stores the panel data, the opportunity, trust, and motivation values from the Stata calculation, and the correlations between node attributes and network characteristics.
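For illustration only (the study's own calculations use the Python and R code plus Excel described above), the global clustering coefficient and per-node triangle counts of one network could be sketched in R with the igraph package; the input file name below is hypothetical:

library(igraph)

m <- as.matrix(read.csv("cooccurrence_1991_1993.csv", row.names = 1))  # hypothetical co-occurrence matrix
g <- graph_from_adjacency_matrix((m > 0) * 1, mode = "undirected", diag = FALSE)
transitivity(g, type = "global")   # global clustering coefficient of the network
count_triangles(g)                 # number of triangles through each node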
License: CC0 1.0 - https://spdx.org/licenses/CC0-1.0.html
We present mean individual weights for common benthic invertebrates of the Great Lakes collected from over 2,000 benthic samples and eight years of data collection (2012-2019), both as species-specific weights and average weights of larger taxonomic groups of interest. The dataset we have assembled is applicable to food web energy flow models, calculation of secondary production estimates, interpretation of trophic markers, and for understanding how biomass distribution varies by benthic invertebrate species in the Great Lakes. A corresponding data paper describes comparisons of these data to benthic invertebrates in other lakes.
Methods
Data Collection
Benthic invertebrates were collected from the EPA R/V Lake Guardian from 2012-2019 as part of the EPA Great Lakes National Program Office GLBMP and Cooperative Science and Monitoring Initiative (CSMI) benthic surveys. GLBMP samples are collected in all five of the Great Lakes annually and CSMI samples are collected in one of the Great Lakes annually. GLBMP includes 57-63 stations each year: 11 in Lake Superior (and 2-7 additional stations since 2014), 11 in Lake Huron, 16 in Lake Michigan, 10 in Lake Erie, and 10 (9 since 2015) in Lake Ontario. The number of CSMI stations varies by year. CSMI surveys for each lake took place in the following years: Erie 2014 (97 stations), Michigan 2015 (140 stations), Superior 2016 (59 stations), Huron 2017 (118 stations), and Ontario 2018 (46 stations). Additional CSMI surveys have occurred since 2019; however, we did not include these survey data in our analysis because samples would be unbalanced, with some lakes sampled twice and other lakes sampled only once. We followed EPA Standard Operating Procedures for Benthic Invertebrate Field Sampling SOP LG406 (U.S. EPA, 2021). In short, triplicate samples were collected from each station using a Ponar grab (sampling area = 0.0523 m2 for all surveys except Lake Michigan CSMI, for which sampling area = 0.0483 m2) then rinsed through 500 µm mesh. Samples were preserved with 5-10% neutral buffered formalin with Rose Bengal stain.
Lab Processing
Samples were processed in the lab after preservation following EPA Standard Operating Procedure for Benthic Invertebrate Laboratory Analysis SOP LG407 (U.S. EPA, 2015). Briefly, organisms were picked out of samples using a low-magnification dissecting microscope, then each organism was identified to the finest taxonomic resolution possible (usually species). Individuals of the same species, or size category, were blotted dry on cellulose filter paper to remove external water until the wet spots left by the animal(s) on the absorbent paper disappeared. Blotting time varied based on the surface area/volume ratio of the organisms but was approximately one minute for large and medium chironomids and oligochaetes and less time (0.6 min) for smaller chironomids and oligochaetes. Care was taken to ensure that the procedure did not cause damage to the specimens. Larger organisms (e.g., dreissenids) often took longer to blot dry. All organisms in a sample within a given taxonomic unit were weighed together to the nearest 0.0001 g (WW). Dreissena were weighed by 5 mm size category (size fractions: 0-4.99 mm, 5-9.99 mm, etc.) to the nearest 0.0001 g (shell and tissue WW).
Data Analysis
To calculate the total weight for each species that was mounted on slides by size groups for identification (e.g., Oligochaeta, Chironomidae), we multiplied the number of individuals of the species binned into each size category by the average weight of individuals in that category. If a species was found in more than one size category, we summed the weight of the species across all categories per sample. Oligochaetes often fragment in samples, and thus were counted by tallying the number of oligochaete heads (anterior ends with prostomium) present in the sample. Oligochaete fragments were also counted and weighed for inclusion in biomass calculations. We set the cutoff for the minimum number of samples to calculate individual weights to ten samples (see companion data paper for details). Therefore, in our further analysis we only calculated individual weights when a taxonomic unit was found in at least ten samples. Species that were found in fewer than ten samples were excluded from the analysis. We calculated wet weights by species whenever possible. If species were closely related, had similar body size (based on our previous experience), and were found in few samples, they were grouped together to achieve our minimum sample size of ten. For some taxa (e.g., Chironomidae), individual species could not be identified, so calculations were made at the finest taxonomic resolution possible (usually genus). We hereafter refer to the two taxonomic groupings of closely related species and taxa that could not be identified to species as "taxonomic units." For each taxonomic unit, we calculated several summary statistics on wet weight: mean, minimum, and maximum weight, median weight, standard error of mean weight, and sample size (number of samples in which a taxonomic unit was present). We performed Kruskal-Wallis tests (Kruskal & Wallis, 1952) to determine when individuals within a species could be grouped by depth zone and/or lake when sample size was large enough (species found in ≥10 samples per group) to permit splitting, because we expected species weight to differ by depth zone and/or lake. In all five Great Lakes, benthic density and species richness are greater at stations ≤70 m than at stations deeper than 70 m (Burlakova et al., 2018; Cook & Johnson, 1974). The 70 m depth contour separation of benthos mirrors a breakpoint in spring chlorophyll concentrations observed for these stations, suggesting that lake productivity is likely the major driver of benthic abundance and diversity across lakes (Burlakova et al., 2018). Therefore, we used two categories of depth zones: ≤70 m and >70 m. If Kruskal-Wallis tests showed that weights did not differ by lake or depth, the average weight for a species was calculated as an average of all lakes and depths. If Kruskal-Wallis tests showed significant separation (α < 0.05) by lake or depth, then means were calculated for each group and we also compared the group means. Individuals in different lakes or depth zones were combined if the mean difference between most groups was less than 25%, even when Kruskal-Wallis tests were significant, because small differences were likely not biologically significant. Oligochaete fragments for finer taxonomic units were reported separately from oligochaete species because it was rarely apparent which species the fragments came from. Mean individual wet weights were calculated for a total of 187 groupings within taxonomic units (data file "IndividualWeights_AllData.csv").
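A minimal sketch of the per-sample calculation and the depth-zone comparison described above, in R with hypothetical file and column names:

library(dplyr)

samples <- read.csv("benthos_samples.csv")   # hypothetical raw file: sample_id, taxon, depth_zone, wet_weight_g, count

per_sample <- samples %>%
  group_by(sample_id, taxon, depth_zone) %>%
  summarise(mean_ind_wt = sum(wet_weight_g) / sum(count), .groups = "drop")

# Kruskal-Wallis test for a weight difference between the two depth zones for one taxonomic unit
kruskal.test(mean_ind_wt ~ depth_zone,
             data = subset(per_sample, taxon == "Spirosperma ferox"))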
For 117 taxonomic units, weights were calculated across all lakes, depths, and basins because weights were similar in all regions or because of small sample size; for seven taxonomic units, weights were calculated by lake; and for the rest, summary statistics were calculated by both lake and depth zone. In addition, five species were considered "special cases" where some areas were similar while others were not. For example, some species had similar weights in multiple lakes, so those lakes were grouped together while others were kept separate. Dreissena rostriformis bugensis weights were calculated by lake and depth zone except for Lake Erie, where the western, central, and eastern basins were separated because previous research demonstrated that D. rostriformis bugensis size structure is drastically different in each of Lake Erie's basins (Karatayev et al., 2021). Other special cases were: Heterotrissocladius marcidus group (Huron, Michigan, and Ontario were similar and grouped together, while mean weight in Lake Superior was different), Pisidium spp. (grouped as Ontario/Michigan, Erie, and Huron/Superior), Unidentified Chironomidae (Lake Erie was separated and all other lakes were grouped together), and Spirosperma ferox (Lake Erie was separated and all other lakes were grouped together). To calculate mean individual weights for commonly reported larger taxonomic groups (e.g., Oligochaeta, Chironomidae), we combined species or taxonomic units that belonged to each group (see "SpeciesList.csv" for information on groupings). Summary statistics were calculated on the mean individual weight for all individuals within a group in a given sample, i.e., total biomass for a given group was divided by total density for that group, repeated for each sample. Results are given for each major group as a mean/minimum/maximum for each lake, and for each depth zone within each lake, as groups are often made up of different species with different body sizes in each lake and depth zone. Because densities of oligochaetes were counted based on the number of oligochaetes with heads in a sample (excluding fragments), but the fragments were weighed to calculate biomass, the mean individual weight for oligochaetes within a sample was calculated by dividing the weight of all oligochaetes (including fragments) in a sample by the number of oligochaetes (not including fragments). Calculations of mean individual weight by major group were performed both by lake and by lake plus depth zone (data file "IndividualWeights_MajorGroups.csv"). Summary statistics were reported for 14 major taxa and were broken down by depth zone when sample size was sufficient (data file "IndividualWeights_MajorGroups.csv").
REFERENCES
Burlakova, L. E., Barbiero, R. P., Karatayev, A. Y., Daniel, S. E., Hinchey, E. K., & Warren, G. J. (2018). The benthic community of the Laurentian Great Lakes: Analysis of spatial gradients and temporal trends from 1998 to 2014. Journal of Great Lakes Research, 44(4), 600-617. https://doi.org/10.1016/j.jglr.2018.04.008
Cook, D. G., & Johnson, M. G. (1974). Benthic Macroinvertebrates of the St. Lawrence Great Lakes.
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background and Purpose: About 20.1% of intracranial aneurysm (IA) carriers are patients with multiple intracranial aneurysms (MIAs), who have a higher rupture risk and worse prognosis. A prediction model may therefore bring some potential benefits. This study attempted to develop and externally validate a dynamic nomogram to assess the rupture risk of each IA among patients with MIA.
Method: We retrospectively analyzed the data of 262 patients with 611 IAs admitted to the Hunan Provincial People's Hospital between November 2015 and November 2021. Multivariable logistic regression (MLR) was applied to select the risk factors and derive a nomogram model for the assessment of IA rupture risk in MIA patients. To externally validate the nomogram, data of 35 patients with 78 IAs were collected from another independent center between December 2009 and May 2021. The performance of the nomogram was assessed in terms of discrimination, calibration, and clinical utility.
Result: Size, location, irregular shape, diabetes history, and neck width were independently associated with IA rupture. The nomogram showed good discriminative ability for ruptured and unruptured IAs in the derivation cohort (AUC = 0.81; 95% CI, 0.774-0.847) and was successfully generalized in the external validation cohort (AUC = 0.744; 95% CI, 0.627-0.862). The nomogram was well calibrated, and the decision curve analysis showed that it would generate more net benefit in identifying IA rupture than the "treat all" or "treat none" strategies at threshold probabilities ranging from 10 to 60% in both the derivation and external validation sets. The web-based dynamic nomogram calculator is accessible at https://wfs666.shinyapps.io/onlinecalculator/.
Conclusion: External validation showed that the model has the potential to assist clinical identification of dangerous aneurysms after longitudinal data evaluation. Size, neck width, and location are the primary risk factors for ruptured IAs.
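For context only (this is not the authors' model or data), a multivariable logistic regression and AUC calculation of the kind described could be sketched in R; the file and column names are hypothetical and the pROC package is assumed:

library(pROC)

aneurysms <- read.csv("mia_cohort.csv")   # hypothetical: one row per aneurysm with a binary 'ruptured' outcome
fit <- glm(ruptured ~ size + location + irregular_shape + diabetes + neck_width,
           family = binomial, data = aneurysms)
auc(roc(aneurysms$ruptured, predict(fit, type = "response")))   # discrimination in the derivation cohort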
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"We believe that by accounting for the inherent uncertainty in the system during each measurement, the relationship between cause and effect can be assessed more accurately, potentially reducing the duration of research."
Short description
This dataset was created as part of a research project investigating the efficiency and learning mechanisms of a Bayesian adaptive search algorithm supported by the Imprecision Entropy Indicator (IEI) as a novel method. It includes detailed statistical results, posterior probability values, and the weighted averages of IEI across multiple simulations aimed at target localization within a defined spatial environment. Control experiments, including random search, random walk, and genetic algorithm-based approaches, were also performed to benchmark the system's performance and validate its reliability.
The task involved locating a target area centered at (100; 100) within a radius of 10 units (Research_area.png), inside a circular search space with a radius of 100 units. The search process continued until 1,000 successful target hits were achieved.
To benchmark the algorithm's performance and validate its reliability, control experiments were conducted using alternative search strategies, including random search, random walk, and genetic algorithm-based approaches. These control datasets serve as baselines, enabling comprehensive comparisons of efficiency, randomness, and convergence behavior across search methods, thereby demonstrating the effectiveness of our novel approach.
Uploaded files
The first dataset contains the average IEI values, generated by randomly simulating 300 x 1 hits for 10 bins per quadrant (4 quadrants in total) using the Python programming language, and calculating the corresponding IEI values. This resulted in a total of 4 x 10 x 300 x 1 = 12,000 data points. The summary of the IEI values by quadrant and bin is provided in the file results_1_300.csv. The calculation of IEI values for averages is based on likelihood, using an absolute difference-based approach for the likelihood probability computation. IEI_Likelihood_Based_Data.zip
The weighted IEI average values for likelihood calculation (Bayes formula) are provided in the file Weighted_IEI_Average_08_01_2025.xlsx
This dataset contains the results of a simulated target search experiment using Bayesian posterior updates and Imprecision Entropy Indicators (IEI). Each row represents a hit during the search process, including metrics such as Shannon entropy (H), Gini index (G), average distance, angular deviation, and the calculated IEI values. The dataset also includes bin-specific posterior probability updates and likelihood calculations for each iteration. The simulation explores adaptive learning and posterior penalization strategies to optimize search efficiency. Our Bayesian adaptive searching system source code (search algorithm, 1,000 target searches) is IEI_Self_Learning_08_01_2025.py. This dataset contains the results of 1,000 iterations of a successful target search simulation; each iteration runs until the target is successfully located. The dataset includes three main outputs (a hedged sketch of a single posterior update step appears after this file list): a) Results files (results{iteration_number}.csv): details of each hit during the search process, including entropy measures, Gini index, average distance and angle, Imprecision Entropy Indicators (IEI), coordinates, and the bin number of the hit. b) Posterior updates (Pbin_all_steps_{iter_number}.csv): tracks the posterior probability updates for all bins during the search process across multiple steps. c) Likelihood analysis (likelihood_analysis_{iteration_number}.csv): contains the calculated likelihood values for each bin at every step, based on the difference between the measured IEI and the pre-defined IEI bin averages. IEI_Self_Learning_08_01_2025.py
Based on the mentioned Python source code (see point 3, Bayesian adaptive searching method with IEI values), we performed 1,000 successful target searches, and the outputs were saved in the Self_learning_model_test_output.zip file.
Bayesian Search (IEI) from different quadrant. This dataset contains the results of Bayesian adaptive target search simulations, including various outputs that represent the performance and analysis of the search algorithm. The dataset includes: a) Heatmaps (Heatmap_I_Quadrant, Heatmap_II_Quadrant, Heatmap_III_Quadrant, Heatmap_IV_Quadrant): These heatmaps represent the search results and the paths taken from each quadrant during the simulations. They indicate how frequently the system selected each bin during the search process. b) Posterior Distributions (All_posteriors, Probability_distribution_posteriors_values, CDF_posteriors_values): Generated based on posterior values, these files track the posterior probability updates, including cumulative distribution functions (CDF) and probability distributions. c) Macro Summary (summary_csv_macro): This file aggregates metrics and key statistics from the simulation. It summarizes the results from the individual results.csv files. d) Heatmap Searching Method Documentation (Bayesian_Heatmap_Searching_Method_05_12_2024): This document visualizes the search algorithm's path, showing how frequently each bin was selected during the 1,000 successful target searches. e) One-Way ANOVA Analysis (Anova_analyze_dataset, One_way_Anova_analysis_results): This includes the database and SPSS calculations used to examine whether the starting quadrant influences the number of search steps required. The analysis was conducted at a 5% significance level, followed by a Games-Howell post hoc test [43] to identify which target-surrounding quadrants differed significantly in terms of the number of search steps. Results were saved in the Self_learning_model_test_results.zip
This dataset contains randomly generated sequences of bin selections (1-40) from a control search algorithm (random search) used to benchmark the performance of Bayesian-based methods. The process iteratively generates random numbers until a stopping condition is met (reaching target bins 1, 11, 21, or 31). This dataset serves as a baseline for analyzing the efficiency, randomness, and convergence of non-adaptive search strategies. The dataset includes the following: a) The Python source code of the random search algorithm. b) A file (summary_random_search.csv) containing the results of 1000 successful target hits. c) A heatmap visualizing the frequency of search steps for each bin, providing insight into the distribution of steps across the bins. Random_search.zip
This dataset contains the results of a random walk search algorithm, designed as a control mechanism to benchmark adaptive search strategies (Bayesian-based methods). The random walk operates within a defined space of 40 bins, where each bin has a set of neighboring bins. The search begins from a randomly chosen starting bin and proceeds iteratively, moving to a randomly selected neighboring bin, until one of the stopping conditions is met (bins 1, 11, 21, or 31). The dataset provides detailed records of 1,000 random walk iterations, with the following key components: a) Individual Iteration Results: Each iteration's search path is saved in a separate CSV file (random_walk_results_.csv), listing the sequence of steps taken and the corresponding bin at each step. b) Summary File: A combined summary of all iterations is available in random_walk_results_summary.csv, which aggregates the step-by-step data for all 1,000 random walks. c) Heatmap Visualization: A heatmap file is included to illustrate the frequency distribution of steps across bins, highlighting the relative visit frequencies of each bin during the random walks. d) Python Source Code: The Python script used to generate the random walk dataset is provided, allowing reproducibility and customization for further experiments. Random_walk.zip
This dataset contains the results of a genetic search algorithm implemented as a control method to benchmark adaptive Bayesian-based search strategies. The algorithm operates in a 40-bin search space with predefined target bins (1, 11, 21, 31) and evolves solutions through random initialization, selection, crossover, and mutation over 1000 successful runs. Dataset Components: a) Run Results: Individual run data is stored in separate files (genetic_algorithm_run_.csv), detailing: Generation: The generation number. Fitness: The fitness score of the solution. Steps: The path length in bins. Solution: The sequence of bins visited. b) Summary File: summary.csv consolidates the best solutions from all runs, including their fitness scores, path lengths, and sequences. c) All Steps File: summary_all_steps.csv records all bins visited during the runs for distribution analysis. d) A heatmap was also generated for the genetic search algorithm, illustrating the frequency of bins chosen during the search process as a representation of the search pathways. Genetic_search_algorithm.zip
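As an illustration of the update step described in the file list above (the authors' implementation is the Python script IEI_Self_Learning_08_01_2025.py), one Bayesian posterior update over the 40 bins might be sketched in R as follows; the exact likelihood form is an assumption based on the stated absolute-difference approach:

n_bins <- 40
prior <- rep(1 / n_bins, n_bins)            # start from a uniform prior over the 40 bins
bin_avg_iei <- runif(n_bins, 0.2, 0.8)      # placeholder for the pre-defined IEI bin averages
measured_iei <- 0.55                        # placeholder for the IEI of the current hit
likelihood <- 1 / (1 + abs(measured_iei - bin_avg_iei))   # assumed difference-based likelihood
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)     # Bayes' rule with renormalisation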
Technical Information
The dataset files have been compressed into a standard ZIP archive using Total Commander (version 9.50). The ZIP format ensures compatibility across various operating systems and tools.
The XLSX files were created using Microsoft Excel Standard 2019 (Version 1808, Build 10416.20027)
The Python program was developed using Visual Studio Code (Version 1.96.2, user setup), with the following environment details: Commit fabd6a6b30b49f79a7aba0f2ad9df9b399473380f, built on 2024-12-19. The Electron version is 32.6, and the runtime environment includes Chromium 128.0.6263.186, Node.js 20.18.1, and V8 12.8.374.38-electron.0. The operating system is Windows NT x64 10.0.19045.
The statistical analysis included in this dataset was partially conducted using IBM SPSS Statistics, Version 29.0.1.0
The CSV files in this dataset were created following European standards, using a semicolon (;) as the delimiter instead of a comma, encoded in UTF-8 to ensure compatibility with a wide
License: Attribution 4.0 (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The following dataset, comprised of SEM analysis images in .tif format, illustrates the findings of two key experimental categories related to the publication. The first dataset focuses on determining the optimal temperature for removing existing sizing from carbon fibres; the investigation was carried out across temperatures ranging from 300 to 600 degrees Celsius, in 50-degree Celsius increments. The second dataset explores the morphology of the newly introduced sizing, incorporating FLG and CNT nanoparticles at various concentrations. These images, available in .tif format, offer a representative illustration of the structural and surface morphologies of the materials studied, correlating with the examined parameters. The next data are in .txt format and relate to the mechanical tests performed on the fibres. Tensile tests and three-point bending tests were performed on resin-impregnated and consolidated fibres, and the tensile strength and bend strength were calculated according to ASTM D4018 and ASTM D790, respectively. Further analysis of the data was performed in an .xls file to calculate the average and standard deviation of each category; the categories were the fibres sized with different nanomaterials (FLG and CNTs) at different ratios. Txt files were also used for the analysis of the nanoindentation and push-out tests; the same principle was applied here, with averages and standard deviations calculated in an .xls file. A CSV file was extracted from the TGA machine; this analysis was performed to estimate the amount of sizing on reference and desized fibres. The software can also export the results in .jpeg image format. Lastly, image files in .png format were extracted from the optical microscope to calculate the angles formed by resin droplets on treated fibres.