The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to delegate it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 TB and up to 100 TB, or over 100 TB, of total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
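The per person storage calculation described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (high end of the reported range divided by 1 for an individual response, or by G for a group response); the function name, parameter names, and example values are hypothetical and are not taken from the survey data.

```python
def per_person_storage_tb(range_high_tb: float, group_size: int = 1) -> float:
    """Return the per person storage need in TB.

    range_high_tb: high end of the storage range the respondent selected.
    group_size: 1 for an individual response, or G for a group response.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return range_high_tb / group_size


# Made-up example values, for illustration only.
individual = per_person_storage_tb(10.0)             # individual response
group = per_person_storage_tb(100.0, group_size=8)   # group response, G = 8
print(individual, group)
```

For Big Data users the same division would presumably be applied to the actual reported or estimated value rather than to the high end of a range.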
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About Dataset Safa S. Abdul-Jabbar, Alaa k. Farhan
Context
This is the first dataset for various ordinary patients in Iraq. The dataset provides the patients' Complete Blood Count (CBC) test information, which can be used to create a hematology diagnosis/prediction system. The data was collected in 2022 from Al-Zahraa Al-Ahly Hospital. It can be cleaned and analyzed using any programming language because it is provided in an Excel file that can be accessed and manipulated easily. The user just needs to understand how the rows and columns are arranged, because the data was collected as images (CBC images) from the laboratories and the extracted data was then stored in an Excel file.
Content
This dataset contains 500 rows. For each row (patient information), there are 21 columns containing CBC test features, described as follows:
ID: Patients Identifier
WBC: White Blood Cell, Normal Ranges: 4.0 to 10.0, Unit: 10^9/L.
LYMp: Lymphocytes percentage, which is a type of white blood cell, Normal Ranges: 20.0 to 40.0, Unit: %
MIDp: Indicates the percentage combined value of the other types of white blood cells not classified as lymphocytes or granulocytes, Normal Ranges: 1.0 to 15.0, Unit: %
NEUTp: Neutrophils percentage; neutrophils are a type of white blood cell (leukocyte), Normal Ranges: 50.0 to 70.0, Unit: %
LYMn: Lymphocytes number; lymphocytes are a type of white blood cell, Normal Ranges: 0.6 to 4.1, Unit: 10^9/L.
MIDn: Indicates the combined number of other white blood cells not classified as lymphocytes or granulocytes, Normal Ranges: 0.1 to 1.8, Unit: 10^9/L.
NEUTn: Neutrophils Number, Normal Ranges: 2.0 to 7.8, Unit: 10^9/L.
RBC: Red Blood Cell, Normal Ranges: 3.50 to 5.50, Unit: 10^12/L
HGB: Hemoglobin, Normal Ranges: 11.0 to 16.0, Unit: g/dL
HCT: Hematocrit is the proportion, by volume, of the Blood that consists of red blood cells, Normal Ranges: 36.0 to 48.0, Unit: %
MCV: Mean Corpuscular Volume, Normal Ranges: 80.0 to 99.0, Unit: fL
MCH: Mean Corpuscular Hemoglobin is the average amount of haemoglobin in the average red cell, Normal Ranges: 26.0 to 32.0, Unit: pg
MCHC: Mean Corpuscular Hemoglobin Concentration, Normal Ranges: 32.0 to 36.0, Unit: g/dL
RDWSD: Red Blood Cell Distribution Width (standard deviation), Normal Ranges: 37.0 to 54.0, Unit: fL
RDWCV: Red Blood Cell Distribution Width (coefficient of variation), Normal Ranges: 11.5 to 14.5, Unit: %
PLT: Platelet Count, Normal Ranges: 100 to 400, Unit: 10^9/L
MPV: Mean Platelet Volume, Normal Ranges: 7.4 to 10.4, Unit: fL
PDW: Platelet Distribution Width, Normal Ranges: 10.0 to 17.0, Unit: %
PCT: Plateletcrit, the proportion, by volume, of the blood that consists of platelets, Normal Ranges: 0.10 to 0.28, Unit: %
PLCR: Platelet Large Cell Ratio, Normal Ranges: 13.0 to 43.0, Unit: %
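The normal ranges listed above lend themselves to a simple out-of-range check. The sketch below is a minimal illustration in Python: it assumes the column names in the Excel file match the abbreviations in the list (an assumption, not confirmed by the dataset description), covers only three of the features, and uses a made-up example row rather than real patient data.

```python
# Normal ranges copied from the feature list above (subset only).
NORMAL_RANGES = {
    "WBC": (4.0, 10.0),     # 10^9/L
    "HGB": (11.0, 16.0),    # g/dL
    "PLT": (100.0, 400.0),  # 10^9/L
}


def flag_out_of_range(row, ranges=NORMAL_RANGES):
    """Return {feature: "low" | "high"} for values outside their normal range."""
    flags = {}
    for name, (low, high) in ranges.items():
        value = row.get(name)
        if value is None:
            continue  # feature missing from this row; nothing to check
        if value < low:
            flags[name] = "low"
        elif value > high:
            flags[name] = "high"
    return flags


# Hypothetical row with made-up values, not taken from the dataset.
example_row = {"ID": 1, "WBC": 12.3, "HGB": 13.5, "PLT": 90.0}
flags = flag_out_of_range(example_row)
print(flags)
```

Extending `NORMAL_RANGES` with the remaining features would follow the same pattern; values exactly on a boundary are treated as normal here, which is a design choice rather than something the dataset specifies.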
Acknowledgements
We thank the entire Al-Zahraa Al-Ahly Hospital team, especially the hospital manager, for cooperating with us in collecting this data while maintaining patients' confidentiality.
The Alaska Geochemical Database Version 2.0 (AGDB2) contains new geochemical data compilations in which each geologic material sample has one "best value" determination for each analyzed species, greatly improving speed and efficiency of use. Like the Alaska Geochemical Database (AGDB) before it, the AGDB2 was created and designed to compile and integrate geochemical data from Alaska in order to facilitate geologic mapping, petrologic studies, mineral resource assessments, definition of geochemical baseline values and statistics, environmental impact assessments, and studies in medical geology. This relational database, created from the Alaska Geochemical Database (AGDB) that was released in 2011, serves as a data archive in support of present and future Alaskan geologic and geochemical projects, and contains data tables in several different formats describing historical and new quantitative and qualitative geochemical analyses. The analytical results were determined by 85 laboratory and field analytical methods on 264,095 rock, sediment, soil, mineral and heavy-mineral concentrate samples. Most samples were collected by U.S. Geological Survey (USGS) personnel and analyzed in USGS laboratories or, under contracts, in commercial analytical laboratories. These data represent analyses of samples collected as part of various USGS programs and projects from 1962 through 2009. In addition, mineralogical data from 18,138 nonmagnetic heavy mineral concentrate samples are included in this database. The AGDB2 includes historical geochemical data originally archived in the USGS Rock Analysis Storage System (RASS) database, used from the mid-1960s through the late 1980s and the USGS PLUTO database used from the mid-1970s through the mid-1990s. All of these data are currently maintained in the National Geochemical Database (NGDB). Retrievals from the NGDB were used to generate most of the AGDB data set. 
These data were checked for accuracy regarding sample location, sample media type, and analytical methods used. This arduous process of reviewing, verifying and, where necessary, editing all USGS geochemical data resulted in a significantly improved Alaska geochemical dataset. USGS data that were not previously in the NGDB because the data predate the earliest USGS geochemical databases, or were once excluded for programmatic reasons, are included here in the AGDB2 and will be added to the NGDB. The AGDB2 data provided here are the most accurate and complete to date, and should be useful for a wide variety of geochemical studies. The AGDB2 data provided in the linked database may be updated or changed periodically.
This data set contains R code for a hypothetical exposure model described in the manuscript "A quantitative source-to-outcome case study to demonstrate the integration of human health and ecological endpoints using the Aggregate Exposure Pathway and Adverse Outcome Pathway frameworks". Additionally, this data set contains an Excel file that provides the range of parameters used in Monte Carlo simulations to generate iterations of the exposure network. This dataset is associated with the following publication: Hines, D., R. Conolly, and A. Jarabek. A quantitative source-to-outcome case study to demonstrate the integration of human health and ecological endpoints using the Aggregate Exposure Pathway and Adverse Outcome Pathway frameworks. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 53(8): 11002-11012, (2019).
Our goals with this dataset were to 1) isolate, culture, and identify two fungal life stages of Aspergillus flavus, 2) characterize the volatile emissions from grain inoculated by each fungal morphotype, and 3) understand how microbially-produced volatile organic compounds (MVOCs) from each fungal morphotype affect foraging, attraction, and preference by S. oryzae. This dataset includes data derived from headspace collection coupled with GC-MS, where we found that the sexual life stage of A. flavus had the most unique emissions of MVOCs compared to the other semiochemical treatments. This translated to higher arrestment with kernels of grain containing the A. flavus sexual life stage, as well as a higher cumulative time spent in those zones by S. oryzae in a video-tracking assay, in comparison to the asexual life stage. While fungal cues were important for foraging at close range, the release-recapture assay indicated that grain volatiles were more important for attraction at longer distances. There was no significant preference between grain and MVOCs in a four-way olfactometer, but methodological limitations in this assay prevent broad interpretation. Overall, this study enhances our understanding of how fungal cues affect the foraging ecology of a primary stored product insect. In the assays described herein, we analyzed the behavioral response of Sitophilus oryzae to five different blends of semiochemicals found and introduced in wheat (Table 1). Briefly, these included no stimuli (negative control), UV-sanitized grain, clean grain from storage (unmanipulated, positive control), as well as grain from storage inoculated with fungal morphotype 1 (M1, identified as the asexual life stage of Aspergillus flavus) and fungal morphotype 2 (M2, identified as the sexual life stage of A. flavus). Fresh samples of semiochemicals were used for each day of testing for each assay.
In order to prevent cross-contamination, 300 g of grain (tempered to 15% grain moisture) was initially sanitized using UV for 20 min. This procedure was done before inoculating grain with either morphotype 1 or 2. The 300 g of grain was kept in a sanitized mason jar (8.5 D × 17 cm H). To inoculate grain with the two different morphologies, we scraped an entire isolation from a petri dish into the 300 g of grain. Each isolation was ~1 week old and completely colonized by the given morphotype. After inoculation, each treatment was placed in an environmental chamber (136VL, Percival Instruments, Perry, IA, USA) set at constant conditions (30°C, 65% RH, and 14:10 L:D). This procedure was the same for both morphologies and was done every 2 weeks to ensure fresh treatments for each experimental assay. See file list for descriptions of each data file.

Resources in this dataset:

Resource Title: Ethovision Movement Assay.
File Name: ponce_lizarraga_ethovision_assay_microbial_volatiles_2020.csv
Resource Software Recommended: Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: Olfactometer Round 1 Assay - With Fused Air Permeable Glass.
File Name: ponce_lizarraga_first_round_olfactometer_fungal_study_2020.csv
Resource Software Recommended: Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: Olfactometer Round 2 Assay - With Fused Air Permeable Glass Containing Holes.
File Name: ponce_lizarraga_second_round_olfactometer_fungal_study_2021.csv
Resource Software Recommended: Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: Small Release-Recapture Assay.
File Name: ponce_lizarraga_small_release_recapture_assay.csv
Resource Software Recommended: Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: Large Release-Recapture Assay.
File Name: ponce_lizarraga_large_release_recapture_assay.csv
Resource Software Recommended: Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: Headspace Volatile Collection Assay.
File Name: sandra_headspace_volatiles_2020.csv
Resource Software Recommended: Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Resource Title: README file list.
File Name: file_list_stored_grain_Aspergillus_Sitophilus_oryzae.txt
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was generated as part of a study aimed at profiling global scientific academies, which play a significant role in promoting scholarly communication and scientific progress. Below is a detailed description of the dataset.

Data Generation Procedures and Tools: The dataset was compiled using a combination of web scraping, manual verification, and data integration from multiple sources, including Wikipedia categories, members of unions of scientific organizations, and web searches using specific query phrases (e.g., "country name + (academy OR society) AND site:.country code"). The records were enriched by cross-referencing data from the Wikidata API, the VIAF API, and the Research Organisation Registry (ROR). Additional manual curation ensured accuracy and consistency.

Temporal and Geographical Scopes: The dataset covers scientific academies from a wide temporal scope, ranging from the 15th century to the present. The geographical scope includes academies from all continents, with emphasis on both developed and post-developing countries. The dataset aims to capture the full spectrum of scientific academies across different periods of historical development.

Tabular Data Description: The dataset comprises a total of 301 academy records and 14,008 website navigation sections. Each row in the dataset represents a single scientific academy, while the columns describe attributes such as the academy’s name, founding date, location (city and country), website URL, email, and address.

Missing Data: Although the dataset offers comprehensive coverage, some entries may have missing or incomplete fields. For instance, website navigation sections were not available for all records.

Data Errors and Error Ranges: The data has been verified through manual curation, reducing the likelihood of errors. However, the use of crowd-sourced data from platforms like Wikipedia introduces potential risks of outdated or incomplete information. Any errors are likely minor and confined to fields such as navigation menu classifications, which may not fully reflect the breadth of an academy's activities.

Data Files, Formats, and Sizes: The dataset is provided in CSV format and JSON format, ensuring compatibility with a wide range of software applications, including Microsoft Excel, Google Sheets, and programming languages such as Python (via libraries like pandas).

This dataset provides a valuable resource for further research into the organizational behaviors, geographic distribution, and historical significance of scientific academies across the globe. It can be used for large-scale analyses, including comparative studies across different regions or time periods.

Any feedback on the data is welcome! Please contact the maintainer of the dataset.

If you use the data, please cite the following paper: Xiaoli Chen and Xuezhao Wang. 2024. Profiling Global Scientific Academies. In The 2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL ’24), December 16–20, 2024, Hong Kong, China. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3677389.3702582
Data on the distribution of invasive cane toads in New South Wales were collated from all available sources, to quantify rates of expansion and to identify correlates of the rate of spread. We also conducted pilot studies to compare alternative methods of detecting invasion-front populations of toads in the field.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
An Excel spreadsheet of the GPS coordinate locations of the shooting ranges in Nova Scotia.
Excel Age-Range creator for Office for National Statistics (ONS) Mid year population estimates (MYE) covering each year between 1999 and 2016
These files take into account the revised estimates for 2002-2010 released in April 2013 down to Local Authority level and the post 2011 estimates based on the Census results. Scotland and Northern Ireland data has not been revised, so Great Britain and United Kingdom totals comprise the original data for these plus revised England and Wales figures.
This Excel-based tool enables users to query the raw single year of age data so that the population of any age range can be calculated easily, without writing complex and time-consuming formulas that are also prone to human error. Simply select the lower and upper age range for both males and females and the spreadsheet will return the total population for the range. Please adhere to the terms and conditions of supply contained within the file.
Tip: You can copy and paste the rows you are interested in to another worksheet by using the filters at the top of the columns and then select all by pressing Ctrl+A. Then simply copy and paste the cells to a new location.
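For readers working outside Excel, the age-range calculation the tool performs amounts to summing single year of age counts between a lower and an upper bound. The Python sketch below illustrates that idea under stated assumptions: the function name and the dictionary layout are hypothetical, and the population counts are made-up numbers, not ONS figures.

```python
def population_in_age_range(counts_by_age, lower, upper):
    """Sum single year of age counts for ages lower..upper (inclusive)."""
    if lower > upper:
        raise ValueError("lower age must not exceed upper age")
    return sum(
        count for age, count in counts_by_age.items() if lower <= age <= upper
    )


# Made-up single year of age counts for one area (age -> population).
males = {0: 100, 1: 110, 2: 105, 3: 98}
females = {0: 95, 1: 102, 2: 101, 3: 97}

# Separate age ranges can be chosen for males and females, as in the tool.
total = population_in_age_range(males, 1, 2) + population_in_age_range(females, 1, 3)
print(total)
```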
ONS Mid year population estimates
Open Excel tool (London Boroughs, Regions and National, 1999-2016)
Also available is a custom-age tool for all geographies in the UK. Open the tool for all UK geographies (local authority and above) for: 2010, 2011, 2012, 2013, 2014 and 2015.
The full MYE dataset by single year of age (SYA) and gender is available as a Datastore package here.
Ward Level Population estimates
Single year of age population tool for 2002 to 2015 for all wards in London.
New 2014 Ward boundary estimates
Ward boundary changes in May 2014 only affected three London boroughs - Hackney, Kensington and Chelsea, and Tower Hamlets. The estimates for 2001-2013 have been calculated by the GLA by taking the proportion of the old ward that falls within the new ward, based on the proportion of population living in each area at the 2011 Census. Therefore, these estimates are purely indicative; they are not official statistics and are not endorsed by ONS. From 2014 onwards, ONS began publishing official estimates for the new ward boundaries. Download here.
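The apportionment described above can be read as a population-weighted reallocation: each old ward contributes its estimate multiplied by the share of its 2011 Census population living inside the new ward. The Python sketch below shows that reading only; it is not the GLA's actual implementation, and the ward names, shares, and populations are invented for illustration.

```python
def estimate_new_ward_population(old_ward_populations, census_share_in_new_ward):
    """Estimate a new ward's population from old-ward estimates.

    old_ward_populations: old ward name -> population estimate for a given year.
    census_share_in_new_ward: old ward name -> share (0 to 1) of that ward's
        2011 Census population living inside the new ward boundary.
    """
    return sum(
        old_ward_populations[ward] * share
        for ward, share in census_share_in_new_ward.items()
    )


# Hypothetical wards and numbers, for illustration only.
old_estimates = {"Old Ward A": 12000, "Old Ward B": 8000}
shares = {"Old Ward A": 0.25, "Old Ward B": 1.0}
estimate = estimate_new_ward_population(old_estimates, shares)
print(estimate)
```

Because the shares are fixed at their 2011 Census values, the result for other years is indicative only, which matches the caveat in the text.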
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data belongs to a manuscript submitted to Data in Brief, in which the content and layout of the data are described in detail. Data overviews (including figures and tables for age and gender groups) can be found at OSF | Normative 3D gait data of healthy subjects walking at three different speeds on an instrumented treadmill in virtual reality.

A normative gait dataset of 246 healthy adults (122 men / 124 women, age range 18-91 years, body weight 46.80-116.10 kg, height 1.53-1.97 m and BMI 18.25-35.63 kg/m2) is presented and publicly shared for three walking speed conditions (comfortable, slow and fast speed).

Three-dimensional gait analysis was performed at the Computer Assisted Rehabilitation Environment (CAREN) at the Maastricht University Medical Centre (MUMC+). Subjects walked on the instrumented treadmill surrounded by twelve 3D cameras, three 2D cameras and a virtual environment projected on a 180° screen, using the Human Body Lower Limb Model with trunk markers (HBM-II) as the biomechanical model.

Subjects walked at comfortable walking speed, 30% slower and 30% faster. These walking speed conditions were applied in a random sequence. Comfortable walking speed was determined using a RAMP protocol: subjects started to walk at 0.5 m/s and every second the speed was increased by 0.01 m/s until comfortable speed was reached. The average of three repetitions was considered the comfortable speed. For each walking speed condition, 250 steps were recorded.

The 3D gait data was collected using the D-flow CAREN software. Raw data were processed in Matlab (Mathworks 2016), including quality check, step determination and the export of data to xls. Processed data includes spatiotemporal parameters, medio-lateral (ML) and back-forward (BF) margins of stability (MoS), 3D joint angles, anterior-posterior (AP) and vertical GRFs, 3D joint moments and 3D joint power of both legs.

The attached files include the processed data for each adult for walking at slow (comfortable -30%) speed, containing spatiotemporal parameters, MoS, joint angles, GRF, joint moments and joint power, including every valid step of both legs.

The title of this file (27_individual Excel files) corresponds to the associated manuscript (submitted to Data in Brief).
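The speed protocol described above reduces to simple arithmetic, sketched here in Python. The constants come from the description (start at 0.5 m/s, increase by 0.01 m/s every second, average of three repetitions, slow and fast at 30% below and above comfortable speed); the function names and the three example speeds are hypothetical and are not values from the dataset.

```python
RAMP_START_SPEED = 0.5   # m/s, starting treadmill speed
RAMP_INCREMENT = 0.01    # m/s added every second


def ramp_speed(seconds_elapsed):
    """Treadmill speed (m/s) after a number of whole seconds of the RAMP protocol."""
    return RAMP_START_SPEED + RAMP_INCREMENT * seconds_elapsed


def walking_speed_conditions(ramp_results):
    """Derive the three speed conditions from repeated RAMP results (m/s)."""
    comfortable = sum(ramp_results) / len(ramp_results)
    return {
        "slow": 0.7 * comfortable,         # 30% slower than comfortable
        "comfortable": comfortable,
        "fast": 1.3 * comfortable,         # 30% faster than comfortable
    }


# Made-up speeds at which a subject stopped the ramp in three repetitions.
conditions = walking_speed_conditions([1.20, 1.30, 1.25])
print(conditions)
```

Here "30% slower" is read as 0.7 times the comfortable speed, consistent with the "comfortable -30%" wording used for the attached files.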
Identification of study species
We identified 40 marine species with documented shifts in range limits along the coastline (<15 km from shore) of North America, including plants, invertebrates, fish, a protist, and a bird. Of these, 26 species were compiled by Sorte et al. (2010), and we added 14 species from an updated literature review. We searched Google Scholar (on 08/20/2019) using this search string: marine "range expansion" species "range shift". We reviewed titles and, when appropriate, abstracts and text of the first 600 results, identifying 12 additional species from eight papers. We added two species (Brachidontes adamsianus and Mexacanthina lugubris) from our literature files and personal observations. We excluded migratory or pelagic species with large biogeographic ranges, for which it was difficult to confirm historical native ranges.
Review of published impacts
Evidence of species’ impacts was compiled from online database searches an...
This dataset includes an Excel file containing all the data needed to carry out the analyses detailed in the manuscript and shown in full in Appendix S1 of the manuscript. The data are the raw input to those analyses. The first sheet in the Excel file is a key; the other sheets contain data and are labeled according to the tables they relate to in Appendix S1 of the manuscript.
Fish community composition data from survey gill netting at lakes in England, Scotland and Wales. Exists as a number of discrete Excel files from a range of earlier projects. More information on this dataset can be found in the Freshwater Metadatabase - BF_W_14-L-NCA (http://www.freshwatermetadata.eu/metadb/bf_mdb_view.php?entryID=BF_W_14-L-NCA).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A range of quarterly Excel spreadsheets and SuperTABLE datacubes. The spreadsheets contain broad level data covering all the major items of the Labour Force Survey in time series format, including seasonally adjusted and trend estimates. The datacubes contain more detailed and cross classified original data than the spreadsheets.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
These are results of a series of laboratory experiments to determine if topical application of methoprene and 20-ecdysone can terminate reproductive diapause of the weevil, Ceratapion basicorne, which is a recently permitted biological control agent of yellow starthistle (Centaurea solstitialis). Adult weevils feed on leaves, creating pin holes, and lay eggs inside leaves. Diapausing weevils were treated with various doses of methoprene (0, 0.01, 0.1, 1.0 micrograms) dissolved in acetone in experiments 1 and 2. They were treated sequentially first with acetone or 20-ecdysone (1.0 microgram) and then with methoprene (1.0 microgram) in experiment 3 and were treated with 20-ecdysone followed by methoprene in experiment 4.

Resources in this dataset:

Resource Title: data dictionary.
File Name: JH Data Dictionary.csv
Resource Description: description of data fields
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/microsoft-365/excel

Resource Title: experiment 1.
File Name: JH expt1 data.csv
Resource Description: Methoprene dissolved in acetone was applied topically at doses of 0.0, 0.01, 0.1 and 1.0 μg per female weevil, and the number of feeding holes and eggs were recorded daily on cut leaves of yellow starthistle at room temperature (12 h photoperiod, temperature range 17 to 21°C).
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/microsoft-365/excel

Resource Title: experiment 2.
File Name: JH expt2 data.csv
Resource Description: Methoprene dissolved in acetone was applied topically at doses of 0.0 and 1.0 μg to female weevils that did not produce eggs in experiment 1. The number of feeding holes and eggs were recorded daily on cut leaves of yellow starthistle at room temperature (12 h photoperiod, temperature range 17 to 21°C).
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/microsoft-365/excel

Resource Title: experiment 3.
File Name: JH expt3 data.csv
Resource Description: Three types of treatments were applied with sequential applications 2 days apart: 1) acetone + acetone [AA: control], 2) acetone + methoprene [AM], and 3) 20-ecdysone + methoprene [2M]. All doses were 1.0 μg. The number of feeding holes and eggs were recorded every 2 days on cut leaves of yellow starthistle at room temperature (12 h photoperiod, temperature range 17 to 21°C).
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/microsoft-365/excel

Resource Title: experiment 4.
File Name: JH expt4 data.csv
Resource Description: Females from experiment 3 that did not oviposit consistently were treated with 1.0 μg of 20-ecdysone followed 2 days later by 1.0 μg of methoprene. The treatments AA, AM, 2M refer to experiment 3. The number of feeding holes and eggs were recorded every 2 days on cut leaves of yellow starthistle at room temperature (12 h photoperiod, temperature range 17 to 21°C).
Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/microsoft-365/excel
The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.
This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.
This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.
Cover Photo by: Freepik
Thumbnail by: Clothing icons created by Flat Icons - Flaticon
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is derived from the BIM model of the Southern Queensland Correctional Precinct, a large-scale construction project in Southern Queensland. The dataset, provided as an Excel spreadsheet, covers a wide range of attributes related to the distribution boards (DBs) used in the construction. Key attributes include physical dimensions, physical location, electrical parameters, and notably, OmniClass 21 and 23 classifications. These classifications are critical as they provide a structured means for classifying and categorizing the components of the built environment and are particularly useful in analyzing and managing data. The dataset comprises numerous entries detailing each distribution board used in the precinct's construction. The OmniClass classification scheme is of particular significance in this dataset because it is used to investigate the barriers between Building Information Modeling (BIM) and the Open Industry Interoperability Ecosystem (OIIE) in a research project context. The ultimate purpose of this dataset is to provide a detailed and organized source of data for research into the integration of BIM and OIIE, with a focus on identifying and overcoming the barriers that exist between these two systems in the current construction and building management environment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset of crystallographic texture results for both α (hexagonal close packed, hcp) and β (body-centred cubic, bcc) phases, measured from 31 different hot-rolled Ti-6Al-4V (Ti-64) materials and 3 differently orientated samples using synchrotron X-ray diffraction (SXRD). The aim of the work was to accurately quantify bulk macro-texture for both the α and β phases across a range of different processing conditions, and to compare results with electron backscatter diffraction (EBSD) measurements. The synchrotron intensities were extracted using a new Fourier-based peak fitting method from the Continuous-Peak-Fit Python package, and then used directly in MTEX to calculate the pole figures, orientation distribution functions (ODFs) and numerical values for the texture indices.
Material
The Ti-64 materials had been hot-rolled at a range of different temperatures, and to different reductions, followed by air-cooling. Three samples of different orientation were cut from the centre of these rolled blocks, and from the starting material. The material and hot-rolling conditions are recorded in this analysis dataset as an Excel spreadsheet and summarised in the table below.
A table recording the sample number and associated hot-rolling condition.
Sample Number | Rolling Condition
1 | 825°C, 87.5% Reduction
2 | 865°C, 87.5% Reduction
3 | 895°C, 87.5% Reduction
4 | 915°C, 87.5% Reduction
5 | 935°C, 87.5% Reduction
6 | 950°C, 87.5% Reduction
7 | 960°C, 87.5% Reduction
8 | 975°C, 87.5% Reduction
9 | 1020°C, 87.5% Reduction
10 | β-annealed, 825°C, 87.5% Reduction
11 | β-annealed, 915°C, 87.5% Reduction
12 | β-annealed, 975°C, 87.5% Reduction
13 | Reduced heating from 915°C, 87.5% Reduction
14 | Reduced heating from 975°C, 87.5% Reduction
15 | 825°C, 75% Reduction
16 | 865°C, 75% Reduction
17 | 895°C, 75% Reduction
18 | 915°C, 75% Reduction
19 | 935°C, 75% Reduction
20 | 950°C, 75% Reduction
21 | 960°C, 75% Reduction
22 | 975°C, 75% Reduction
23 | 1020°C, 75% Reduction
24 | β-annealed, 825°C, 75% Reduction
25 | β-annealed, 915°C, 75% Reduction
26 | β-annealed, 975°C, 75% Reduction
27 | Reduced heating from 915°C, 75% Reduction
28 | Reduced heating from 975°C, 75% Reduction
29 | As-received
30 | As-received, β-annealed
31 | 975°C, 50% Reduction
MTEX Data Analysis
The lattice plane intensities for 22 α and 4 β phase peaks were extracted from the Continuous-Peak-Fit analysis, also included in this analysis dataset, and saved as text files in the form of pole figures. The lattice intensity text files were analysed in MTEX using scripts from the continuous-peak-fit-analysis package, to plot pole figures and ODF slices, and to calculate pole figure maxima, ODF maxima, texture indices and texture component phase fractions. A kernel half-width of 10° was found to produce optimal data fitting, for highly accurate texture strength intensity values.
Metadata
An accompanying YAML text file contains associated processing metadata for the SXRD analysis, recording information about the packages used to process the data, along with details about the different files contained within this results dataset.
Identifiers of many kinds are the key to creating unambiguous and persistent connections between research objects and other items in the global research infrastructure (GRI). Many repositories are implementing mechanisms to collect and integrate these identifiers into their submission and record curation processes. This bodes well for a well-connected future, but many existing resources submitted in the past are missing these identifiers, and thus the connections required for inclusion in the connected infrastructure. Re-curation of these metadata is required to make these connections. The Dryad Data Repository has existed since 2008 and has successfully re-curated the repository metadata several times, adding identifiers for research organizations, funders, and researchers. Understanding and quantifying these successes depends on measuring repository and identifier connectivity. Metrics are described and applied to the entire repository here. Identifiers for papers (DOIs) connected...
These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:
1. DryadJournalDataset was retrieved from Dryad using the ISSNs in the file DryadJournalDataset_ISSNs.txt, although some had no data.
2. DryadOrganizationDataset was retrieved from Dryad using the RORs in the file DryadOrganizationDataset_RORs.txt, although some had no data.
Each dataset includes four types of metadata: identifiers, funders, keywords, and related works, each in separate comma-delimited (.csv) or tab-delimited (.tsv) files. There are also Microsoft Excel files (.xlsx) for the identifier metadata and connectivity summaries for each dataset (*.html). The connectivity summaries include summaries of each parameter in all four data files with definitions, counts, unique counts, most frequent values, and completeness.
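As a rough illustration of the per-parameter summaries mentioned above (counts, unique counts, most frequent values, and completeness), the following minimal Python sketch computes them for one column of a delimited file. The column names (`doi`, `ror`), the sample values, and the definition of completeness as the fraction of rows with a non-empty value are assumptions made for illustration; they are not taken from the Dryad files or from the authors' own code.

```python
import csv
import io
from collections import Counter


def summarize_column(rows, column):
    """Summarize one metadata column: count, unique count, most frequent value, completeness.

    Completeness is assumed here to mean the fraction of rows with a non-empty value.
    """
    values = [(row.get(column) or "").strip() for row in rows]
    present = [v for v in values if v]
    counts = Counter(present)
    return {
        "count": len(present),
        "unique": len(counts),
        "most_frequent": counts.most_common(1)[0][0] if counts else None,
        "completeness": len(present) / len(values) if values else 0.0,
    }


# Hypothetical sample data standing in for one of the .csv files.
sample = io.StringIO(
    "doi,ror\n"
    "doi-1,ror-a\n"
    "doi-2,\n"
    "doi-3,ror-a\n"
    "doi-4,ror-b\n"
)
rows = list(csv.DictReader(sample))
ror_summary = summarize_column(rows, "ror")
doi_summary = summarize_column(rows, "doi")
print(ror_summary)
print(doi_summary)
```

For the tab-delimited (.tsv) files, the same function should work if the reader is created with `csv.DictReader(f, delimiter="\t")`.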
These data formed the basis for an analysis of the connectivity of the Dryad repository for organizations, funders, and people.
# Data For: Sustainable Connectivity in a Community Repository
This readme.txt file was generated on 20231110 by Ted Habermann
Data For: Sustainable Connectivity in a Community Repository
Principal Investigator Contact Information
Name: Ted Habermann (0000-0003-3585-6733)
Institution: Metadata Game Changers
Email:
ORCID: 0000-0003-3585-6733
November 10, 2023
May and June 2023
National Science Foundation (Crossref Funder ID: 100000001) Award 2134956.
These data are Dryad metadata retrieved from https://datadryad.org and translated into csv files. There are two datasets:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel file containing the 2016 Varroa validation data, full-season temperature data, along with colony covariate information (Varroa validation dataset).
The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey, which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016, we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover the data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate the response to a data management expert in their unit, to delegate it to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
Larger storage ranges cover vastly different amounts of data, so the implications could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per-person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.

Resources in this dataset:

Resource Title: Appendix A: ARS data storage survey questions.
File Name: Appendix A.pdf
Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF, but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here.
Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/

Resource Title: CSV of Responses from ARS Researcher Data Storage Survey.
File Name: Machine-readable survey response data.csv
Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is the same data as in the Excel spreadsheet (also provided).

Resource Title: Responses from ARS Researcher Data Storage Survey.
File Name: Data Storage Survey Data for public release.xlsx
Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.
Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel
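
The per-person storage calculation described in the survey methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example numbers are hypothetical, and dividing a Big Data user's follow-up value by the group size G is an assumption, since the description does not say how follow-up values were apportioned within groups.

```python
from typing import Optional


def total_storage_tb(range_high_tb: float, follow_up_tb: Optional[float] = None) -> float:
    """Pick the total used for a response.

    Uses the follow-up value (actual or estimated) when one exists, as for
    Big Data users; otherwise falls back to the high end of the reported range.
    """
    return follow_up_tb if follow_up_tb is not None else range_high_tb


def per_person_storage_tb(total_tb: float, group_size: int = 1) -> float:
    """Divide a response's total by G, the number of individuals it covers.

    group_size is 1 for an individual response. Applying the same divisor to
    Big Data follow-up values is an assumption of this sketch.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return total_tb / group_size


# Hypothetical examples (values are illustrative, not taken from the survey data).
individual = per_person_storage_tb(total_storage_tb(1.0))              # individual response
group = per_person_storage_tb(total_storage_tb(10.0), group_size=4)    # group response, G = 4
big_data = per_person_storage_tb(total_storage_tb(100.0, follow_up_tb=40.0), group_size=2)
print(individual, group, big_data)
```

Using the high end of each range gives an upper-bound estimate per person, which is why the follow-up values matter most for the widest ranges.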