This dataset was created by Hussein Al Chami
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The storyline introduces a recent graduate hired at a biotech firm and tasked with the analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially, this involves a step-by-step analysis of the small iris dataset built into R, which includes sepal and petal measurements for three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolution experiment, developing their skills in manipulating and interpreting large scientific datasets through visualization and statistical analysis. Prerequisite knowledge includes basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data, or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R rather than the creation of original code. Pilot testing showed the case study was well received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
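A minimal sketch of the kind of Iris-module exercise described above (the iris dataset and its column names are built into R; the specific plots are illustrative):

    # The iris dataset ships with base R: 150 observations of four measurements
    # (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) across 3 species.
    summary(iris)                                  # per-column summary statistics

    # Correlation between petal length and width across all species
    cor(iris$Petal.Length, iris$Petal.Width)

    # Histogram of sepal length
    hist(iris$Sepal.Length,
         main = "Distribution of sepal length",
         xlab = "Sepal length (cm)")

    # Scatter plot of petal length vs. width, coloured by species
    plot(iris$Petal.Length, iris$Petal.Width,
         col = as.integer(iris$Species),
         xlab = "Petal length (cm)", ylab = "Petal width (cm)")
    legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 1)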
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our aim is to make Yarra a clean and pleasant place for our residents to live. This data asset has information about sweeping and loose litter removal across residential roads, kerbs and public open spaces within the Yarra municipality. The street cleansing details include cleaning date and time, suburb where the cleaning was done, category of cleaning, volume of litter removed and cleaning duration.

While all due care has been taken to ensure the data asset is accurate and current, Yarra City Council does not warrant that this data is definitive nor free of error and does not accept responsibility for any loss, damage, claim, expense, cost or liability whatsoever arising from reliance upon information provided herein.

Feedback on the data asset - including compliments, complaints and requests for more detail - is welcome.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Four species of planktic foraminifera from core-tops spanning a depth transect on the Ontong Java Plateau were prepared for Mg/Ca analysis both with (Cd-cleaning) and without (Mg-cleaning) a reductive cleaning step. Reductive cleaning caused etching of foraminiferal calcite, focused on Mg-rich inner calcite, even on tests which had already been partially dissolved at the seafloor. Despite corrosion, there was no difference in Mg/Ca of Pulleniatina obliquiloculata between cleaning methods. Reductive cleaning decreased Mg/Ca by an average (all depths) of ~ 4% for Globigerinoides ruber white and ~ 10% for Neogloboquadrina dutertrei. Mg/Ca of Globigerinoides sacculifer (above the calcite saturation horizon only) was 5% lower after reductive cleaning. The decrease in Mg/Ca due to reductive cleaning appeared insensitive to preservation state for G. ruber, N. dutertrei and P. obliquiloculata. Mg/Ca of Cd-cleaned G. sacculifer appeared less sensitive to dissolution than that of Mg-cleaned tests. Mg-cleaning is adequate, but SEM imaging and contaminant ratios (Al/Ca, Fe/Ca and Mn/Ca) show that Cd-cleaning is more effective for porous species. A second aspect of the study addressed sample loss during cleaning. Lower yield after Cd-cleaning for G. ruber, G. sacculifer and N. dutertrei confirmed this to be the more aggressive method. The strongest correlations between yield and Δ[CO3^2-] in core-top samples were for Cd-cleaned G. ruber (r = 0.88, p = 0.020) and Cd-cleaned P. obliquiloculata (r = 0.68, p = 0.030). In a down-core record (WIND28K), the correlation, r, between yield values > 30% and the dissolution index, XDX, was -0.61 (p = 0.002). Where cleaning yield was < 30%, most Mg-cleaned Mg/Ca values were biased by dissolution.
https://www.valuemarketresearch.com/privacy-policy
Global demand in the ultrasonic cleaning equipment market is projected to reach nearly USD 970.83 million by 2032, up from USD 680.91 million in 2023, a CAGR of 4.02% over the study period 2024-2032.
Ultrasonic cleaning is the process of removing contamination from surfaces, nooks and crannies by passing sound waves through water to create microscopic implosions.
https://www.datainsightsmarket.com/privacy-policy
The global stainless steel tank foamers market is experiencing robust growth, driven by increasing demand across diverse sectors. The market's expansion is fueled by several key factors. Firstly, the rising adoption of foam cleaning technologies in industrial settings, particularly factory and kitchen cleaning, offers significant opportunities. Stainless steel's inherent properties - durability, corrosion resistance, and ease of sanitation - make it the preferred material for these applications. The market is segmented by capacity (24L, 50L, 100L, and others), reflecting the varied needs of different users. Furthermore, growing environmental concerns are prompting businesses to adopt eco-friendly cleaning solutions compatible with stainless steel tank foamers, contributing to market expansion. Finally, technological advancements leading to more efficient and user-friendly foamer designs are also driving growth. We estimate the market size to be approximately $500 million in 2025, based on industry analyses of related equipment markets and considering the factors mentioned above. A conservative CAGR of 7% is projected for the forecast period (2025-2033), resulting in significant market expansion by 2033. However, factors such as high initial investment costs for the equipment and potential competition from alternative cleaning technologies could pose challenges to continued growth.

Geographic distribution showcases a diverse landscape. North America and Europe currently dominate the market, driven by strong industrial sectors and early adoption of advanced cleaning technologies. However, the Asia-Pacific region is anticipated to experience rapid growth in the coming years due to increasing industrialization and rising disposable incomes. This region will likely witness significant market penetration due to the rising focus on hygiene and sanitation in various sectors. The competitive landscape includes both established manufacturers and new entrants, indicating a dynamic market with ongoing innovation and competition. Key players are focusing on product diversification, technological advancements, and strategic partnerships to maintain market share and capitalize on growth opportunities. This strategic approach is expected to solidify the position of these players in the coming years.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets presented here were partially used in "Formulation and MIP-heuristics for the lot sizing and scheduling problem with temporal cleanings" (Toscano, A., Ferreira, D., Morabito, R., Computers & Chemical Engineering) [1], in "A decomposition heuristic to solve the two-stage lot sizing and scheduling problem with temporal cleaning" (Toscano, A., Ferreira, D., Morabito, R., Flexible Services and Manufacturing Journal) [2], and in "A heuristic approach to optimize the production scheduling of fruit-based beverages" (Toscano et al., Gestão & Produção, 2020) [3]. Fruit-based production processes have two production stages: preparation tanks and production lines. This production process has some process-specific characteristics, such as temporal cleanings and synchrony between the two production stages, which make optimized production planning and scheduling even more difficult. In this sense, some papers in the literature have proposed different methods to solve this problem. To the best of our knowledge, there are no standard datasets in the literature for verifying the accuracy and performance of proposed methods, or for serving as a benchmark for other researchers considering this problem. Authors have been using small datasets that do not satisfactorily represent different production scenarios. Since demand in the beverage sector is seasonal, a wide range of scenarios enables us to evaluate the effectiveness of the methods proposed in the scientific literature in solving real instances of the problem. The datasets presented here are based on real data collected from five beverage companies. We present four datasets specifically constructed assuming a scenario of restricted capacity and balanced costs. These datasets are supplementary data for the paper submitted to Data in Brief [4].
[1] Toscano, A., Ferreira, D., Morabito, R., Formulation and MIP-heuristics for the lot sizing and scheduling problem with temporal cleanings, Computers & Chemical Engineering. 142 (2020) 107038. doi: 10.1016/j.compchemeng.2020.107038.
[2] Toscano, A., Ferreira, D., Morabito, R., A decomposition heuristic to solve the two-stage lot sizing and scheduling problem with temporal cleaning, Flexible Services and Manufacturing Journal. 31 (2019) 142-173. doi: 10.1007/s10696-017-9303-9.
[3] Toscano, A., Ferreira, D., Morabito, R., Trassi, M. V. C., A heuristic approach to optimize the production scheduling of fruit-based beverages. Gestão & Produção, 27(4), e4869, 2020. https://doi.org/10.1590/0104-530X4869-20.
[4] Piñeros, J., Toscano, A., Ferreira, D., Morabito, R., Datasets for lot sizing and scheduling problems in the fruit-based beverage production process. Data in Brief (2021).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We hereby publish the dataset (with metadata) and the R script (R Core Team, 2018) used for implementing the analysis presented in the paper "Food waste between environmental education, peers, and family influence. Insights from primary school students in Northern Italy", Journal of Cleaner Production (Piras et al., 2023). The dataset is provided in csv format with semicolons as separators and "NA" for missing data. The dataset includes all the variables used in at least one of the models presented in the paper, either in the main text or in the Supplementary Material. Other variables gathered by means of the questionnaires included as Supplementary Material of the paper have been removed. The dataset includes imputed values for missing data on independent variables. These were imputed using two approaches: last observation carried forward (LOCF) - preferred when possible - and last observation carried backward (LOCB). The metadata are presented as a PDF file.
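A minimal sketch of reading such a file in R ("data.csv" is a placeholder file name; the semicolon separator and "NA" missing-value marker are as described above):

    # Placeholder file name; the separator and NA marker follow the description.
    dat <- read.csv("data.csv", sep = ";", na.strings = "NA")
    str(dat)  # inspect the variables, including the LOCF/LOCB-imputed ones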
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A contract was notified by the city on 9 November 2021 under an open call for tenders for the cleaning of the premises and windows of schools and municipal buildings, in two lots over four years. The House of Public Services (M.S.P.) was not one of the buildings included in that contract because its cleaning was carried out by municipal staff. The present contract entrusts the cleaning of the M.S.P. to a private partner. This is a single framework agreement awarded in the form of a public contract for the provision of services with purchase orders, with no minimum amount and a maximum annual amount of expenditure, without reopening competition when awarding purchase orders, and awarded under the provisions of Articles R.2162-1 to R.2162-6 of the Public Procurement Code. The contract starts on the date of its notification and is concluded for an initial period of 12 months. It may then be expressly renewed until 9 November 2025. Maximum expenditure over one year: €50,000 excl. tax. Maximum expenditure over the total duration of the contract: €100,000 excl. tax.
Use the project file first, then open the cleaning R file to clean the raw data. Then use the R file called OLS analysis to analyze the cleaned data, which is saved as a .rds file.
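A minimal sketch of that workflow, with placeholder script, file, and variable names (the actual names ship with the project):

    # Step 1: run the cleaning script, which writes the cleaned data to disk.
    source("cleaning.R")                     # placeholder for the cleaning R file

    # Step 2: the OLS analysis script reads the cleaned .rds file back in.
    cleaned <- readRDS("cleaned_data.rds")   # placeholder file name
    model   <- lm(outcome ~ predictor, data = cleaned)  # illustrative OLS formula
    summary(model)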
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The layer refers to a symbol indicating the location of Cleaning Sinks (for cleaning fish) on public land in the Gold Coast area.

Please note that this data will have a different structure and labelling to the 2013 version by virtue of changes required through the City's introduction of a new asset management system.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan Outbound Tourism Consumption: C&R: Cleaning data was reported at 0.000 JPY bn in 2016. This stayed constant from the previous number of 0.000 JPY bn for 2015. Japan Outbound Tourism Consumption: C&R: Cleaning data is updated yearly, averaging 0.000 JPY bn from Dec 2005 (Median) to 2016, with 12 observations. Japan Outbound Tourism Consumption: C&R: Cleaning data remains active status in CEIC and is reported by Ministry of Land, Infrastructure, Transport and Tourism. The data is categorized under Global Database’s Japan – Table JP.Q014: Outbound Tourism Consumption.
https://www.cognitivemarketresearch.com/privacy-policy
Get the sample copy of the Commercial Ultrasonic Cleaning Market Report 2025 (Global Edition), which includes data such as market size, share, growth, CAGR, forecast, and revenue, a list of commercial ultrasonic cleaning companies (Branson Ultrasonics Corporation, Blue Wave Ultrasonics, Caresonic, Cleaning Technologies Group, L&R Manufacturing, SharperTek, Kitamoto, Crest Ultrasonics, Morantz Ultrasonics), and the market segmented by type (General, Professional) and by application (Metal, Chemical, Consumer Goods, Others).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LSC (Leicester Scientific Corpus)
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

The data are extracted from the Web of Science [1]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

[Version 2] A further cleaning is applied in Data Processing for the LSC Abstracts in Version 1*. Details of the cleaning procedure are explained in Step 6.

* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v1

Getting Started

This text provides information on the LSC (Leicester Scientific Corpus) and the pre-processing steps applied to abstracts, and describes the structure of the files that organise the corpus. The corpus was created for future work on the quantification of the meaning of research texts, and to be made available for use in Natural Language Processing projects.

LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [1]. The corpus contains only documents in English. Each document in the corpus contains the following parts:

1. Authors: the list of authors of the paper
2. Title: the title of the paper
3. Abstract: the abstract of the paper
4. Categories: one or more categories from the list of categories [2]. The full list of categories is presented in the file 'List_of_Categories.txt'.
5. Research Areas: one or more research areas from the list of research areas [3]. The full list of research areas is presented in the file 'List_of_Research_Areas.txt'.
6. Total Times Cited: the number of times the paper was cited by other items from all databases within the Web of Science platform [4]
7. Times Cited in Core Collection: the total number of times the paper was cited by other papers within the WoS Core Collection [4]

The corpus was collected online in July 2018 and contains the number of citations from publication date to July 2018. We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,350.

Data Processing

Step 1: Downloading the Data Online

The dataset was collected manually by exporting documents online as tab-delimited files. All documents are available online.

Step 2: Importing the Dataset to R

The LSC was collected as TXT files. All documents were imported into R.

Step 3: Cleaning the Data of Documents with an Empty Abstract or without a Category

As our research is based on the analysis of abstracts and categories, all documents with empty abstracts and all documents without categories were removed.

Step 4: Identification and Correction of Concatenated Words in Abstracts

Medicine-related publications in particular use 'structured abstracts', which are divided into sections with distinct headings such as introduction, aim, objective, method, result and conclusion. The tool used for extracting abstracts concatenates these section headings with the first word of the section, producing words such as 'ConclusionHigher' and 'ConclusionsRT'. Such words were detected by sampling medicine-related publications with human intervention, and each detected concatenated word was split into two words; for instance, 'ConclusionHigher' was split into 'Conclusion' and 'Higher'. The section headings found in such abstracts are listed below:
Background, Method(s), Design, Theoretical, Measurement(s), Location, Aim(s), Methodology, Process, Abstract, Population, Approach, Objective(s), Purpose(s), Subject(s), Introduction, Implication(s), Patient(s), Procedure(s), Hypothesis, Measure(s), Setting(s), Limitation(s), Discussion, Conclusion(s), Result(s), Finding(s), Material(s), Rationale(s), Implications for health and nursing policy

Step 5: Extracting (Sub-setting) the Data Based on the Lengths of Abstracts

After correction, the lengths of the abstracts were calculated. 'Length' indicates the total number of words in the text, calculated by the same rule as Microsoft Word's 'word count' [5]. According to the APA style manual [6], an abstract should contain between 150 and 250 words. In LSC, we decided to limit the length of abstracts to between 30 and 500 words, in order to study documents with abstracts in the typical length range and to avoid the effect of length on the analysis.
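A minimal sketch of what the Step 4 splitting and Step 5 length filter could look like in R (the heading list is abbreviated and the toy abstracts are illustrative; this is not the corpus's actual code):

    # Toy abstracts; 'ConclusionHigher' shows a concatenated section heading.
    abstracts <- c("IntroductionThis study examined ...",
                   "ConclusionHigher rates were observed ...")

    # Abbreviated heading list (the full list appears above).
    headings <- c("Introduction", "Conclusion", "Background", "Results")

    # Step 4: split a heading glued to the following capitalised word,
    # e.g. "ConclusionHigher" -> "Conclusion Higher".
    pattern   <- paste0("\\b(", paste(headings, collapse = "|"), ")(?=[A-Z])")
    abstracts <- gsub(pattern, "\\1 ", abstracts, perl = TRUE)

    # Step 5: keep only abstracts of 30 to 500 words.
    n_words   <- lengths(strsplit(trimws(abstracts), "\\s+"))
    abstracts <- abstracts[n_words >= 30 & n_words <= 500]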
Step 6: [Version 2] Cleaning Copyright Notices, Permission Policies, Journal Names and Conference Names from the LSC Abstracts in Version 1

Conferences and journals can place a footer below the text of an abstract containing a copyright notice, permission policy, journal name, licence, authors' rights or conference name. The tool used for extracting and processing abstracts from the WoS database attaches such footers to the abstract text; for example, casual observation shows that copyright notices such as 'Published by Elsevier Ltd.' appear in many texts. To avoid abnormal appearances of words in further analyses, such as bias in frequency calculations, we performed a cleaning procedure on such sentences and phrases in the abstracts of LSC Version 1, removing the copyright notices, conference names, journal names, authors' rights, licences and permission policies identified by sampling the abstracts.

Step 7: [Version 2] Re-extracting (Sub-setting) the Data Based on the Lengths of Abstracts

The cleaning procedure described in the previous step left some abstracts with fewer words than our minimum length criterion (30 words); 474 such texts were removed.

Step 8: Saving the Dataset in CSV Format

Documents are saved in 34 CSV files. In the CSV files, the information is organised with one record per line, and the abstract, title, list of authors, list of categories, list of research areas, and times cited are recorded in fields.

To access the LSC for research purposes, please email ns433@le.ac.uk.

References
[1] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[2] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[3] Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html
[4] Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US
[5] Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3
[6] American Psychological Association, Publication Manual. Washington, DC: American Psychological Association, 1983.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains the scripts used for importing, trimming, cleaning, analysing, and plotting a large dataset of inclination experiments with an SOFC module. The measurement data is confidential, so it could not be published alongside the scripts. One row of dummy input data is published to illustrate the structure of the analysed data. The analysis is used for the journal paper "Experimental Evaluation of a Solid Oxide Fuel Cell System Exposed to Inclinations and Accelerations by Ship Motions".
The scripts contain:
- A script that reads the data, removes unusable data, and transforms it into analysable dataframes (Clean and trim.R)
- Two files to make a wide variety of plots (Plotting.R and Specificplots.R)
- A file that does a Gaussian process regression to estimate the degradation rate (Degradation estimation.R); see the sketch below
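A minimal sketch of this kind of degradation fit, assuming the kernlab package; the dummy data and variable names are illustrative, not the published scripts:

    library(kernlab)  # provides gausspr() for Gaussian process regression

    # Dummy data standing in for the confidential measurements: stack voltage
    # slowly degrading over operating hours, with measurement noise.
    set.seed(1)
    hours   <- seq(0, 1000, by = 10)
    voltage <- 0.85 - 1e-5 * hours + rnorm(length(hours), sd = 0.002)

    # Fit a Gaussian process (RBF kernel by default) and extract the trend.
    fit   <- gausspr(x = matrix(hours), y = voltage)
    trend <- as.numeric(predict(fit, matrix(hours)))

    # Approximate the average degradation rate as the slope of the trend.
    rate <- coef(lm(trend ~ hours))[2]  # volts per hour; negative = degradation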
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set contains all the data that appear as plots in the paper. This includes wall shear stress distributions, deposit thickness profiles, cleaning rate kinetic constants and correlations between these parameters. The spreadsheets contain further details.
For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.

Version 4 release notes:
- Adds data for 2018.

Version 3 release notes:
- Adds data in the following formats: Excel.
- Changes the project name to avoid confusing this data with the versions released by NACJD.

Version 2 release notes:
- Adds data for 2017.
- Adds a "number_of_months_reported" variable which says how many months of the year the agency reported data.

Property Stolen and Recovered is a Uniform Crime Reporting (UCR) Program data set with information on the number of offenses (crimes included are murder, rape, robbery, burglary, theft/larceny, and motor vehicle theft), the value of the offense, and subcategories of the offense (e.g. robbery is broken down into subcategories including highway robbery, bank robbery, and gas station robbery). The majority of the data relates to theft, which is divided into subcategories such as shoplifting, theft of bicycle, theft from building, and purse snatching. For a number of items stolen (e.g. money, jewelry and precious metals, guns), both the value of property stolen and the value of property recovered are provided. This data set is also referred to as the Supplement to Return A (Offenses Known and Reported).

All the data was received directly from the FBI as text or .DTA files. I created a setup file based on the documentation provided by the FBI and read the data into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. The Word document available for download is the guidebook the FBI provided with the raw data, which I used to create the setup file to read in the data.

There may be inaccuracies in the data, particularly in the group of columns starting with "auto". To reduce (but certainly not eliminate) data errors, I replaced the following values with NA for the group of columns beginning with "offenses" or "auto", as they are common data-entry error values (e.g. they are larger than the agency's population, or much larger than other crimes or months in the same agency): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99942. This cleaning was NOT done on the columns starting with "value".

For every numeric column I replaced negative indicator values (e.g. "j" for -1) with the negative number they are supposed to be. These negative-number indicators are not included in the FBI's codebook for this data but are present in the data; I used the values in the FBI's codebook for the Offenses Known and Clearances by Arrest data.

To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The LEAIC data add FIPS (state, county, and place) codes and agency type/subtype. If an agency has used a different FIPS code in the past, check to make sure the FIPS code is the same as in this data.
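A minimal sketch of the sentinel-value cleaning described above in base R (ucr_data is a hypothetical data frame name; the actual code lives in the linked repository):

    # Common data-entry error values to be treated as missing.
    error_values <- c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
                      10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000,
                      90000, 100000, 99942)

    # Apply only to columns whose names start with "offenses" or "auto";
    # columns starting with "value" are deliberately left untouched.
    target_cols <- grep("^(offenses|auto)", names(ucr_data), value = TRUE)
    ucr_data[target_cols] <- lapply(ucr_data[target_cols], function(col) {
      col[col %in% error_values] <- NA
      col
    })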