This dataset was created by Hussein Al Chami
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The storyline introduces a recent graduate hired at a biotech firm and tasked with the analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially, this involves a step-by-step analysis of the small iris dataset built into R, which includes sepal and petal measurements for three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolution experiment, developing their skills in manipulating and interpreting large scientific datasets through visualization and statistical analysis. Prerequisite knowledge includes basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data, or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R rather than the creation of original code. Pilot testing showed the case study was well received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
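A minimal sketch of the kind of Iris-module exercise described above (the iris dataset and its column names are built into R; the specific plots are illustrative):

    # The iris dataset ships with base R: 150 observations of four measurements
    # (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) across 3 species.
    summary(iris)                                  # per-column summary statistics

    # Correlation between petal length and width across all species
    cor(iris$Petal.Length, iris$Petal.Width)

    # Histogram of sepal length
    hist(iris$Sepal.Length,
         main = "Distribution of sepal length",
         xlab = "Sepal length (cm)")

    # Scatter plot of petal length vs. width, coloured by species
    plot(iris$Petal.Length, iris$Petal.Width,
         col = as.integer(iris$Species),
         xlab = "Petal length (cm)", ylab = "Petal width (cm)")
    legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 1)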
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our aim is to make Yarra a clean and pleasant place for our residents to live. This data asset has information about sweeping and loose litter removal across residential roads, kerbs and public open spaces within the Yarra municipality. The street cleansing details include cleaning date and time, suburb where the cleaning was done, category of cleaning, volume of litter removed and cleaning duration.

While all due care has been taken to ensure the data asset is accurate and current, Yarra City Council does not warrant that this data is definitive nor free of error and does not accept responsibility for any loss, damage, claim, expense, cost or liability whatsoever arising from reliance upon information provided herein.

Feedback on the data asset - including compliments, complaints and requests for more detail - is welcome.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Sample data for exercises in Further Adventures in Data Cleaning.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Four species of planktic foraminifera from core-tops spanning a depth transect on the Ontong Java Plateau were prepared for Mg/Ca analysis both with (Cd-cleaning) and without (Mg-cleaning) a reductive cleaning step. Reductive cleaning caused etching of foraminiferal calcite, focused on Mg-rich inner calcite, even on tests which had already been partially dissolved at the seafloor. Despite corrosion, there was no difference in Mg/Ca of Pulleniatina obliquiloculata between cleaning methods. Reductive cleaning decreased Mg/Ca by an average (all depths) of ~ 4% for Globigerinoides ruber white and ~ 10% for Neogloboquadrina dutertrei. Mg/Ca of Globigerinoides sacculifer (above the calcite saturation horizon only) was 5% lower after reductive cleaning. The decrease in Mg/Ca due to reductive cleaning appeared insensitive to preservation state for G. ruber, N. dutertrei and P. obliquiloculata. Mg/Ca of Cd-cleaned G. sacculifer appeared less sensitive to dissolution than that of Mg-cleaned tests. Mg-cleaning is adequate, but SEM imaging and contaminant ratios (Al/Ca, Fe/Ca and Mn/Ca) show that Cd-cleaning is more effective for porous species. A second aspect of the study addressed sample loss during cleaning. Lower yield after Cd-cleaning for G. ruber, G. sacculifer and N. dutertrei confirmed this to be the more aggressive method. The strongest correlations between yield and Δ[CO3^2-] in core-top samples were for Cd-cleaned G. ruber (r = 0.88, p = 0.020) and Cd-cleaned P. obliquiloculata (r = 0.68, p = 0.030). In a down-core record (WIND28K), the correlation, r, between yield values > 30% and the dissolution index, XDX, was -0.61 (p = 0.002). Where cleaning yield was < 30%, most Mg-cleaned Mg/Ca values were biased by dissolution.
https://www.valuemarketresearch.com/privacy-policy
Global demand in the ultrasonic cleaning equipment market is projected to reach nearly USD 970.83 million by 2032, up from USD 680.91 million in 2023, a CAGR of 4.02% over the study period 2024-2032.
Ultrasonic cleaning is the process of removing contamination from surfaces, nooks and crannies by passing sound waves through water to create microscopic implosions.
https://www.datainsightsmarket.com/privacy-policy
The global stainless steel tank foamers market is experiencing robust growth, driven by increasing demand across diverse sectors. The market's expansion is fueled by several key factors. Firstly, the rising adoption of foam cleaning technologies in industrial settings, particularly factory and kitchen cleaning, offers significant opportunities. Stainless steel's inherent properties - durability, corrosion resistance, and ease of sanitation - make it the preferred material for these applications. The market is segmented by capacity (24L, 50L, 100L, and others), reflecting the varied needs of different users. Furthermore, growing environmental concerns are prompting businesses to adopt eco-friendly cleaning solutions compatible with stainless steel tank foamers, contributing to market expansion. Finally, technological advancements leading to more efficient and user-friendly foamer designs are also driving growth. We estimate the market size to be approximately $500 million in 2025, based on industry analyses of related equipment markets and considering the factors mentioned above. A conservative CAGR of 7% is projected for the forecast period (2025-2033), resulting in significant market expansion by 2033. However, factors such as high initial investment costs for the equipment and potential competition from alternative cleaning technologies could pose challenges to continued growth.

Geographic distribution showcases a diverse landscape. North America and Europe currently dominate the market, driven by strong industrial sectors and early adoption of advanced cleaning technologies. However, the Asia-Pacific region is anticipated to experience rapid growth in the coming years due to increasing industrialization and rising disposable incomes. This region will likely witness significant market penetration due to the rising focus on hygiene and sanitation in various sectors. The competitive landscape includes both established manufacturers and new entrants, indicating a dynamic market with ongoing innovation and competition. Key players are focusing on product diversification, technological advancements, and strategic partnerships to maintain market share and capitalize on growth opportunities. This strategic approach is expected to solidify the position of these players in the coming years.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets presented here were partially used in "Formulation and MIP-heuristics for the lot sizing and scheduling problem with temporal cleanings" (Toscano, A., Ferreira, D., Morabito, R., Computers & Chemical Engineering) [1], in "A decomposition heuristic to solve the two-stage lot sizing and scheduling problem with temporal cleaning" (Toscano, A., Ferreira, D., Morabito, R., Flexible Services and Manufacturing Journal) [2], and in "A heuristic approach to optimize the production scheduling of fruit-based beverages" (Toscano et al., Gestão & Produção, 2020) [3]. Fruit-based production processes have two production stages: preparation tanks and production lines. This production process has some process-specific characteristics, such as temporal cleanings and synchrony between the two production stages, which make optimized production planning and scheduling even more difficult. In this sense, some papers in the literature have proposed different methods to solve this problem. To the best of our knowledge, there are no standard datasets in the literature for verifying the accuracy and performance of proposed methods, or for serving as a benchmark for other researchers considering this problem. Authors have been using small datasets that do not satisfactorily represent different production scenarios. Since demand in the beverage sector is seasonal, a wide range of scenarios enables us to evaluate the effectiveness of the methods proposed in the scientific literature in solving real instances of the problem. The datasets presented here are based on real data collected from five beverage companies. We present four datasets specifically constructed assuming a scenario of restricted capacity and balanced costs. These datasets are supplementary data for the paper submitted to Data in Brief [4].
[1] Toscano, A., Ferreira, D., Morabito, R., Formulation and MIP-heuristics for the lot sizing and scheduling problem with temporal cleanings, Computers & Chemical Engineering. 142 (2020) 107038. doi: 10.1016/j.compchemeng.2020.107038.
[2] Toscano, A., Ferreira, D., Morabito, R., A decomposition heuristic to solve the two-stage lot sizing and scheduling problem with temporal cleaning, Flexible Services and Manufacturing Journal. 31 (2019) 142-173. doi: 10.1007/s10696-017-9303-9.
[3] Toscano, A., Ferreira, D., Morabito, R., Trassi, M. V. C., A heuristic approach to optimize the production scheduling of fruit-based beverages. Gestão & Produção, 27(4), e4869, 2020. https://doi.org/10.1590/0104-530X4869-20.
[4] Piñeros, J., Toscano, A., Ferreira, D., Morabito, R., Datasets for lot sizing and scheduling problems in the fruit-based beverage production process. Data in Brief (2021).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We hereby publish the dataset (with metadata) and the R script (R Core Team, 2018) used for implementing the analysis presented in the paper "Food waste between environmental education, peers, and family influence. Insights from primary school students in Northern Italy", Journal of Cleaner Production (Piras et al., 2023). The dataset is provided in csv format with semicolons as separators and "NA" for missing data. The dataset includes all the variables used in at least one of the models presented in the paper, either in the main text or in the Supplementary Material. Other variables gathered by means of the questionnaires included as Supplementary Material of the paper have been removed. The dataset includes imputed values for missing data on independent variables. These were imputed using two approaches: last observation carried forward (LOCF) - preferred when possible - and last observation carried backward (LOCB). The metadata are presented as a PDF file.
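A minimal sketch of reading such a file in R ("data.csv" is a placeholder file name; the semicolon separator and "NA" missing-value marker are as described above):

    # Placeholder file name; the separator and NA marker follow the description.
    dat <- read.csv("data.csv", sep = ";", na.strings = "NA")
    str(dat)  # inspect the variables, including the LOCF/LOCB-imputed ones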
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A contract was notified by the city on 9 November 2021 under an open call for tenders for the cleaning of the premises and windows of schools and municipal buildings, in two lots over four years. The House of Public Services (M.S.P.) was not one of the buildings included in that contract because its cleaning was carried out by municipal staff. The present contract entrusts the cleaning of the M.S.P. to a private partner. This is a single framework agreement awarded in the form of a public contract for the provision of services with purchase orders, with no minimum amount and a maximum annual amount of expenditure, without reopening competition when awarding purchase orders, and awarded under the provisions of Articles R.2162-1 to R.2162-6 of the Public Procurement Code. The contract starts on the date of its notification and is concluded for an initial period of 12 months. It may then be expressly renewed until 9 November 2025. Maximum expenditure over one year: €50,000 excl. tax. Maximum expenditure over the total duration of the contract: €100,000 excl. tax.
Use the project file first, then open the cleaning R file to clean the raw data. Then use the R file called OLS analysis to analyze the cleaned data, which is saved as a .rds file.
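A minimal sketch of that workflow, with placeholder script, file, and variable names (the actual names ship with the project):

    # Step 1: run the cleaning script, which writes the cleaned data to disk.
    source("cleaning.R")                     # placeholder for the cleaning R file

    # Step 2: the OLS analysis script reads the cleaned .rds file back in.
    cleaned <- readRDS("cleaned_data.rds")   # placeholder file name
    model   <- lm(outcome ~ predictor, data = cleaned)  # illustrative OLS formula
    summary(model)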
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The layer refers to a symbol indicating the location of Cleaning Sinks (for cleaning fish) on public land in the Gold Coast area.

Please note that this data will have a different structure and labelling to the 2013 version by virtue of changes required through the City's introduction of a new asset management system.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Japan Outbound Tourism Consumption: C&R: Cleaning data was reported at 0.000 JPY bn in 2016. This stayed constant from the previous number of 0.000 JPY bn for 2015. Japan Outbound Tourism Consumption: C&R: Cleaning data is updated yearly, averaging 0.000 JPY bn from Dec 2005 (Median) to 2016, with 12 observations. Japan Outbound Tourism Consumption: C&R: Cleaning data remains active status in CEIC and is reported by Ministry of Land, Infrastructure, Transport and Tourism. The data is categorized under Global Database’s Japan – Table JP.Q014: Outbound Tourism Consumption.
https://www.cognitivemarketresearch.com/privacy-policy
Get the sample copy of the Commercial Ultrasonic Cleaning Market Report 2025 (Global Edition), which includes data such as market size, share, growth, CAGR, forecast, and revenue, a list of commercial ultrasonic cleaning companies (Branson Ultrasonics Corporation, Blue Wave Ultrasonics, Caresonic, Cleaning Technologies Group, L&R Manufacturing, SharperTek, Kitamoto, Crest Ultrasonics, Morantz Ultrasonics), and the market segmented by type (General, Professional) and by application (Metal, Chemical, Consumer Goods, Others).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The LSC (Leicester Scientific Corpus)
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

The data are extracted from the Web of Science [1]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

[Version 2] A further cleaning is applied in Data Processing for the LSC Abstracts in Version 1*. Details of the cleaning procedure are explained in Step 6.

* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v1

Getting Started

This text provides information on the LSC (Leicester Scientific Corpus) and the pre-processing steps applied to abstracts, and describes the structure of the files that organise the corpus. The corpus was created for future work on the quantification of the meaning of research texts, and to be made available for use in Natural Language Processing projects.

LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [1]. The corpus contains only documents in English. Each document in the corpus contains the following parts:

1. Authors: the list of authors of the paper
2. Title: the title of the paper
3. Abstract: the abstract of the paper
4. Categories: one or more categories from the list of categories [2]. The full list of categories is presented in the file 'List_of_Categories.txt'.
5. Research Areas: one or more research areas from the list of research areas [3]. The full list of research areas is presented in the file 'List_of_Research_Areas.txt'.
6. Total Times Cited: the number of times the paper was cited by other items from all databases within the Web of Science platform [4]
7. Times Cited in Core Collection: the total number of times the paper was cited by other papers within the WoS Core Collection [4]

The corpus was collected online in July 2018 and contains the number of citations from publication date to July 2018. We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,350.

Data Processing

Step 1: Downloading the Data Online

The dataset was collected manually by exporting documents online as tab-delimited files. All documents are available online.

Step 2: Importing the Dataset to R

The LSC was collected as TXT files. All documents were imported into R.

Step 3: Cleaning the Data of Documents with an Empty Abstract or without a Category

As our research is based on the analysis of abstracts and categories, all documents with empty abstracts and all documents without categories were removed.

Step 4: Identification and Correction of Concatenated Words in Abstracts

Medicine-related publications in particular use 'structured abstracts', which are divided into sections with distinct headings such as introduction, aim, objective, method, result and conclusion. The tool used for extracting abstracts concatenates these section headings with the first word of the section, producing words such as 'ConclusionHigher' and 'ConclusionsRT'. Such words were detected by sampling medicine-related publications with human intervention, and each detected concatenated word was split into two words; for instance, 'ConclusionHigher' was split into 'Conclusion' and 'Higher'. The section headings found in such abstracts are listed below:
Background, Method(s), Design, Theoretical, Measurement(s), Location, Aim(s), Methodology, Process, Abstract, Population, Approach, Objective(s), Purpose(s), Subject(s), Introduction, Implication(s), Patient(s), Procedure(s), Hypothesis, Measure(s), Setting(s), Limitation(s), Discussion, Conclusion(s), Result(s), Finding(s), Material(s), Rationale(s), Implications for health and nursing policy

Step 5: Extracting (Sub-setting) the Data Based on the Lengths of Abstracts

After correction, the lengths of the abstracts were calculated. 'Length' indicates the total number of words in the text, calculated by the same rule as Microsoft Word's 'word count' [5]. According to the APA style manual [6], an abstract should contain between 150 and 250 words. In LSC, we decided to limit the length of abstracts to between 30 and 500 words, in order to study documents with abstracts in the typical length range and to avoid the effect of length on the analysis.
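A minimal sketch of what the Step 4 splitting and Step 5 length filter could look like in R (the heading list is abbreviated and the toy abstracts are illustrative; this is not the corpus's actual code):

    # Toy abstracts; 'ConclusionHigher' shows a concatenated section heading.
    abstracts <- c("IntroductionThis study examined ...",
                   "ConclusionHigher rates were observed ...")

    # Abbreviated heading list (the full list appears above).
    headings <- c("Introduction", "Conclusion", "Background", "Results")

    # Step 4: split a heading glued to the following capitalised word,
    # e.g. "ConclusionHigher" -> "Conclusion Higher".
    pattern   <- paste0("\\b(", paste(headings, collapse = "|"), ")(?=[A-Z])")
    abstracts <- gsub(pattern, "\\1 ", abstracts, perl = TRUE)

    # Step 5: keep only abstracts of 30 to 500 words.
    n_words   <- lengths(strsplit(trimws(abstracts), "\\s+"))
    abstracts <- abstracts[n_words >= 30 & n_words <= 500]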
Step 6: [Version 2] Cleaning Copyright Notices, Permission Policies, Journal Names and Conference Names from the LSC Abstracts in Version 1

Conferences and journals can place a footer below the text of an abstract containing a copyright notice, permission policy, journal name, licence, authors' rights or conference name. The tool used for extracting and processing abstracts from the WoS database attaches such footers to the abstract text; for example, casual observation shows that copyright notices such as 'Published by Elsevier Ltd.' appear in many texts. To avoid abnormal appearances of words in further analyses, such as bias in frequency calculations, we performed a cleaning procedure on such sentences and phrases in the abstracts of LSC Version 1, removing the copyright notices, conference names, journal names, authors' rights, licences and permission policies identified by sampling the abstracts.

Step 7: [Version 2] Re-extracting (Sub-setting) the Data Based on the Lengths of Abstracts

The cleaning procedure described in the previous step left some abstracts with fewer words than our minimum length criterion (30 words); 474 such texts were removed.

Step 8: Saving the Dataset in CSV Format

Documents are saved in 34 CSV files. In the CSV files, the information is organised with one record per line, and the abstract, title, list of authors, list of categories, list of research areas, and times cited are recorded in fields.

To access the LSC for research purposes, please email ns433@le.ac.uk.

References
[1] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[2] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[3] Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html
[4] Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US
[5] Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3
[6] American Psychological Association, Publication Manual. Washington, DC: American Psychological Association, 1983.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set contains the scripts used for importing, trimming, cleaning, analysing, and plotting a large dataset of inclination experiments with an SOFC module. The measurement data is confidential, so it could not be published alongside the scripts. One row of dummy input data is published to illustrate the structure of the analysed data. The analysis is used for the journal paper "Experimental Evaluation of a Solid Oxide Fuel Cell System Exposed to Inclinations and Accelerations by Ship Motions".
The scripts contain:
- A script that reads the data, removes unusable data, and transforms it into analysable dataframes (Clean and trim.R)
- Two files to make a wide variety of plots (Plotting.R and Specificplots.R)
- A file that does a Gaussian process regression to estimate the degradation rate (Degradation estimation.R); see the sketch below
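A minimal sketch of this kind of degradation fit, assuming the kernlab package; the dummy data and variable names are illustrative, not the published scripts:

    library(kernlab)  # provides gausspr() for Gaussian process regression

    # Dummy data standing in for the confidential measurements: stack voltage
    # slowly degrading over operating hours, with measurement noise.
    set.seed(1)
    hours   <- seq(0, 1000, by = 10)
    voltage <- 0.85 - 1e-5 * hours + rnorm(length(hours), sd = 0.002)

    # Fit a Gaussian process (RBF kernel by default) and extract the trend.
    fit   <- gausspr(x = matrix(hours), y = voltage)
    trend <- as.numeric(predict(fit, matrix(hours)))

    # Approximate the average degradation rate as the slope of the trend.
    rate <- coef(lm(trend ~ hours))[2]  # volts per hour; negative = degradation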
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data set contains all the data that appear as plots in the paper. This includes wall shear stress distributions, deposit thickness profiles, cleaning rate kinetic constants and correlations between these parameters. The spreadsheets contain further details.
For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.

Version 4 release notes:
- Adds data for 2018.

Version 3 release notes:
- Adds data in the following formats: Excel.
- Changes the project name to avoid confusing this data with the versions released by NACJD.

Version 2 release notes:
- Adds data for 2017.
- Adds a "number_of_months_reported" variable which says how many months of the year the agency reported data.

Property Stolen and Recovered is a Uniform Crime Reporting (UCR) Program data set with information on the number of offenses (crimes included are murder, rape, robbery, burglary, theft/larceny, and motor vehicle theft), the value of the offense, and subcategories of the offense (e.g. robbery is broken down into subcategories including highway robbery, bank robbery, and gas station robbery). The majority of the data relates to theft, which is divided into subcategories such as shoplifting, theft of bicycle, theft from building, and purse snatching. For a number of items stolen (e.g. money, jewelry and precious metals, guns), both the value of property stolen and the value of property recovered are provided. This data set is also referred to as the Supplement to Return A (Offenses Known and Reported).

All the data was received directly from the FBI as text or .DTA files. I created a setup file based on the documentation provided by the FBI and read the data into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see here: https://github.com/jacobkap/crime_data. The Word document available for download is the guidebook the FBI provided with the raw data, which I used to create the setup file to read in the data.

There may be inaccuracies in the data, particularly in the group of columns starting with "auto". To reduce (but certainly not eliminate) data errors, I replaced the following values with NA for the group of columns beginning with "offenses" or "auto", as they are common data-entry error values (e.g. they are larger than the agency's population, or much larger than other crimes or months in the same agency): 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99942. This cleaning was NOT done on the columns starting with "value".

For every numeric column I replaced negative indicator values (e.g. "j" for -1) with the negative number they are supposed to be. These negative-number indicators are not included in the FBI's codebook for this data but are present in the data; I used the values in the FBI's codebook for the Offenses Known and Clearances by Arrest data.

To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The LEAIC data add FIPS (state, county, and place) codes and agency type/subtype. If an agency has used a different FIPS code in the past, check to make sure the FIPS code is the same as in this data.
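A minimal sketch of the sentinel-value cleaning described above in base R (ucr_data is a hypothetical data frame name; the actual code lives in the linked repository):

    # Common data-entry error values to be treated as missing.
    error_values <- c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
                      10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000,
                      90000, 100000, 99942)

    # Apply only to columns whose names start with "offenses" or "auto";
    # columns starting with "value" are deliberately left untouched.
    target_cols <- grep("^(offenses|auto)", names(ucr_data), value = TRUE)
    ucr_data[target_cols] <- lapply(ucr_data[target_cols], function(col) {
      col[col %in% error_values] <- NA
      col
    })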