14 datasets found
  1. Data design thinking: data cleaning improvements using tableau prep

    • datadryad.org
    zip
    Updated Apr 13, 2018
    Cite
    Christopher Felker (2018). Data design thinking: data cleaning improvements using tableau prep [Dataset]. http://doi.org/10.15146/R3R68G
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 13, 2018
    Dataset provided by
    Dryad
    Authors
    Christopher Felker
    Time period covered
    Apr 13, 2018
    Area covered
    Description

    dsd/043 dimension sdmx data structure definition exposure type

    dsd/045 dimension sdmx data structure definition valuation method

    universal resource locator url http://bit.ly/2wFtGw8

    dataset

    data structure definition

    ECB_CBD2 agency

    download SDMX 2.1 schema of the ECB_CBD2 DSD http://bit.ly/2ImA7p3

    uc health / ucsd health dataset

    data structure definition(s)

    UCH_CCD1 agency <0000 0001 2107 4242 ucsd health>

    access to CCD1 is through the ucsd tableau server

    Metrics based on this standard are developed by persons listed in this resource

    d/416 2018 19 131 master organisation chart ucsd health patient financial services 0000 0001 2107 4242 ucsd health

    Discovery metrics

    Beta metrics

    CCD Bm 0.0

    Alpha metrics

    CCD Am 0.0

    P...

  2. Additional file 2 of Combining location-and-scale batch effect adjustment...

    • springernature.figshare.com
    • datasetcatalog.nlm.nih.gov
    zip
    Updated May 30, 2023
    Cite
    Roman Hornung; Anne-Laure Boulesteix; David Causeur (2023). Additional file 2 of Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment [Dataset]. http://doi.org/10.6084/m9.figshare.c.3606539_D2.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Roman Hornung; Anne-Laure Boulesteix; David Causeur
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains all the R code needed to reproduce and evaluate the real-data analyses and simulations, as well as Rda files enabling fast evaluation of the corresponding results. (ZIP 2406 kb)

  3. Car fuel consumptions and emissions 2000-2013

    • data.wu.ac.at
    csv, json
    Updated Mar 10, 2014
    Cite
    Carbon Emissions (2014). Car fuel consumptions and emissions 2000-2013 [Dataset]. https://data.wu.ac.at/odso/datahub_io/NjhlMGI0NTUtMmYzOS00NTBmLWJhYmItM2VlYTQwZjUyZGU2
    Explore at:
    Available download formats: csv, json
    Dataset updated
    Mar 10, 2014
    Dataset provided by
    Carbon Emissions
    License

    http://reference.data.gov.uk/id/open-government-licence

    Description

    Cleaned-up and consolidated car fuel consumption and emissions data for years 2000 to 2013. Data is published by the Vehicle Certification Agency (VCA), an Executive Agency of the United Kingdom Department for Transport.

    Data is available to download at http://carfueldata.direct.gov.uk/downloads/default.aspx.

    It is assumed that the data is released under the UK Open Government Licence.

    For more details about the data, please check the information booklet http://carfueldata.direct.gov.uk/additional/aug2013/VCA-Booklet-text-Aug-2013.pdf.

    Data Cleaning

    The original data is published in separate CSV files starting from 2000, but the format is not consistent across years. The data has been consolidated for machine use with OpenRefine. The script with the tasks performed on the 2013 CSV files is included in the scripts folder. Some example operations performed include:

    • Consolidate different field names across different years
    • Consolidate measure units for emissions data across different years
    • Set proper field types to allow indexing and analysis (e.g., numeric fields)
    • Normalize manufacturer and model descriptions
    • Trim excess whitespace
    • Fix encoding for special characters
    • etc

    Note that the resulting dataset does not include all fields in the original data, only those deemed more relevant.
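
    The operations listed above were performed in OpenRefine. The following is only a minimal Python sketch of a few of the same kinds of steps (renaming fields, trimming whitespace, setting numeric types, normalizing manufacturer names), not the dataset's actual script. The field names, the alias map, and the sample rows are hypothetical.

```python
import re

# Hypothetical mapping from year-specific header variants to one canonical name.
FIELD_ALIASES = {
    "Manufacturer": "manufacturer",
    "Make": "manufacturer",
    "Model": "model",
    "CO2 g/km": "co2_g_per_km",
    "CO2": "co2_g_per_km",
}

# Canonical fields that should be stored as numbers.
NUMERIC_FIELDS = {"co2_g_per_km"}


def clean_row(raw: dict) -> dict:
    """Rename fields, trim whitespace, normalize names, and set numeric types."""
    cleaned = {}
    for key, value in raw.items():
        name = FIELD_ALIASES.get(key.strip())
        if name is None:
            continue  # keep only the fields deemed relevant
        text = re.sub(r"\s+", " ", str(value)).strip()  # collapse and trim whitespace
        if name in NUMERIC_FIELDS:
            cleaned[name] = float(text) if text else None
        elif name == "manufacturer":
            cleaned[name] = text.upper()  # one possible normalization choice
        else:
            cleaned[name] = text
    return cleaned


# Hypothetical rows from two years with inconsistent headers and spacing.
rows_2000 = [{"Make": " Ford ", "Model": "Focus  1.6", "CO2": "159"}]
rows_2013 = [{"Manufacturer": "FORD", "Model": "Focus 1.6", "CO2 g/km": "136 "}]
consolidated = [clean_row(r) for r in rows_2000 + rows_2013]
```

    After this step both sample rows share the same field names and types, so they can be indexed and compared across years.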

  4. Consolidated (CNSL) Catching Up? (Forecast)

    • kappasignal.com
    Updated Mar 30, 2024
    Cite
    KappaSignal (2024). Consolidated (CNSL) Catching Up? (Forecast) [Dataset]. https://www.kappasignal.com/2024/03/consolidated-cnsl-catching-up.html
    Explore at:
    Dataset updated
    Mar 30, 2024
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    Consolidated (CNSL) Catching Up?

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  5. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    figshare
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, because NHANES data are plagued with multiple inconsistencies, the data must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey:

    • demographics (281 variables)
    • dietary consumption (324 variables)
    • physiological functions (1,040 variables)
    • occupation (61 variables)
    • questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood)
    • medications (29 variables)
    • mortality information linked from the National Death Index (15 variables)
    • survey weights (857 variables)
    • environmental exposure biomarker measurements (598 variables)
    • chemical comments indicating which measurements are below or above the lower limit of detection (505 variables)

    csv Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets comprise 20 .csv files, two for each module: one uncleaned version and one cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.

    • "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
    • "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
    • "dictionary_drug_codes.csv" contains the dictionary for descriptors on the drug codes.
    • "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded as a .zip file, which includes an .RData file and an .R file.

    • "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.
    • "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.

    Example starter codes: The set of starter code to help users conduct exposome analysis consists of four R markdown files (.Rmd). We recommend going through the tutorials in order.

    • "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
    • "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
    • "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
    • "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
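
    As a rough illustration of why the tutorials distinguish analyses with and without the NHANES sampling design, the sketch below contrasts an unweighted mean with a survey-weighted mean. It is written in Python rather than the R used by the authors, the records and the field names `value` and `weight` are hypothetical, and it ignores strata and clusters, which a full design-based analysis would also need for variance estimation.

```python
def unweighted_mean(records):
    """Plain average that treats every participant equally."""
    return sum(r["value"] for r in records) / len(records)


def weighted_mean(records):
    """Average in which each participant counts in proportion to a survey weight."""
    total_weight = sum(r["weight"] for r in records)
    return sum(r["value"] * r["weight"] for r in records) / total_weight


# Hypothetical participants: the second one represents three times as many people.
participants = [
    {"value": 10.0, "weight": 1.0},
    {"value": 20.0, "weight": 3.0},
]

plain = unweighted_mean(participants)   # 15.0
weighted = weighted_mean(participants)  # 17.5
```

    The two results differ because the weighted estimate shifts toward participants who stand in for a larger share of the population.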

  6. Electronic Sales

    • kaggle.com
    Updated Dec 19, 2023
    Cite
    Anshul Pachauri (2023). Electronic Sales [Dataset]. https://www.kaggle.com/datasets/anshulpachauri/electronic-sales
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 19, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Anshul Pachauri
    Description

    The provided Python code is a comprehensive analysis of sales data for a business that involves the merging of monthly sales data, cleaning and augmenting the dataset, and performing various analytical tasks. Here's a breakdown of the code:

    Data Preparation and Merging:

    The code begins by importing necessary libraries and filtering out warnings. It merges sales data from 12 months into a single file named "all_data.csv".

    Data Cleaning:

    Rows with NaN values are dropped, and any entries starting with 'Or' in the 'Order Date' column are removed. Columns like 'Quantity Ordered' and 'Price Each' are converted to numeric types for further analysis.

    Data Augmentation:

    Additional columns such as 'Month', 'Sales', and 'City' are added to the dataset. The 'City' column is derived from the 'Purchase Address' column.

    Analysis:

    Several analyses are conducted, answering questions such as:

    • The best month for sales and total earnings
    • The city with the highest number of sales
    • The ideal time for advertisements based on the number of orders per hour
    • Products that are often sold together
    • The best-selling products and their correlation with price

    Visualization:

    Bar charts and line plots are used for visualizing the analysis results, making it easier to interpret trends and patterns. Matplotlib is employed for creating visualizations.

    Summary:

    The code concludes with a comprehensive visualization that combines the quantity ordered and average price for each product, shedding light on product performance. This code is structured to offer insights into sales patterns, customer behavior, and product performance, providing valuable information for strategic decision-making in the business.
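
    The description does not reproduce the code itself, so the following is only a minimal standard-library sketch of the cleaning and augmentation steps it describes. The column names come from the description; the date format, the address layout, the definition of 'Sales' as quantity times price, and the sample rows are assumptions. Rows whose 'Order Date' starts with 'Or' are presumably repeated header rows left over from merging the monthly files.

```python
def clean_and_augment(rows):
    """Drop incomplete and header-like rows, convert types, and add derived columns."""
    result = []
    for row in rows:
        # Drop rows with missing values (the analogue of dropping NaN rows).
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        # Drop rows whose 'Order Date' cell starts with 'Or'.
        if row["Order Date"].startswith("Or"):
            continue
        quantity = int(row["Quantity Ordered"])
        price = float(row["Price Each"])
        # Assumed formats: 'MM/DD/YY HH:MM' dates, 'street, city, state zip' addresses.
        month = int(row["Order Date"][:2])
        city = row["Purchase Address"].split(",")[1].strip()
        result.append({
            **row,
            "Quantity Ordered": quantity,
            "Price Each": price,
            "Month": month,
            "Sales": quantity * price,  # assumed definition of 'Sales'
            "City": city,
        })
    return result


# Hypothetical sample: one valid order, one repeated header row, one empty row.
sample = [
    {"Order Date": "04/19/19 08:46", "Quantity Ordered": "2",
     "Price Each": "11.95", "Purchase Address": "917 1st St, Dallas, TX 75001"},
    {"Order Date": "Order Date", "Quantity Ordered": "Quantity Ordered",
     "Price Each": "Price Each", "Purchase Address": "Purchase Address"},
    {"Order Date": "", "Quantity Ordered": "", "Price Each": "", "Purchase Address": ""},
]
cleaned = clean_and_augment(sample)
```

    Only the first sample row survives, with numeric quantity and price and the three derived columns attached.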

  7. Global Master Data Management (MDM) BPO Market Size By Type of Service, By...

    • verifiedmarketresearch.com
    Updated Feb 27, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Master Data Management (MDM) BPO Market Size By Type of Service, By Vertical Industry, By Size of Organization, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/master-data-management-mdm-bpo-market/
    Explore at:
    Dataset updated
    Feb 27, 2024
    Dataset authored and provided by
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2030
    Area covered
    Global
    Description

    Master Data Management (MDM) BPO Market size was valued at USD 2.38 Billion in 2023 and is projected to reach USD 6.42 Billion by 2030, growing at a CAGR of 14.3% during the forecasted period 2024 to 2030.

    Global Master Data Management (MDM) BPO Market Drivers

    The market drivers for the Master Data Management (MDM) BPO Market can be influenced by various factors. These may include:

    A Growing Emphasis on Data Quality and Governance: As data spreads throughout enterprises, it is critical to maintain accurate, consistent, and trustworthy master data. MDM BPO services help businesses enhance data integrity and compliance with laws like the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR) by providing expertise in data quality management, governance, and stewardship.

    Rapidly Increasing Data Volumes and Complexity: Managing and consolidating master data is made more difficult by the exponential growth of data coming from a variety of sources, such as supplier records, product data, and customer information. MDM BPO providers offer scalable solutions for handling massive data volumes and managing master data across several systems, applications, and business units.

    Concentrate on Core Competencies and Cost Optimization: By outsourcing MDM tasks, businesses may take advantage of BPO providers' data management skills while concentrating on their core business operations. Outsourcing MDM tasks like data cleaning, deduplication, and standardization helps businesses save money, run more efficiently, and launch new goods and services more quickly.

    Globalization & Expansion Initiatives: Companies have difficulties with data harmonization, localization, and regulatory compliance as they enter new markets and geographical areas. MDM BPO services provide data consistency, master data standardization across geographies, and industry and local data privacy law compliance.

    Adoption of Cloud-based MDM Solutions: With the move to cloud-based MDM solutions, businesses can now get MDM features as a service without having to hire specialists or make large infrastructure investments. Cloud-based MDM platforms and services with flexibility, scalability, and quick implementation are provided by MDM BPO providers to satisfy changing corporate needs.

  8. Data from: CZL CONSOLIDATED ZINC LIMITED (Forecast)

    • kappasignal.com
    Updated Jun 2, 2023
    Cite
    KappaSignal (2023). CZL CONSOLIDATED ZINC LIMITED (Forecast) [Dataset]. https://www.kappasignal.com/2023/06/czl-consolidated-zinc-limited.html
    Explore at:
    Dataset updated
    Jun 2, 2023
    Dataset authored and provided by
    KappaSignal
    License

    https://www.kappasignal.com/p/legal-disclaimer.html

    Description

    This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

    CZL CONSOLIDATED ZINC LIMITED

    Financial data:

    • Historical daily stock prices (open, high, low, close, volume)

    • Fundamental data (e.g., market capitalization, price to earnings P/E ratio, dividend yield, earnings per share EPS, price to earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price to sales ratio, credit rating)

    • Technical indicators (e.g., moving averages, RSI, MACD, average directional index, aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution A/D line, parabolic SAR indicator, bollinger bands indicators, fibonacci, williams percent range, commodity channel index)

    Machine learning features:

    • Feature engineering based on financial data and technical indicators

    • Sentiment analysis data from social media and news articles

    • Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

    Potential Applications:

    • Stock price prediction

    • Portfolio optimization

    • Algorithmic trading

    • Market sentiment analysis

    • Risk management

    Use Cases:

    • Researchers investigating the effectiveness of machine learning in stock market prediction

    • Analysts developing quantitative trading Buy/Sell strategies

    • Individuals interested in building their own stock market prediction models

    • Students learning about machine learning and financial applications

    Additional Notes:

    • The dataset may include different levels of granularity (e.g., daily, hourly)

    • Data cleaning and preprocessing are essential before model training

    • Regular updates are recommended to maintain the accuracy and relevance of the data

  9. March Madness Historical DataSet (2002 to 2025)

    • kaggle.com
    Updated Apr 22, 2025
    Cite
    Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jonathan Pilafas
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo.

    • Click here to view this dashboard: Dashboard Link
    • Click here to view this dashboard featured in a Domo blog post: Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard

    This dataset offers one of the most robust resources you will find to discover key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. This dataset is meticulously structured to provide every piece of information that I could pull from this site as an open-source tool for analysis for March Madness.

    Key features of the dataset include:

    • Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
    • Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of.
    • 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.

    These datasets were created by downloading the raw CSV files for each season for the various sections on KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, all of the column headers for each of the previous seasons are standardized to the current 2025 naming structure so all of the historical data can be viewed under the exact same field names. All of these cleaned datasets are then appended together, and some additional clean up takes place before ultimately creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

    Once all of the INT datasets were created, I joined all of the tables together on the team name and season so all of these different metrics can be viewed under one single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add a conference field in its full length and respective acronyms they are known by as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the different conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so this mapping table is necessary to map all of the teams properly and differentiate the historical conferences from their current conferences.

    From there, I join a reference table that includes all of the current NCAAM coaches and their active coaching lengths because the active current coaching length typically correlates to a team's success in the March Madness tournament. I also join another reference table to include the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and I join another reference table to differentiate the teams who were ranked in the top 12 in the AP Top 25 during week 6 of the respective NCAA season. After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
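
    The workflow above was built in Domo's Magic ETL. The following is only a minimal Python sketch of the same logic (standardize column headers, append seasons, then join tables on team name and season), not the author's dataflow. The header variants, the canonical names, and the sample values are hypothetical.

```python
# Hypothetical map from older header variants to the current naming structure.
HEADER_MAP = {
    "TeamName": "team",
    "Team": "team",
    "AdjustEM": "adj_em",
    "AdjEM": "adj_em",
    "Tempo": "tempo",
    "AdjTempo": "tempo",
}


def standardize(rows, season):
    """Rename headers to the current names and tag each row with its season."""
    return [
        {**{HEADER_MAP.get(k, k): v for k, v in row.items()}, "season": season}
        for row in rows
    ]


def join_on_team_season(left, right):
    """Inner-join two tables on the (team, season) key."""
    index = {(r["team"], r["season"]): r for r in right}
    joined = []
    for row in left:
        match = index.get((row["team"], row["season"]))
        if match is not None:
            joined.append({**row, **match})
    return joined


# Made-up rows for two seasons with different header spellings, appended together.
efficiency = (
    standardize([{"TeamName": "Team A", "AdjustEM": 1.5}], 2002)
    + standardize([{"Team": "Team A", "AdjEM": 2.5}], 2025)
)
misc = (
    standardize([{"TeamName": "Team A", "Tempo": 70.0}], 2002)
    + standardize([{"Team": "Team A", "AdjTempo": 66.0}], 2025)
)
combined = join_on_team_season(efficiency, misc)
```

    Each row of `combined` holds the metrics from both tables for one team in one season, which mirrors the single consolidated view described above.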

    This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.

  10. Brain Tumor MRI Multi-Class Dataset

    • kaggle.com
    Updated May 11, 2025
    Cite
    Maxwell Bernard (2025). Brain Tumor MRI Multi-Class Dataset [Dataset]. https://www.kaggle.com/datasets/maxwellbernard/brain-tumor-mri-multi-class-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 11, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Maxwell Bernard
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset consolidates brain tumor MRI images from multiple Kaggle data sources to create a larger, centralised dataset for research and model development purposes.

    The dataset comprises 16,269 images in four main classes:

    • Glioma (3,325 Images)
    • Meningioma (3,266 Images)
    • Pituitary (2,974 Images)
    • Healthy (6,704 Images)

    Key Notes:

    Duplicate images are likely due to dataset overlaps when sourcing. We strongly recommend users perform deduplication before training.

    The dataset does not apply any cleaning, resizing, or augmentation; it is intended to be raw and inclusive for flexibility.
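
    One simple way to perform the recommended deduplication is to group images by a hash of their bytes. The sketch below is an assumption about how a user might do this, not something shipped with the dataset, and the filenames and byte strings are placeholders. It only catches byte-identical copies; resized or re-encoded duplicates would need a perceptual comparison instead.

```python
import hashlib


def find_duplicates(images):
    """Group image names by the SHA-256 digest of their bytes.

    `images` maps a name to the raw file content. Returns the groups of
    names that share identical content (more than one name per digest).
    """
    groups = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        groups.setdefault(digest, []).append(name)
    return [names for names in groups.values() if len(names) > 1]


# Placeholder content standing in for image files read from disk.
images = {
    "a.jpg": b"\x00\x01",
    "b.jpg": b"\x00\x01",
    "c.jpg": b"\x02",
}
duplicates = find_duplicates(images)
```

    For real files, the mapping can be built by reading each path with `pathlib.Path.read_bytes()`, after which all but one name in each group can be dropped before training.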

    Recommendation:

    This dataset is ideal for users who want to experiment with preprocessing, augmentation, and custom cleaning pipelines on a real-world, mixed-quality dataset. Please consult medical professionals if using this data for clinical or diagnostic applications.

    File Structure

    The dataset is organised as follows:

    • Each folder represents one of the 4 classes
    • The filename of each image contains the original dataset source (named after the user who published the dataset to Kaggle)

    Data Sources:

    This dataset combines the following five Kaggle datasets:

    1. Brain Tumors Dataset (Excluded their augmented images) by Seyed Mohammad Hossein Hashemi
    2. PMRAM Bangladeshi Brain Cancer MRI Dataset by Orville
    3. Brain Tumor MRI Images (17 Classes) by Fernando Feltrin (Only T1 glioma/meningioma/healthy images used).
    4. SIAR Dataset by Masoumeh Siar (Only healthy scans used as this was a binary dataset, and did not differentiate the tumor types).
    5. Brain Tumor MRI Scans by Rajarshi Mandal

    These datasets were selected for their popularity, quality, and complementary class coverage. We recommend checking the original sources for more information about data collection methods and original licensing.

    License

    This combined dataset is released under CC BY-SA 4.0 to comply with ShareAlike requirements of source datasets:

    Source Dataset                                Original License
    Brain Tumors Dataset                          CC0
    Brain Tumor MRI Scans                         CC0
    SIAR Dataset                                  Unknown. Requires citation in publications.
    PMRAM Bangladeshi Brain Cancer MRI Dataset    CC BY-SA 4.0
    Brain Tumor MRI Images (17 Classes)           ODbL 1.0
  11. Automated Tank Cleaning Service Report

    • marketreportanalytics.com
    doc, pdf, ppt
    Updated Apr 2, 2025
    Cite
    Market Report Analytics (2025). Automated Tank Cleaning Service Report [Dataset]. https://www.marketreportanalytics.com/reports/automated-tank-cleaning-service-52386
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Apr 2, 2025
    Dataset authored and provided by
    Market Report Analytics
    License

    https://www.marketreportanalytics.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The automated tank cleaning service market, valued at $409 million in 2025, is projected to experience steady growth, driven by increasing demand for efficient and safe cleaning solutions across various industries. The rising adoption of automation in the oil and gas, chemical, and food processing sectors is a key driver, as it minimizes risks associated with manual cleaning, improves operational efficiency, and reduces labor costs. Stringent environmental regulations concerning hazardous waste disposal are further propelling market growth, emphasizing the need for automated systems that ensure compliance and minimize environmental impact. The market is segmented by application (crude oil tanks, refinery tanks, commercial tanks, and others) and type (semi-automatic and fully automatic systems). Fully automatic systems are expected to witness significant growth due to their enhanced safety features and higher cleaning efficiency. Geographic expansion, particularly in emerging economies with growing industrialization, presents lucrative opportunities for market players. However, the high initial investment cost of automated systems and the need for skilled personnel for operation and maintenance could pose challenges to market growth. Competition among established players and emerging technological advancements will further shape the market landscape. The forecast period (2025-2033) anticipates a sustained expansion, fueled by technological innovations and increasing regulatory pressures.

    The competitive landscape is characterized by a mix of large multinational corporations and specialized regional service providers. Key players such as Dulsco, National Tank Services, Clean Harbors, and others are actively investing in research and development to enhance their offerings and expand their market share. Strategic partnerships, mergers, and acquisitions are also prevalent, driving market consolidation and innovation. The market is witnessing a shift towards integrated solutions, combining automated cleaning with related services like waste management and tank inspection. This trend is expected to further drive market growth and consolidate the service offerings of market participants. The adoption of advanced technologies such as robotics, AI, and data analytics is enhancing cleaning efficiency, optimizing resource utilization, and reducing operational costs. This technological advancement presents immense opportunities for players to enhance their offerings and gain a competitive edge.

  12. W

    Wind Power Equipment Cleaning Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Apr 3, 2025
    Data Insights Market (2025). Wind Power Equipment Cleaning Report [Dataset]. https://www.datainsightsmarket.com/reports/wind-power-equipment-cleaning-115775
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    Apr 3, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global wind power equipment cleaning market is experiencing robust growth, driven by the increasing demand for renewable energy and the expanding wind power capacity worldwide. The market's expansion is fueled by several key factors, including the rising awareness of the importance of regular cleaning for optimal turbine performance and extended lifespan. Increased operational efficiency, improved energy yield, and reduced maintenance costs are significant incentives for wind power operators to prioritize cleaning services. Technological advancements in cleaning techniques, such as drone-based inspections and automated cleaning systems, are further boosting market growth. While initial investment costs for some advanced cleaning technologies might represent a restraint, the long-term return on investment through enhanced energy production and reduced downtime often outweighs these considerations. The market is segmented by application (onshore and offshore wind farms) and types of cleaning services (blade cleaning, nacelle cleaning, tower cleaning). Major players are actively consolidating their market share through strategic acquisitions and technological innovations. The market is geographically diverse, with North America and Europe currently leading in adoption, but significant growth potential exists in rapidly developing Asian economies such as China and India, as their wind power installations expand. The forecast period (2025-2033) projects consistent growth, reflecting continued investment in renewable energy and a rising focus on optimizing the performance of existing wind farms. Considering a hypothetical CAGR of 8% and a 2025 market size of $2 billion (a reasonable estimate based on the scale of the wind energy industry), the market is poised for substantial expansion. The competitive landscape is characterized by a mix of specialized cleaning service providers and larger companies offering integrated maintenance solutions. 
Key factors influencing future market trends include regulatory changes promoting renewable energy, advancements in artificial intelligence (AI) for predictive maintenance, and the increasing adoption of sustainable cleaning practices. The offshore wind power segment presents a significant growth opportunity, although it also presents unique challenges related to accessibility and environmental considerations. The market will likely witness further consolidation among players, as companies seek to expand their service offerings and geographical reach. The demand for skilled technicians and specialized equipment will also continue to grow, creating new employment opportunities.
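The hypothetical figures quoted above (a 2025 market size of $2 billion and a CAGR of 8%) can be turned into a projection with the standard compound-growth formula. The sketch below assumes annual compounding with 2025 as the base year and 2033 as the final year, i.e. eight compounding periods; the function name is illustrative and not taken from the report.

```python
def project_market_size(base_value, cagr, base_year, target_year):
    """Compound base_value forward at a constant annual growth rate.

    Assumes one compounding period per calendar year between the
    base year and the target year.
    """
    periods = target_year - base_year
    if periods < 0:
        raise ValueError("target_year must not precede base_year")
    return base_value * (1.0 + cagr) ** periods


# Hypothetical inputs from the description: $2 billion in 2025, 8% CAGR.
projection_2033 = project_market_size(2.0, 0.08, 2025, 2033)
print(f"Projected 2033 market size: ${projection_2033:.2f} billion")
```

Under these assumptions the market would reach roughly $3.7 billion by 2033; a different base year or period count would shift the result.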

  13. H

    Hospital Cleaning Services Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jun 27, 2025
    Data Insights Market (2025). Hospital Cleaning Services Report [Dataset]. https://www.datainsightsmarket.com/reports/hospital-cleaning-services-1462432
    Explore at:
    Available download formats: ppt, pdf, doc
    Dataset updated
    Jun 27, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The hospital cleaning services market is experiencing robust growth, driven by increasing healthcare-associated infections (HAIs) and stringent hygiene regulations. The market's value is estimated at $15 billion in 2025, projected to grow at a Compound Annual Growth Rate (CAGR) of 6% from 2025 to 2033. This growth is fueled by several factors, including the rising number of hospital beds globally, an aging population requiring more healthcare services, and increased awareness of infection control best practices. Technological advancements, such as the adoption of automated cleaning systems and the use of environmentally friendly disinfectants, are further contributing to market expansion. However, challenges remain, including the high cost of specialized cleaning equipment and trained personnel, and the need for continuous training to keep pace with evolving infection control protocols. The market is segmented by service type (disinfection, sterilization, waste management), cleaning technology (manual, automated), and hospital type (general, specialized). Leading players such as ServiceMaster Clean, Jani-King, and Clean Team are consolidating their market share through acquisitions and expansion into new geographical regions. This competitive landscape is driving innovation and improved service offerings. The forecast period of 2025-2033 anticipates continued growth, with a projected market value exceeding $25 billion by 2033. This expansion will be primarily driven by emerging economies where healthcare infrastructure is rapidly developing, and increasing demand for specialized cleaning services in critical care units and operating theaters. Key regional variations exist, with North America and Europe currently dominating the market, but significant growth potential is expected in Asia-Pacific and Latin America, fueled by rising healthcare spending and a focus on enhancing hygiene standards. 
To maintain a competitive edge, companies are investing in research and development to deliver advanced cleaning solutions and improve the efficiency and effectiveness of their services. A focus on sustainability and reducing environmental impact is also becoming increasingly important, influencing the adoption of eco-friendly cleaning products and practices.
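The growth figures quoted in this description ($15 billion in 2025, a 6% CAGR, and more than $25 billion by 2033) can be cross-checked with the compound-growth formula. This is a minimal sketch that assumes annual compounding and eight compounding periods between 2025 and 2033; the function names are illustrative and not taken from the report.

```python
def compound(base_value, cagr, periods):
    """Grow base_value at a constant annual rate for a number of periods."""
    return base_value * (1.0 + cagr) ** periods


def implied_cagr(start_value, end_value, periods):
    """Constant annual rate that takes start_value to end_value."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Figures from the description, in billions of dollars.
value_at_6_percent = compound(15.0, 0.06, 8)
rate_needed_for_25 = implied_cagr(15.0, 25.0, 8)

print(f"2033 value at 6% CAGR: ${value_at_6_percent:.1f} billion")
print(f"CAGR implied by $25 billion in 2033: {rate_needed_for_25:.1%}")
```

With eight periods, a 6% rate gives about $23.9 billion, and reaching $25 billion would require roughly 6.6% a year. The quoted figures are therefore consistent only under a different counting of periods (nine periods at 6% gives about $25.3 billion) or a rounded growth rate.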

  14. i

    Annual Survey of Industries 2000-2001 - India

    • dev.ihsn.org
    • catalog.ihsn.org
    • +1more
    Updated Apr 25, 2019
    Central Statistics Office (Industrial Statistics Wing) (2019). Annual Survey of Industries 2000-2001 - India [Dataset]. https://dev.ihsn.org/nada/catalog/72969
    Explore at:
    Dataset updated
    Apr 25, 2019
    Dataset authored and provided by
    Central Statistics Office (Industrial Statistics Wing)
    Time period covered
    2001 - 2002
    Area covered
    India
    Description

    Abstract

    Introduction

    The Annual Survey of Industries (ASI) is one of the large-scale sample surveys conducted by the Field Operation Division of the National Sample Survey Office. It has been conducted for more than three decades with the objective of collecting comprehensive information on registered factories on an annual basis. ASI is the primary source of data for the systematic study of the structure of industries, for analysis of the various factors influencing industries in the country, and for creating a database for the formulation of industrial policy.

    The main objectives of the Annual Survey of Industries are briefly as follows:

    (a) Estimation of the contribution of manufacturing industries as a whole and of each unit to national income.

    (b) Systematic study of the structure of industry as a whole and of each type of industry and each unit.

    (c) Causal analysis of the various factors influencing industry in the country; and

    (d) Provision of comprehensive, factual and systematic basis for the formulation of policy.

    The Annual Survey of Industries (ASI) is the principal source of industrial statistics in India. It provides statistical information to assess changes in the growth, composition and structure of the organised manufacturing sector, comprising activities related to manufacturing processes, repair services, gas and water supply and cold storage. The Survey is conducted annually under the statutory provisions of the Collection of Statistics Act, 1953, and the Rules framed thereunder in 1959, except in the State of Jammu & Kashmir, where it is conducted under the State Collection of Statistics Act, 1961 and the rules framed thereunder in 1964.

    Geographic coverage

    The ASI is the principal source of industrial statistics in India and extends to the entire country except Arunachal Pradesh, Mizoram & Sikkim and the Union Territory of Lakshadweep. It covers all factories registered under Sections 2m(i) and 2m(ii) of the Factories Act, 1948.

    Analysis unit

    The primary unit of enumeration in the survey is a factory in the case of manufacturing industries, a workshop in the case of repair services, an undertaking or a licensee in the case of electricity, gas & water supply undertakings, and an establishment in the case of bidi & cigar industries. The owner of two or more establishments located in the same State, pertaining to the same industry group and belonging to the census scheme is, however, permitted to furnish a single consolidated return. Such consolidated returns are a common feature in the case of bidi and cigar establishments, electricity and certain public sector undertakings.

    Universe

    The survey covers factories registered under the Factories Act, 1948.

    Establishments under the control of the Defence Ministry, oil storage and distribution units, restaurants and cafes, and technical training institutions not producing anything for sale or exchange were kept outside the coverage of the ASI.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sampling design followed in ASI 2000-01 is a Circular Systematic one. All the factories in the updated frame (universe) are divided into two sectors, viz., Census and Sample.

    Census Sector: Census Sector is defined as follows:

    a) All the complete enumeration States, namely Manipur, Meghalaya, Nagaland, Tripura and Andaman & Nicobar Islands.

    b) For the rest of the States/UTs: (i) units having 100 or more workers, and (ii) all factories covered under Joint Returns.

    The rest of the factories in the frame constituted the Sample sector, on which sampling was done. Factories in the Biri & Cigar sector were not placed uniformly in the census sector; they were included in the census sector only where they met the definition above (i.e., 100 or more workers and/or joint returns). After the Census sector factories had been identified, the remaining factories were arranged in ascending order of State, NIC-98 (4 digit), number of workers and district, and numbered accordingly. The sampling fraction was taken as 12% within each stratum (State X Sector X 4-digit NIC) with a minimum of 8 samples, except for the State of Gujarat, where a 9.5% sampling fraction was used. For the States of Jammu & Kashmir, Himachal Pradesh, Daman & Diu, Dadra & Nagar Haveli, Goa and Pondicherry, a minimum of 4 samples per stratum was selected. For the States of Bihar and Jharkhand, a minimum of 6 samples per stratum was selected. The entire sample was selected in the form of two independent sub-samples using the Circular Systematic Sampling method.
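The stratum-level selection described above can be sketched as follows. This is an illustration of textbook circular systematic sampling, not the ASI's own program: the function names, the rounding-up of the sampling fraction, the use of floor(N/n) as the sampling interval, and the 0-based positions are all assumptions, and the split into two independent sub-samples is not shown.

```python
import math
import random


def stratum_sample_size(n_units, percent=12.0, minimum=8):
    """Sample size for one stratum: the sampling fraction, with a floor,
    capped at the number of units in the stratum.

    Rounding the fraction up is an assumption made for this sketch.
    """
    target = math.ceil(n_units * percent / 100)
    return min(n_units, max(minimum, target))


def circular_systematic_sample(n_units, n_sample, rng):
    """Select n_sample positions (0-based) from an ordered list of n_units.

    A random start is drawn from the whole list, then every k-th unit is
    taken, wrapping around the end of the list as if it were a circle.
    """
    if n_units <= 0 or n_sample <= 0:
        return []
    n_sample = min(n_sample, n_units)
    interval = max(1, n_units // n_sample)  # k = floor(N / n), an assumption
    start = rng.randrange(n_units)
    return [(start + j * interval) % n_units for j in range(n_sample)]


# Hypothetical stratum of 200 factories, already sorted as described above.
size = stratum_sample_size(200)
positions = circular_systematic_sample(200, size, random.Random(2001))
print(size, sorted(positions))
```

For Gujarat the same helper would be called with `percent=9.5`, and for the States with a lower floor with `minimum=4` or `minimum=6`.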

    Sampling deviation

    There was no deviation from the sample design in ASI 2000-01.

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    Pre-data-entry scrutiny was carried out on the schedules for inter-block and intra-block consistency checks. Such editing was mostly manual, although some editing was automatic. For major inconsistencies, however, the schedules were referred back to NSSO (FOD) for clarifications/modifications.

    Validation checks are carried out on the data files. The code list, State code list, tabulation program and ASICC codes, which are also used for editing and data processing, may be found in the External Resources.

    B. Tabulation procedure

    The tabulation procedure followed by CSO (ISW) uses both the ASI 2000-01 data and the data extracted from ASI 99-00 for all tabulation purposes. For extracted returns, the status of unit (Block A, Item 12) lies in the range 17 to 20. To make results comparable, users are requested to follow the same procedure. For the calculation of various parameters, users are requested to refer to the instruction manual/report. Please note that a separate inflation factor (multiplier) is available for each unit against the records belonging to Block A of the ASI 2000-01 data. The multiplier is calculated for each stratum (i.e. State X NIC'98 (4 digit)) after adjusting for non-response cases.
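The role of the multiplier can be illustrated with a generic expansion estimator. The sketch below is not the ASI formula, which is given in the instruction manual/report: it assumes that a stratum multiplier is the number of frame units divided by the number of responding sample units, and the record layout (`multiplier`, `value`) is hypothetical.

```python
def stratum_multiplier(frame_units, responding_units):
    """Inflation factor for one stratum after dropping non-respondents.

    Assumes the simple ratio N / n_responding; the official adjustment
    may differ.
    """
    if responding_units <= 0:
        raise ValueError("stratum has no responding units")
    return frame_units / responding_units


def estimate_total(records):
    """Weighted total of a variable across unit-level records."""
    return sum(r["multiplier"] * r["value"] for r in records)


# Hypothetical stratum: 120 factories in the frame, 12 usable returns.
weight = stratum_multiplier(120, 12)
records = [
    {"multiplier": weight, "value": 5.0},
    {"multiplier": weight, "value": 7.5},
]
print(weight, estimate_total(records))
```

In practice the multiplier supplied with each Block A record would be used directly rather than recomputed.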

    C. Merging of unit level data

    In line with the existing policy of merging unit-level data at the ultimate digit level of NIC'98 (i.e., 5 digit) for the purpose of dissemination, data for industries having fewer than three units within a State, District and NIC'98 (5 digit) have been merged with the adjoining industries within the district, and then with adjoining districts within the State. There may be some NIC'98 (5 digit) codes ending with '9' which do not figure in the book of NIC '98. These may be treated as 'Others' under the corresponding 4-digit group. To suppress the identity of factories, the data fields corresponding to PSL number, Industry code as per Frame (4-digit level of NIC-98) and RO/SRO code have been filled with '9' in each record.

    Please note that tables generated from the merged data may not tally with the published results for a few industries, since the merging for published data was done at the aggregate level to minimise loss of information.

    Sampling error estimates

    Relative Standard Error (RSE) is calculated in terms of workers, wages to workers and GVA using the formula given in the Estimation Procedure document (please refer to the external resources). Programs developed in Visual FoxPro are used to compute the RSE of estimates.
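Because the sample is drawn as two independent sub-samples, one common way to obtain an RSE is the interpenetrating sub-sample estimator. The sketch below uses that textbook estimator as an assumption; the formula actually applied to the ASI is the one in the Estimation Procedure document, and the function name is illustrative.

```python
def subsample_rse(estimate_1, estimate_2):
    """Percentage RSE from two independent sub-sample estimates.

    Combined estimate: (t1 + t2) / 2
    Standard error:    |t1 - t2| / 2
    RSE (%):           100 * standard error / combined estimate
    """
    combined = (estimate_1 + estimate_2) / 2.0
    if combined == 0:
        raise ValueError("combined estimate is zero; RSE is undefined")
    standard_error = abs(estimate_1 - estimate_2) / 2.0
    return 100.0 * standard_error / combined


# Hypothetical sub-sample estimates of total workers in one State.
print(subsample_rse(110_000, 90_000))
```

The same function would be applied separately to workers, wages to workers and GVA.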

    Data appraisal

    To check the consistency and reliability of the data, they are compared with the NIC 2-digit level growth rates of the all-India Index of Industrial Production (IIP) and with the growth rates obtained from the National Accounts Statistics at current and constant prices for the registered manufacturing sector.
