26 datasets found
  1. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  2. Data Bytes - Data Cleaning with OpenRefine

    • osf.io
    Updated May 4, 2022
    Cite
    Kay Kuhlemeier Bjornen; Clarke Iakovakis (2022). Data Bytes - Data Cleaning with OpenRefine [Dataset]. https://osf.io/krytd
    Explore at:
    Dataset updated
    May 4, 2022
    Dataset provided by
    Center For Open Science
    Authors
    Kay Kuhlemeier Bjornen; Clarke Iakovakis
    Description

    If you work with large spreadsheets and have spent hours going cell by cell to find errors, we can show you how to save time and aggravation. OpenRefine is a powerful free tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data. OpenRefine keeps your data private on your own computer until you want to share or collaborate. It works by running a small server on your computer, and you interact with it through your web browser. Participants will download OpenRefine to their own devices and learn by doing.

  3. Data Cleaning, Translation & Split of the Dataset for the Automatic...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 8, 2022
    Cite
    Köhler, Juliane (2022). Data Cleaning, Translation & Split of the Dataset for the Automatic Classification of Documents for the Classification System for the Berliner Handreichungen zur Bibliotheks- und Informationswissenschaft [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6957841
    Explore at:
    Dataset updated
    Aug 8, 2022
    Dataset authored and provided by
    Köhler, Juliane
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Cleaned_Dataset.csv – The combined CSV files of all scraped documents from DABI, e-LiS, o-bib and Springer.

    Data_Cleaning.ipynb – The Jupyter Notebook with Python code for the analysis and cleaning of the original dataset.

    ger_train.csv – The German training set as CSV file.

    ger_validation.csv – The German validation set as CSV file.

    en_test.csv – The English test set as CSV file.

    en_train.csv – The English training set as CSV file.

    en_validation.csv – The English validation set as CSV file.

    splitting.py – The Python code for splitting a dataset into train, test and validation sets.

    DataSetTrans_de.csv – The final German dataset as a CSV file.

    DataSetTrans_en.csv – The final English dataset as a CSV file.

    translation.py – The Python code for translating the cleaned dataset.
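
    The listing names a splitting script but does not show it. As a rough illustration only, a train/validation/test split could be sketched as below. The function name `split_rows`, the 80/10/10 ratios, and the fixed seed are assumptions made for this example and are not taken from the dataset's own splitting.py.

```python
import random


def split_rows(rows, train=0.8, validation=0.1, seed=42):
    """Shuffle a copy of rows and cut it into train, validation and test parts.

    The ratios and seed are illustrative assumptions, not the dataset's values.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    train_part = shuffled[:n_train]
    val_part = shuffled[n_train:n_train + n_val]
    test_part = shuffled[n_train + n_val:]  # whatever remains
    return train_part, val_part, test_part


rows = list(range(100))  # stand-in for the records of a cleaned CSV file
train_rows, val_rows, test_rows = split_rows(rows)
print(len(train_rows), len(val_rows), len(test_rows))
```

    Shuffling before cutting keeps the three parts free of any ordering present in the source file, and every record lands in exactly one part.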

  4. Data pre-processing and clean-up

    • paper.erudition.co.in
    html
    Updated Dec 2, 2023
    Cite
    Einetic (2023). Data pre-processing and clean-up [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering-artificial-intelligence-and-machine-learning/6/data-mining
    Explore at:
    Available download formats: html
    Dataset updated
    Dec 2, 2023
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question paper solutions for the chapter "Data pre-processing and clean-up" of Data Mining, 6th Semester, B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning).

  5. Ocean Clean Up Drones Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 15, 2025
    Cite
    AMA Research & Media LLP (2025). Ocean Clean Up Drones Report [Dataset]. https://www.datainsightsmarket.com/reports/ocean-clean-up-drones-39141
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    Mar 15, 2025
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The ocean cleanup drone market, currently valued at $4.7 billion in 2025, is experiencing robust growth, projected to expand at a compound annual growth rate (CAGR) of 6.5% from 2025 to 2033. This expansion is driven by increasing concerns about marine pollution, stricter environmental regulations globally, and the limitations of traditional cleanup methods. The rising adoption of autonomous and remotely operated vehicles for cost-effective and efficient waste removal is a key factor propelling market growth. Technological advancements in drone capabilities, including improved battery life, sensor technology for waste identification and navigation, and increased payload capacity, are further enhancing the appeal of these solutions. The market is segmented by application (civil & commercial, military) and type (electric drive, solar drive). The civil and commercial segment currently dominates, driven by growing awareness and initiatives for coastal and open ocean cleanup. However, the military segment is expected to witness significant growth fueled by applications in mine detection and surveillance. North America and Europe currently hold significant market share, driven by strong environmental regulations and technological advancements, but the Asia-Pacific region is poised for substantial growth due to increasing industrialization and rising environmental concerns. Companies such as Serial Cleaners, RanMarine, Clean Sea Solutions, Clearbot, and Notilo Plus are leading the innovation and deployment of ocean cleanup drones. The market's restraints include high initial investment costs associated with drone acquisition and maintenance, technological limitations in addressing diverse types of marine debris, and challenges related to effective data management and analysis from drone operations. 
However, ongoing research and development efforts focusing on improved drone design, advanced sensor technologies, and the development of sophisticated data processing platforms are mitigating these challenges. Future market growth will depend on sustained government investment in marine pollution control, the development of cost-effective and scalable drone technologies, and successful collaborations between private companies, research institutions, and government agencies to streamline cleanup operations. The integration of artificial intelligence (AI) and machine learning (ML) for improved waste detection and sorting is a key emerging trend expected to significantly impact the market's trajectory.
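
The growth claim above rests on compound annual growth. As a minimal sketch, assuming annual compounding over the eight years from 2025 to 2033, the quoted figures ($4.7 billion, 6.5% CAGR) can be projected forward as follows. The function names are illustrative, and the projected value is only what the stated figures imply, not a number reported by the source.

```python
def project_value(start_value, cagr, years):
    """Compound start_value at a constant annual rate for the given years."""
    return start_value * (1 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Recover the constant annual rate that links two values."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the description: USD 4.7 billion in 2025, 6.5% CAGR to 2033.
projected_2033 = project_value(4.7, 0.065, 2033 - 2025)
print(f"Implied 2033 market size: USD {projected_2033:.2f} billion")
```

The same two functions apply to any of the market reports in this listing that state a base value, a rate, and a period.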

  6. Understanding resilience attributes for children, youth, and communities in...

    • search.dataone.org
    • data.griidc.org
    Updated Feb 5, 2025
    Cite
    Beedasy, Jaishree (2025). Understanding resilience attributes for children, youth, and communities in the wake of the Deepwater Horizon oil spill study, social media component [Dataset]. http://doi.org/10.7266/n7-x639-n053
    Explore at:
    Dataset updated
    Feb 5, 2025
    Dataset provided by
    GRIIDC
    Authors
    Beedasy, Jaishree
    Description

    Twitter data was acquired from a service provider to investigate the role of social media during and after the Deepwater Horizon oil spill. In particular, historical Twitter data was accessed using a geospatial query tool. The query rules were built with a set of keywords and filtered by date. Using a combination of human coding and machine-learning processes, the Twitter datasets were examined for insights into online communications related to the oil spill. This dataset contains a description of the data acquisition process, the search strategy and keywords, and the dates of the historical Twitter datasets related to the Deepwater Horizon oil spill, along with a detailed summary of the methodology used for data analysis.

  7. Superfund Training/Tech Transfer

    • catalog.data.gov
    Updated Feb 25, 2025
    Cite
    U.S. EPA Office of Land and Emergency Management (OLEM) - Office of Superfund Remediation and Technology Innovation (OSRTI) (Owner) (2025). Superfund Training/Tech Transfer [Dataset]. https://catalog.data.gov/dataset/superfund-training-tech-transfer13
    Explore at:
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This asset includes a collection of information resources, training, and other media related to hazardous waste site cleanup and characterization. A major part of this asset is the CLU-IN System, which is a collection of websites designed to be the central reference library for "the development, collection, evaluation, coordination, and dissemination of information relating to the utilization of alternative or innovative treatment technologies..." for cleaning up hazardous waste sites (Title 42 Section 9660 (b)(8)). Information includes Best Practices for using innovative technologies, case studies and focus areas about characterization and remediation technologies, emerging issues, optimization, and green(ing) remediation. CLU-IN is available via web-based documentation, live events, podcasts, and videos. Additionally, the Technology Innovation and Field Services Division (TIFSD) supports both classroom and online training registration through Trainex.org. All EPA content is also posted on EPA's website.

  8. Data Science Platform Industry Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 12, 2025
    Cite
    Data Insights Market (2025). Data Science Platform Industry Report [Dataset]. https://www.datainsightsmarket.com/reports/data-science-platform-industry-12961
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    Mar 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Science Platform market is experiencing robust growth, projected to reach $10.15 billion in 2025 and exhibiting a Compound Annual Growth Rate (CAGR) of 23.50% from 2025 to 2033. This expansion is driven by several key factors. The increasing availability and affordability of cloud computing resources are lowering the barrier to entry for organizations of all sizes seeking to leverage data science capabilities. Furthermore, the growing volume and complexity of data generated across various industries necessitate sophisticated platforms for efficient data processing, analysis, and model deployment. The rise of AI and machine learning further fuels demand, as organizations strive to gain competitive advantages through data-driven insights and automation. Strong demand from sectors like IT and Telecom, BFSI (Banking, Financial Services, and Insurance), and Retail & E-commerce is a major contributor to market growth. The preference for cloud-based deployment models over on-premise solutions is also accelerating market expansion, driven by scalability, cost-effectiveness, and accessibility. Market segmentation reveals a diverse landscape. While large enterprises are currently major consumers, the increasing adoption of data science by small and medium-sized enterprises (SMEs) represents a significant growth opportunity. The platform offering segment is anticipated to maintain a substantial market share, driven by the need for comprehensive tools that integrate data ingestion, processing, modeling, and deployment capabilities. Geographically, North America and Europe are currently leading the market, but the Asia-Pacific region, particularly China and India, is poised for significant growth due to expanding digital economies and increasing investments in data science initiatives. Competitive intensity is high, with established players like IBM, SAS, and Microsoft competing alongside innovative startups like DataRobot and Databricks.
This competitive landscape fosters innovation and further accelerates market expansion. Recent developments include: November 2023 - Stagwell announced a partnership with Google Cloud and SADA, a Google Cloud premier partner, to develop generative AI (gen AI) marketing solutions that support Stagwell agencies, client partners, and product development within the Stagwell Marketing Cloud (SMC). The partnership will help in harnessing data analytics and insights by developing and training a proprietary Stagwell large language model (LLM) purpose-built for Stagwell clients, productizing data assets via APIs to create new digital experiences for brands, and multiplying the value of their first-party data ecosystems to drive new revenue streams using Vertex AI and open source-based models. May 2023 - IBM launched watsonx, a new AI and data platform aimed at allowing businesses to accelerate advanced AI usage with trusted data, speed, and governance. IBM also introduced GPU-as-a-service, which is designed to support AI-intensive workloads, with an AI dashboard to measure, track, and help report on cloud carbon emissions. With watsonx, IBM offers an AI development studio with access to IBM-curated and trained foundation models and open-source models, and access to a data store to gather and clean up training and tuning data. Key drivers for this market are: Rapid Increase in Big Data, Emerging Promising Use Cases of Data Science and Machine Learning; Shift of Organizations Toward Data-intensive Approach and Decisions. Potential restraints include: Lack of Skillset in Workforce, Data Security and Reliability Concerns. Notable trends are: Small and Medium Enterprises to Witness Major Growth.

  9. The global Digital Cleaning market size will be USD XX million in 2024.

    • cognitivemarketresearch.com
    pdf,excel,csv,ppt
    Updated Nov 23, 2024
    Cite
    Cognitive Market Research (2024). The global Digital Cleaning market size will be USD XX million in 2024. [Dataset]. https://www.cognitivemarketresearch.com/digital-cleaning-market-report
    Explore at:
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Nov 23, 2024
    Dataset authored and provided by
    Cognitive Market Research
    License

    https://www.cognitivemarketresearch.com/privacy-policy

    Time period covered
    2021 - 2033
    Area covered
    Global
    Description

    According to Cognitive Market Research, the global Digital Cleaning market size will be USD XX million in 2024. It will expand at a compound annual growth rate (CAGR) of 7.00% from 2024 to 2031.

    • North America held the largest market share, more than 40% of global revenue, with a market size of USD XX million in 2024, and will grow at a compound annual growth rate (CAGR) of 5.2% from 2024 to 2031.
    • Europe accounted for a market share of over 30% of global revenue, with a market size of USD XX million.
    • Asia Pacific held a market share of around 23% of global revenue, with a market size of USD XX million in 2024, and will grow at a CAGR of 9.0% from 2024 to 2031.
    • Latin America had a market share of more than 5% of global revenue, with a market size of USD XX million in 2024, and will grow at a CAGR of 6.4% from 2024 to 2031.
    • Middle East and Africa had a market share of around 2% of global revenue, with an estimated market size of USD XX million in 2024, and will grow at a CAGR of 6.7% from 2024 to 2031.
    • The personal care category is the fastest-growing segment of the Digital Cleaning industry.

    Market Dynamics of Digital Cleaning Market

    Key Drivers for Digital Cleaning Market

    Increasing Adoption of Digital Transformation in Enterprises to Boost Market Growth

    The growing emphasis on digital transformation across various sectors drives demand for digital cleaning solutions. As organizations transition to more digital operations, data accumulation on servers, cloud storage, and devices increases exponentially. Digital cleaning solutions help maintain streamlined, efficient data management practices, optimizing storage and reducing redundant files that slow down operations. This need is particularly acute in large organizations, where data management issues can affect operational efficiency and cybersecurity. By investing in digital cleaning, enterprises not only improve performance but also enhance data security by identifying and removing outdated, sensitive information, reducing the risk of data breaches. For instance, Principle Cleaning Services partnered with Skyline Robotics to bring autonomous window-cleaning robots to London. According to the report, the partnership will help complete window cleaning up to three times faster than human crews and will offer a more effective alternative to manual cleaning.

    Rising Concerns for Data Privacy and Security to Drive Market Growth

    With escalating concerns about data breaches and privacy violations, organizations prioritize data hygiene to maintain secure digital environments. Digital cleaning ensures that unnecessary or outdated data is systematically removed, minimizing vulnerabilities to unauthorized access. Especially with regulatory compliance pressures from laws like GDPR and CCPA, organizations must ensure data minimization principles are applied, retaining only what is necessary. Digital cleaning solutions support compliance by automating data management tasks, helping organizations stay aligned with privacy requirements. This focus on digital hygiene enhances trust among consumers and stakeholders, strengthening an organization’s reputation and regulatory standing.

    Restraint Factor for the Digital Cleaning Market

    Compatibility and Device Fragmentation Will Limit Market Growth

    Digital cleaning solutions must be compatible with a broad range of operating systems, devices, and software versions. With the ever-increasing variety of devices and software environments, maintaining compatibility becomes a challenge. Device fragmentation, mainly within Android systems, can lead to inconsistent performance of digital cleaning tools, which may not work seamlessly across all devices. This inconsistency reduces the effectiveness of digital cleaning solutions and can deter users who face performance or compatibility issues. Additionally, the need for regular updates to keep pace with new operating system versions adds to the operational costs, potentially restraining the market's growth.

    Impact of Covid-19 on the Digital Cleaning Market

    The COVID-19 pandemic accelerated the demand for digital cleaning solutions as remote work, online education, and increased digital activity led to heightened use of personal and professional devices. With a surge in data consumption and storage needs, digital cleaning tools became essential for m...

  10. Data from: Remediation Sites

    • data.gis.ny.gov
    Updated Apr 2, 2021
    Cite
    New York State Department of Environmental Conservation (2021). Remediation Sites [Dataset]. https://data.gis.ny.gov/datasets/82794cfd9d6b4baea67e1efc1eac5371
    Explore at:
    Dataset updated
    Apr 2, 2021
    Dataset authored and provided by
    New York State Department of Environmental Conservation (http://www.dec.ny.gov/)
    Area covered
    Description

    Service layer is updated daily.

    For more information or to download the layer, see https://gis.ny.gov/gisdata/inventories/details.cfm?DSID=1097

    Download the metadata to learn more about how the data was created and details about the attributes. Use the links within the metadata document to expand the sections of interest: http://gis.ny.gov/gisdata/metadata/nysdec.remedsite_borders_export.html

    These are sites of environmental cleanup and safe brownfield redevelopment. DEC's remediation and enforcement programs ensure the timely and efficient cleanup and redevelopment of contaminated properties.

  11. Cleanups In My Community (CIMC) - Base Realignment and Closure (BRAC)...

    • s.cnmilf.com
    • catalog.data.gov
    Updated Feb 25, 2025
    Cite
    U.S. Environmental Protection Agency, Office of Environmental Information (Point of Contact) (2025). Cleanups In My Community (CIMC) - Base Realignment and Closure (BRAC) Superfund Sites, National Layer [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/cleanups-in-my-community-cimc-base-realignment-and-closure-brac-superfund-sites-national-layer11
    Explore at:
    Dataset updated
    Feb 25, 2025
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    This data layer provides access to Base Realignment and Closure (BRAC) Superfund Sites as part of the CIMC web service. EPA works with DoD to facilitate the reuse and redevelopment of BRAC federal properties. When the BRAC program began in the early 1990s, EPA worked with DoD and the states to identify uncontaminated areas and these parcels were immediately made available for reuse. Since then EPA has worked with DoD to clean up the contaminated portions of bases. These are usually parcels that were training ranges, landfills, maintenance facilities and other past waste-disposal areas. Superfund is a program administered by the EPA to locate, investigate, and clean up the worst hazardous waste sites throughout the United States. EPA administers the Superfund program in cooperation with individual states and tribal governments. These sites include abandoned warehouses, manufacturing facilities, processing plants, and landfills - the key word here being abandoned. This data layer shows Superfund Sites that are located at BRAC Federal Facilities. Additional Superfund sites and other BRAC sites (those that are not Superfund sites) are included in other data layers as part of this web service. BRAC Superfund Sites shown in this web service are derived from the epa.gov website and include links to the relevant web pages within the attribute table. Data about BRAC Superfund Sites are located on their own EPA web pages, and CIMC links to those pages. The CIMC web service was initially published in 2013, but the data are updated twice a month. The full schedule for data updates in CIMC is located here: https://ofmpub.epa.gov/frs_public2/frs_html_public_pages.frs_refresh_stats.

  12. Geolytica POIData.xyz Points of Interest (POI) Geo Data - Ireland

    • datarade.ai
    .csv
    Updated Jun 2, 2021
    Cite
    Geolytica (2021). Geolytica POIData.xyz Points of Interest (POI) Geo Data - Ireland [Dataset]. https://datarade.ai/data-products/geolytica-poidata-xyz-points-of-interest-poi-geo-data-ire-geolytica
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jun 2, 2021
    Dataset authored and provided by
    Geolytica
    Area covered
    Ireland
    Description

    Point-of-interest (POI) is defined as a physical entity (such as a business) in a geo location (point) which may be (of interest).

    We strive to provide the most accurate, complete and up to date point of interest datasets for all countries of the world. The Republic of Ireland POI Dataset is one of our worldwide POI datasets with over 98% coverage.

    This is our process flow:

    • Our machine learning systems continuously crawl for new POI data.
    • Our geoparsing and geocoding systems calculate their geo locations.
    • Our categorization systems clean up and standardize the datasets.
    • Our data pipeline API publishes the datasets on our data store.

    POI Data is in a constant flux - especially so during times of drastic change such as the Covid-19 pandemic.

    Every minute worldwide on an average day over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist.

    In today's interconnected world, of the approximately 200 million POIs worldwide, over 94% have a public online presence. As a new POI comes into existence its information will appear very quickly in location based social networks (LBSNs), other social media, pictures, websites, blogs, press releases. Soon after that, our state-of-the-art POI Information retrieval system will pick it up.

    We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via a recurring payment plan on our data update pipeline.

    The main differentiators between us and the competition are our flexible licensing terms and our data freshness.

  13. Public Expenditure Tracking Survey in Education 2006 - Madagascar

    • catalog.ihsn.org
    • dev.ihsn.org
    Updated Mar 29, 2019
    Cite
    World Bank (2019). Public Expenditure Tracking Survey in Education 2006 - Madagascar [Dataset]. https://catalog.ihsn.org/catalog/859
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    World Bank
    UNICEF
    Ministere de L’education Nationale et de la Recherche Scientifique
    Time period covered
    2006
    Area covered
    Madagascar
    Description

    Abstract

    Madagascar had low school enrollment rates: only 60% of urban children and 12% of rural children completed primary school (World Bank, 2002). To improve enrollment and completion rates as well as the quality of education, the Madagascar government substantially increased investments in the education sector. It committed itself to the Education For All (EFA) initiative and started to fully subsidize tuition fees through the so-called "caisse ecole," and to provide school kits for all students in public primary schools. The Government also raised the districts' budgets for school material and started distributing free textbooks to schools.

    This study investigated the different resource flows in the financing of the public primary education sector in Madagascar.

    The survey was conducted in two rounds. The first round was carried out in October-November 2006 and the second round in April-May 2007. The study was implemented using stratified random sampling. Data from more than 200 schools in 28 districts was analyzed.

    A Public Expenditure Tracking Survey among Madagascar health care facilities and workers was conducted at the same time as the PETS in Education.

    Geographic coverage

    Provinces: Antananarivo, Fianarantsoa, Toamasina, Mahajanga, Toliara and Antsiranana.

    Analysis unit

    • Cisco (Circonscription Scolaire/District Education Facility);
    • Schools;
    • School Principals;
    • Teachers.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The study was conducted using stratified random sampling.

    The stratified sample was set up in such a way to be representative at the national level. Madagascar has 22 regions and 111 districts, and at least one district was visited in each region. Two districts were selected in the six largest regions. Hence, 28 districts were visited in total. The selected districts were obtained through random selection, giving greater (less) weight to districts with more (less) public primary schools within the district. In each district, three communes were randomly selected, giving greater weight to the communes with more schools. Within each commune, three public primary schools were randomly selected. By ranking schools from large to small and ensuring that a school was picked out of each tercile, a representative sample of school sizes was chosen.

    In the First Round, 252 schools were visited. Six percent of the visited schools were closed at the time of the survey and researchers ended up with reliable data on 238 schools.

    By province, 54 schools were visited in Antananarivo, 63 in Fianarantsoa, 36 in Toamasina, 45 in Mahajanga, 36 in Toliara and 18 in Antsiranana.
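
    The sampling procedure above combines size-weighted random selection with one pick per size tercile. The sketch below illustrates those two steps and checks the sample-size arithmetic (22 regions plus a second district in the 6 largest gives 28 districts; 3 communes per district and 3 schools per commune gives 252 schools). The function names and the toy district and school data are hypothetical, and this is not the survey team's actual selection code.

```python
import random


def weighted_sample_without_replacement(items, weights, k, rng):
    """Draw k distinct items; each draw is proportional to the item's weight."""
    pool = list(zip(items, weights))
    chosen = []
    for _ in range(k):
        idx = rng.choices(range(len(pool)), weights=[w for _, w in pool], k=1)[0]
        chosen.append(pool.pop(idx)[0])  # remove so it cannot be drawn again
    return chosen


def one_school_per_tercile(schools, rng):
    """Rank (name, size) pairs from large to small and pick one per third."""
    ranked = sorted(schools, key=lambda s: s[1], reverse=True)
    n = len(ranked)
    cuts = [0, n // 3, 2 * n // 3, n]
    return [rng.choice(ranked[cuts[i]:cuts[i + 1]])[0] for i in range(3)]


# Sample-size arithmetic stated in the text.
districts_visited = 22 + 6
schools_planned = districts_visited * 3 * 3

# Hypothetical toy data: district names with their number of public schools.
rng = random.Random(0)
districts = ["D1", "D2", "D3", "D4", "D5"]
school_counts = [120, 40, 75, 10, 55]
picked_districts = weighted_sample_without_replacement(districts, school_counts, 2, rng)

# Hypothetical schools in one commune, as (name, enrollment) pairs.
commune_schools = [(f"S{i}", size) for i, size in
                   enumerate([900, 750, 600, 450, 300, 250, 120, 80, 40])]
picked_schools = one_school_per_tercile(commune_schools, rng)
print(districts_visited, schools_planned, picked_districts, picked_schools)
```

    Weighting by the number of schools makes districts and communes with more schools more likely to be drawn, while the tercile step guarantees that large, medium and small schools all appear in the sample.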

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The following survey instruments are available:

    • Enquête Au Niveau Des Établissements Scolaires, Enquete Cisco;
    • Enquête Au Niveau Des Établissements Scolaires, Enquete Directeur Ecole, Visite 1er Jour;
    • Enquête Au Niveau Des Établissements Scolaires, Enquete Directeur Ecole, Visite 2ème Jour;
    • Enquête Au Niveau Des Établissements Scolaires, Enquete Enseignant.

    Cleaning operations

    Detailed information about data editing procedures is available in "Data Cleaning Guide for PETS/QSDS Surveys" in external resources.

    STATA cleaning do-files and data quality reports can also be found in external resources.

  14. PromptCloud Web Scraping Data - Custom Web Scraping & Data Extraction...

    • datarade.ai
    .json, .xml, .csv
    Updated Nov 27, 2023
    Cite
    PromptCloud (2023). PromptCloud Web Scraping Data - Custom Web Scraping & Data Extraction Solutions, Globally | Scrape Web Data | Sample Datasets Available | PromptCloud [Dataset]. https://datarade.ai/data-products/sfdbsdfxsfdbsdfxsfdbsdfxsfdbsdfxsfdbsdfxsfdbsdfxsfdbsdfx-promptcloud
    Explore at:
    Available download formats: .json, .xml, .csv
    Dataset updated
    Nov 27, 2023
    Dataset authored and provided by
    PromptCloud
    Area covered
    Belarus, Korea (Republic of), Mexico, New Caledonia, Tokelau, Sierra Leone, Wallis and Futuna, Germany, Marshall Islands, Sint Maarten (Dutch part)
    Description

    We help organizations scrape data from websites in the form they need, gathering large datasets through our customized enterprise web scraping services. Empower your business with clean, tagged, and structured data by extracting text, code, images, URLs, and much more from the websites of your choice.

    Powered by AI and Machine Learning, we execute multiple concurrent volume data extractions with faster scraping speeds. Our volume scraper supports even dynamic web page scraping - Infinite scrolling, dropdowns, log-in authentication, and AJAX, to name a few.

    We are committed to putting data at the heart of your business. Reach out for a no-frills PromptCloud experience: professional, technologically ahead and reliable.

  15. Delivery of air cleaning units until 11 March 2022

    • s3.amazonaws.com
    • gov.uk
    Updated Mar 17, 2022
    + more versions
    Cite
    Department for Education (2022). Delivery of air cleaning units until 11 March 2022 [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/179/1795428.html
    Explore at:
    Dataset updated
    Mar 17, 2022
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Education
    Description

    This transparency release sets out the number of air cleaning units delivered by the Department for Education to state-funded education settings up until 11 March 2022.

    The data shows the cumulative number of air cleaning units delivered using administrative data from its delivery partners. The data covers education settings in England and includes early years, schools and further education providers.

    You can also view statistics on the following:

  16. ResDerainNet

    • data.mendeley.com
    Updated Dec 29, 2018
    + more versions
    Cite
    Takuro Matsui (2018). ResDerainNet [Dataset]. http://doi.org/10.17632/548vtzjbyf.1
    Explore at:
    Dataset updated
    Dec 29, 2018
    Authors
    Takuro Matsui
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Most outdoor vision systems can be influenced by rainy weather conditions. We present a single-image rain removal method, called ResDerainNet. The proposed network can automatically detect rain streaks and remove them. Based on deep convolutional neural networks (CNNs), we learn the mapping relationship between rainy and residual images from data. Furthermore, for training, we synthesize rainy images considering various rain models. Specifically, we mainly focus on the composite models as well as the orientations and scales of rain streaks. In summary, we make the following contributions:

    • A residual deep network is introduced to remove rain noise. Unlike a plain deep network, which learns the mapping relationship between noisy and clean images, we learn the relationship between rainy and residual images from data. This speeds up the training process and improves the de-raining performance.

    • An automatic rain noise generator is introduced to obtain synthetic rain noise. Most de-raining methods create rain noise by using Photoshop. Since synthetic rain noise has many parameters, it is difficult to adjust these parameters automatically. In our method, we can easily change some parameters in MATLAB, which saves time and effort in obtaining natural rain noise.

    • A combination of the linear additive composite model and the screen blend model is proposed to make synthetic rainy images. For the trained network to be applicable to a wide range of rainy images, a single composite model is not enough. Our experimental results show that a combination of these models achieves better performance than using either model alone.
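
    The two composite models named above can be illustrated per pixel with a short sketch. It assumes intensities normalised to [0, 1] and uses the commonly cited forms O = B + R (linear additive) and O = 1 - (1 - B)(1 - R) (screen blend); the function names and the clipping step are assumptions for illustration, not the authors' MATLAB generator.

```python
def additive_composite(background, rain):
    """Linear additive model: O = B + R, clipped to the valid range [0, 1]."""
    return min(1.0, background + rain)

def screen_blend(background, rain):
    """Screen blend model: O = 1 - (1 - B) * (1 - R)."""
    return 1.0 - (1.0 - background) * (1.0 - rain)

def composite_image(background_pixels, rain_pixels, model):
    """Apply one composite model pixel by pixel to two equally sized flat images."""
    return [model(b, r) for b, r in zip(background_pixels, rain_pixels)]

# Hypothetical 4-pixel background and rain-streak layers.
background = [0.2, 0.5, 0.8, 1.0]
streaks = [0.0, 0.5, 0.5, 0.3]
rainy_additive = composite_image(background, streaks, additive_composite)
rainy_screen = composite_image(background, streaks, screen_blend)
```

    The additive model saturates quickly on bright backgrounds, while the screen blend stays within range without clipping, which is one plausible reason for mixing the two when synthesising training images.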

  17. Multiple Indicator Cluster Survey 2000 - Viet Nam

    • microdata.worldbank.org
    • catalog.ihsn.org
    • +1more
    Updated Oct 26, 2023
    + more versions
    Cite
    General Statistics Office (2023). Multiple Indicator Cluster Survey 2000 - Viet Nam [Dataset]. https://microdata.worldbank.org/index.php/catalog/722
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    General Statistics Office
    Time period covered
    2000
    Area covered
    Vietnam
    Description

    Abstract

    The Viet Nam Multiple Indicator Cluster Survey (MICS) was carried out by the General Statistics Office of Viet Nam (GSO) in collaboration with the Viet Nam Committee for Population, Family and Children (VCPFC). Financial and technical support was provided by the United Nations Children's Fund (UNICEF).

    In the World Summit for children held in New York in 1990, the Government of Vietnam committed itself to the implementation of the World Declaration and Plan of Action for children.

    In implementing directive 34/1999/CT-TTg of 27 December 1999, which called for promoting the end-decade goals for children, reviewing the National Plan of Action for Children, 1991-2000, and designing the National Plan of Action for Children, 2001-2010, the General Statistical Office (GSO) led the survey evaluating the end-decade goals for children, 1991-2000 (MICS), within the framework of the “Development of Social Indicators” project and in coordination with the Viet Nam Committee for the Protection and Care for Children (CPCC). MICS covered a sample of 7628 households in 240 communes and wards, representing the whole country, the urban area, the rural area and the 8 geographical areas in 61 towns/provinces. Field data collection lasted 2 months, May-June 2000. Statisticians from EAPRO, UNICEF regional offices and UNICEF Hanoi provided technical support on sample and questionnaire design, data entry software, and the software used to analyse the data and calculate the survey estimates.

    Survey objectives: the end-decade survey on children is aimed at:

    • Providing up-to-date and reliable data to analyse the situation of children and women in 2000.
    • Providing data to assess the implementation of the World Summit goals for children and of the National Plan of Action for Vietnamese Children, 1991-2000.
    • Serving as a basis (with baseline data and information) for development of the National Plan of Action for Children, 2001-2010.
    • Building professional capacity in monitoring, managing and evaluating all the goals of child protection, care and education at all levels.

    Geographic coverage

    The 2000 MICS of Vietnam was a nationally representative sample survey.

    Analysis unit

    Households, Women, Child.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample for the Viet Nam Multiple Indicator Cluster Survey (MICSII) was designed to provide reliable estimates on a large number of indicators on the situation of children and women at the national level, for urban and rural areas, and for 8 regions: Red River Delta, North West, North East, North Central Coast, South Central Coast, Central Highlands, South East, and Mekong River Delta. Regions were identified as the main sampling domains and the sample was selected in two stages. In the first stage, 240 enumeration areas (EAs) were selected. In the second stage, after a household listing was carried out within the selected EAs, a systematic sample of 1/3 of households in each EA was drawn. The survey managed to visit all 240 selected EAs during the fieldwork period. The sample was stratified by region and is not self-weighting. For reporting national-level results, sample weights are used.
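
    A systematic sample of 1/3 of the listed households can be sketched as taking every third household from a random starting point. The household list below is made up, and the random-start rule is an assumption: the survey documentation does not state how the start was chosen.

```python
import random

def systematic_sample(units, interval, rng):
    """Take every `interval`-th unit, starting from a random offset in [0, interval)."""
    start = rng.randrange(interval)
    return units[start::interval]

rng = random.Random(2000)
# Hypothetical listing of 30 households in one enumeration area.
household_listing = [f"hh_{i:03d}" for i in range(30)]
selected = systematic_sample(household_listing, 3, rng)
```

    Because the number of households differs between enumeration areas and the sample is stratified by region, selection probabilities differ across households, which is consistent with the note that the sample is not self-weighting.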

    Sampling deviation

    No major deviations from the original sample design were made. All sample enumeration areas were accessed and successfully interviewed with good response rates.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaires for MICS in Vietnam are based on the New York UNICEF module questionnaires with some modifications and additions to fit in with Vietnam's context and to evaluate the goals set out in the National Plan of Action. The questionnaires have been arranged in such a way as to prevent the loss of questionnaire sheets and to facilitate the logic control between the items in the modules. Questionnaires include 3 sections. Section 1: general questions to be administered to families and family members. Section 2: questions for child bearing-age women (aged 15-49). Section 3: for children under 5.

    Section 1: Household questionnaire
    • Part A: Household information panel
    • Part B: Household listing form
    • Part C: Education
    • Part D: Child labour
    • Part E: Maternal mortality
    • Part F: Water and sanitation
    • Part G: Salt iodization

    Section 2: Questionnaire for child bearing-age women
    • Part A: Child mortality
    • Part B: Tetanus toxoid (TT)
    • Part C: Maternal and newborn health
    • Part D: Contraceptive use
    • Part E: HIV/AIDS

    Section 3: Questionnaire for children under five
    • Part A: Birth registration and early learning
    • Part B: Vitamin A
    • Part C: Breastfeeding
    • Part D: Care of illness
    • Part E: Malaria
    • Part F: Immunization
    • Part G: Anthropometry

    Apart from the questionnaires that collect information at family level, questionnaires were also designed to gather community-level information that supplements indicators for which data cannot be collected at family level. The information gathered includes local population, socio-economic and physical conditions, education, health and the progress of projects/plans of action for children.

    Cleaning operations

    To minimize the errors made by data entry staff members, all the records were double-entered by two different members. Any discrepancy detected between the two entries was re-checked to find out which one was wrong. Data cleaning started in early September. This process was closely observed to ensure the accuracy, quality and practicality of all the data collected.

    To minimize the errors due to wrong statements of respondents or wrong registration by interviewers, a cleaning programme was used to check the consistency and logic in the items of questionnaires and between the questionnaires. The cleaning programme printed out all the errors, then questionnaires were checked by qualified officials.
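
    The two checks described above, comparing double-entered records and testing records for logical consistency, can be sketched as follows. The record layout, the field names and the single rule shown (women eligible for interview are aged 15-49) are illustrative assumptions; the actual cleaning programme is not described in this entry.

```python
def double_entry_mismatches(first_entry, second_entry):
    """Return (record_id, field, value_1, value_2) for every disagreement."""
    mismatches = []
    for record_id in sorted(set(first_entry) | set(second_entry)):
        rec1 = first_entry.get(record_id, {})
        rec2 = second_entry.get(record_id, {})
        for field in sorted(set(rec1) | set(rec2)):
            if rec1.get(field) != rec2.get(field):
                mismatches.append((record_id, field, rec1.get(field), rec2.get(field)))
    return mismatches

def age_out_of_range(women, low=15, high=49):
    """List ids of women's records whose age is outside the eligible range."""
    return [w["id"] for w in women if not low <= w["age"] <= high]

# Hypothetical entries keyed by questionnaire number.
entry_a = {1: {"age": 23, "sex": "F"}, 2: {"age": 31, "sex": "F"}}
entry_b = {1: {"age": 32, "sex": "F"}, 2: {"age": 31, "sex": "F"}}
to_recheck = double_entry_mismatches(entry_a, entry_b)

flagged = age_out_of_range([{"id": "w1", "age": 14}, {"id": "w2", "age": 30}])
```

    Each flagged item would then go back to the paper questionnaire for review, as the text describes.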

    Response rate

    8356 households were selected for the sample. Of these all were found to be occupied households and 8355 were successfully interviewed for a response rate of 100%. Within these households, 10063 eligible women aged 15-49 were identified for interview, of which 9473 were successfully interviewed (response rate 94.1%), and 2707 children aged 0-4 were identified for whom the mother or caretaker was successfully interviewed for 2680 children (response rate 99%).
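
    The quoted response rates follow directly from the counts in the paragraph above. A minimal check, rounding to one decimal place (the helper name is hypothetical):

```python
def response_rate(interviewed, eligible):
    """Response rate as a percentage, rounded to one decimal place."""
    return round(100.0 * interviewed / eligible, 1)

household_rate = response_rate(8355, 8356)   # 100.0 after rounding
women_rate = response_rate(9473, 10063)      # 94.1
children_rate = response_rate(2680, 2707)    # 99.0
```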

    Sampling error estimates

    Estimates from a sample survey are affected by two types of errors: 1) non-sampling errors and 2) sampling errors. Non-sampling errors are the results of mistakes made in the implementation of data collection and data processing. Numerous efforts were made during implementation of the MICS - 3 to minimize this type of error; however, non-sampling errors are impossible to avoid and difficult to evaluate statistically.

    Sampling errors can be evaluated statistically. The sample of respondents to the MICS - 3 is only one of many possible samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability in the results of the survey between all possible samples, and although the degree of variability is not known exactly, it can be estimated from the survey results. The sampling errors are measured in terms of the standard error for a particular statistic (mean or percentage), which is the square root of the variance. Confidence intervals are calculated for each statistic within which the true value for the population can be assumed to fall. Plus or minus two standard errors of the statistic is used for key statistics presented in MICS, equivalent to a 95 percent confidence interval.

    If the sample of respondents had been a simple random sample, it would have been possible to use straightforward formulae for calculating sampling errors. However, the MICS - 3 sample is the result of a two-stage stratified design, and consequently needs to use more complex formulae. The SPSS complex samples module has been used to calculate sampling errors for the MICS - 3. This module uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. This method is documented in the SPSS file CSDescriptives.pdf found under the Help, Algorithms options in SPSS.

    Sampling errors have been calculated for a select set of statistics (all of which are proportions due to the limitations of the Taylor linearization method) for the national sample, urban and rural areas, and for each of the five regions. For each statistic, the following are reported: the estimate, its standard error, the coefficient of variation (or relative error -- the ratio between the standard error and the estimate), the design effect, the square root design effect (DEFT -- the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used), and the 95 percent confidence interval (+/-2 standard errors).
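
    The quantities listed above are related by simple formulas, sketched here for a proportion. The design-based standard error is taken as an input, since it would come from the complex-samples software. The simple-random-sample standard error sqrt(p(1 - p)/n) is the textbook approximation and is an assumption about how the comparison value is computed. The numbers are made up.

```python
import math

def sampling_error_summary(estimate, standard_error, n):
    """Derive CV, DEFT, design effect and a +/-2 SE interval for a proportion."""
    # Standard error under a hypothetical simple random sample of the same size.
    srs_se = math.sqrt(estimate * (1.0 - estimate) / n)
    deft = standard_error / srs_se
    return {
        "cv": standard_error / estimate,      # relative error
        "deft": deft,                          # square root of the design effect
        "deff": deft ** 2,                     # design effect
        "ci_low": estimate - 2.0 * standard_error,
        "ci_high": estimate + 2.0 * standard_error,
    }

# Hypothetical statistic: a proportion of 0.5 from 100 respondents with SE 0.1.
summary = sampling_error_summary(0.5, 0.1, 100)
```

    A DEFT above 1 indicates that the clustered, stratified design gives a larger standard error than a simple random sample of the same size would.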

    Data appraisal

    A series of data quality tables and graphs are available to review the quality of the data and include the following:

    • Age distribution of the household population
    • Age distribution of eligible women and interviewed women
    • Age distribution of eligible children and children for whom the mother or caretaker was interviewed
    • Age distribution of children under age 5 by 3 month groups
    • Age and period ratios at boundaries of eligibility
    • Percent of observations with missing information on selected variables
    • Presence of mother in

  18. Hyperemesis Gravidarum related discourse on X (formerly Twitter) scraped raw...

    • figshare.com
    xlsx
    Updated Apr 9, 2024
    Cite
    Corinne Berger; Raymond J Spiteri; Tamar Gur; Therese Rajasekera (2024). Hyperemesis Gravidarum related discourse on X (formerly Twitter) scraped raw data before cleanup. [Dataset]. http://doi.org/10.6084/m9.figshare.25570194.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Apr 9, 2024
    Dataset provided by
    figshare
    Authors
    Corinne Berger; Raymond J Spiteri; Tamar Gur; Therese Rajasekera
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Hyperemesis Gravidarum (HG) is a severe form of morning sickness affecting pregnant women. Despite affecting up to 3% of women, HG appears to remain largely misunderstood by the public and healthcare professionals. This lack of understanding can lead to misdiagnosis, inadequate treatment, and undue suffering for individuals dealing with HG.

    This study explores the online discourse surrounding HG to identify the main themes in online discussions about HG. Using the Twint0 application, we collected 5,856 relevant posts from the X social networking site over a 12-month timeframe. A six-step thematic analysis of the posts yielded four main themes in the discourse surrounding HG: (1) the recognition and severity of symptoms, (2) the impact on pregnancy and maternal health, (3) experiences with healthcare and treatment, and (4) the emotional and psychological toll on pregnant women. The study underscores the urgent need for improved awareness and education about HG among both the public and healthcare professionals. Negative experiences in healthcare indicate systemic issues in HG diagnosis and treatment, emphasizing the importance of patient-centered care and empathetic approaches. Additionally, addressing the significant emotional toll of HG on pregnant women is essential for implementing comprehensive support mechanisms.

  19. Delivery of air cleaning units and CO2 monitors until 24 June 2022

    • gov.uk
    • sasastunts.com
    Updated Jun 30, 2022
    + more versions
    Cite
    Department for Education (2022). Delivery of air cleaning units and CO2 monitors until 24 June 2022 [Dataset]. https://www.gov.uk/government/publications/delivery-of-air-cleaning-units-and-co2-monitors-until-24-june-2022
    Explore at:
    Dataset updated
    Jun 30, 2022
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Education
    Description

    This transparency release sets out the number of air cleaning units and CO2 monitors delivered by the Department for Education to state-funded education settings up until 24 June 2022.

    The data shows the cumulative number of air cleaning units and CO2 monitors delivered using administrative data from its delivery partners. The data covers education settings in England and includes early years, schools and further education providers.

    You can also view statistics on the following:

  20. NC Clean Marinas

    • fisheries-ncdenr.opendata.arcgis.com
    • data-ncdenr.opendata.arcgis.com
    • +1more
    Updated Jun 5, 2019
    Cite
    NC Dept. of Environmental Quality (2019). NC Clean Marinas [Dataset]. https://fisheries-ncdenr.opendata.arcgis.com/datasets/nc-clean-marinas
    Explore at:
    Dataset updated
    Jun 5, 2019
    Dataset authored and provided by
    NC Dept. of Environmental Quality
    Area covered
    Description

    Clean Marina is a nationwide program developed by the National Marine Environmental Education Foundation, a nonprofit organization that works to clean up waterways for better recreational boating. The foundation encourages states to adapt Clean Marina principles to fit their own needs. North Carolina joins South Carolina, Florida and Maryland as states with Clean Marina programs in place.

    The N.C. Clean Marina program is a partnership between N.C. Boating Industry Services, the N.C. Marine Trade Association, the Division of Coastal Management, the Albemarle-Pamlico National Estuary Program, N.C. Sea Grant, the U.S. Power Squadron, and U.S. Coast Guard Auxiliary.

    The North Carolina Clean Marina Program is a voluntary program that began in the summer of 2000. Marina operators who choose to participate must complete an evaluation form about their use of specific best management practices. The program is designed to show that marina operators can help safeguard the environment by using management and operations techniques that go above and beyond regulatory requirements.

    Attributes:

    • Name: name of marina
    • Address: physical address of marina
    • Phone: phone number of marina
    • Website: website for more information about the marina
    • Lattitude_DD: marina location for water navigation in latitude (decimal degrees)
    • Longitude_DD: marina location for water navigation in longitude (decimal degrees)

Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177

Data Cleaning Sample

Explore at:
141 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License

CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

Sample data for exercises in Further Adventures in Data Cleaning.
