100+ datasets found
  1. twt-kaggle-data

    • huggingface.co
    Updated Dec 8, 2023
    Cite
    megha manoj (2023). twt-kaggle-data [Dataset]. https://huggingface.co/datasets/mochi-skz/twt-kaggle-data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 8, 2023
    Authors
    megha manoj
    Description

    The mochi-skz/twt-kaggle-data dataset is hosted on Hugging Face and was contributed by the HF Datasets community.

  2. Industrial Dataset

    • kaggle.com
    Updated May 8, 2023
    Cite
    Be Schue (2023). Industrial Dataset [Dataset]. https://www.kaggle.com/datasets/beschue/industrial-classification-data-set
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 8, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Be Schue
    Description

    The dataset includes 10 object categories from the MVTEC INDUSTRIAL 3D OBJECT DETECTION DATASET as input CAD objects. The selected objects include a diverse range of industrial products:

    S.No  Object Class
    1     adapter plate triangular
    2     bracket big
    3     clamp small
    4     engine part cooler round
    5     engine part cooler square
    6     injection pump
    7     screw
    8     star
    9     tee connector
    10    thread

    The dataset contains a total of 100,000 RGB images (10,000 per object category), divided into three sets: 70,000 for training, 20,000 for testing, and 10,000 for validation. Each image has a resolution of 224 × 224 pixels and is in JPEG format.

    To ensure the suitability of our dataset for various computer vision tasks, we included not only the class labels but also generated bounding boxes and semantic masks for each image, which are stored in COCO annotation format. Each image contains one instance of one of the ten selected objects.
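
    Since the annotations are stored in COCO format, reading them might look like the following minimal sketch (the annotation path is a hypothetical stand-in; check the dataset's actual file layout):

      # Minimal sketch: reading COCO-format annotations with pycocotools.
      from pycocotools.coco import COCO

      coco = COCO("annotations/instances_train.json")  # hypothetical path

      # List the ten object categories.
      for cat in coco.loadCats(coco.getCatIds()):
          print(cat["id"], cat["name"])

      # Each image holds one object instance: fetch its bounding box and mask.
      img_id = coco.getImgIds()[0]
      ann = coco.loadAnns(coco.getAnnIds(imgIds=img_id))[0]
      print("bbox (x, y, w, h):", ann["bbox"])
      mask = coco.annToMask(ann)  # binary semantic mask for the 224 x 224 image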

    Throughout the 10,000 images for each class, we randomly varied the object’s position in the x, y, and z directions and its rotation to provide a diverse range of images. Additionally, we changed the object’s surface to a smooth metallic texture, imitating real industrial components. Lastly, we varied the lighting conditions within each image, including the position of the light sources, their energy, and their emission strength.

    Find out more about our Data Generation Tool:

    Schuerrle, B., Sankarappan, V., & Morozov, A. (2023). SynthiCAD: Generation of Industrial Image Data Sets for Resilience Evaluation of Safety-Critical Classifiers. In Proceedings of the 33rd European Safety and Reliability Conference. Research Publishing Services. https://doi.org/10.3850/978-981-18-8071-1_p400-cd

  3. Iranian telecom company churn

    • kaggle.com
    Updated Sep 30, 2024
    Cite
    Tom Button (2024). Iranian telecom company churn [Dataset]. https://www.kaggle.com/datasets/tombutton/iranian-telecom-company-churn/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 30, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tom Button
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset was randomly collected from an Iranian telecom company's database over a period of 12 months. It is sourced from the UC Irvine Machine Learning Repository: https://archive.ics.uci.edu/

  4. Sales Dataset of Different Regions

    • kaggle.com
    Updated Jan 10, 2025
    Cite
    Yamin Hossain (2025). Sales Dataset of Different Regions [Dataset]. https://www.kaggle.com/datasets/yaminh/sales-dataset-of-different-regions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yamin Hossain
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The collection comprises the following five files (web_events.csv included):

    1. Orders Dataset:

      • Contains details of customer orders, including order dates, total order amounts (in USD), and associated customer IDs.
      • Helps analyze revenue trends, customer purchasing behavior, and seasonal patterns.
    2. Accounts Dataset:

      • Represents customer account information such as account IDs, names, and sales representative assignments.
      • Useful for understanding customer demographics and their engagement with the company.
    3. Regions Dataset:

      • Defines geographical regions managed by sales representatives, including region names and IDs.
      • Enables regional sales performance analysis and comparative insights across regions.
    4. Sales Representatives Dataset:

      • Includes data on sales representatives, their IDs, names, and the regions they serve.
      • Helps link customer accounts to specific regions and analyze rep performance.
    5. Web Events Dataset:

      • Logs customer interactions on the company's website, including timestamps, event types, and user IDs.
      • Helps track user activity, identify trends in website engagement, and optimize the online user experience.

    These datasets collectively enable comprehensive insights into sales performance, customer behavior, website engagement, and regional trends, forming the backbone of the interactive dashboard.
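
    As a minimal sketch of how these files might be combined for region-level analysis (file and column names are assumptions inferred from the descriptions above; verify them against the actual CSV headers):

      import pandas as pd

      # Hypothetical file and column names inferred from the descriptions above.
      orders = pd.read_csv("orders.csv")      # account_id, order date, total_amt_usd
      accounts = pd.read_csv("accounts.csv")  # id, name, sales_rep_id
      reps = pd.read_csv("sales_reps.csv")    # id, name, region_id
      regions = pd.read_csv("regions.csv")    # id, name

      # Rename key columns up front so the joins stay unambiguous.
      accounts = accounts.rename(columns={"id": "account_id", "name": "account_name"})
      reps = reps.rename(columns={"id": "sales_rep_id", "name": "rep_name"})
      regions = regions.rename(columns={"id": "region_id", "name": "region_name"})

      sales = (orders
               .merge(accounts, on="account_id")
               .merge(reps, on="sales_rep_id")
               .merge(regions, on="region_id"))

      # Revenue by region, the kind of aggregate behind the dashboard.
      print(sales.groupby("region_name")["total_amt_usd"].sum())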

    Dashboard Link:

    Click Here To Visit Dashboard 📊

  5. LinkedIn Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 17, 2021
    Cite
    Bright Data (2021). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
    Explore at:
    Available download formats: .json, .csv, .xlsx
    Dataset updated
    Dec 17, 2021
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

    Dataset Features

    • Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
    • Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
    • Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and job market dynamics.

    Customizable Subsets for Specific Needs

    Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

    Popular Use Cases

    • Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
    • Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
    • Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
    • Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
    • AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

    Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.

  6. BUTTER - Empirical Deep Learning Dataset

    • data.openei.org
    • datasets.ai
    • +2more
    code, data, website
    Updated May 20, 2022
    Cite
    Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek (2022). BUTTER - Empirical Deep Learning Dataset [Dataset]. http://doi.org/10.25984/1872441
    Explore at:
    Available download formats: code, website, data
    Dataset updated
    May 20, 2022
    Dataset provided by
    Open Energy Data Initiative (OEDI)
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    National Renewable Energy Laboratory
    Authors
    Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BUTTER Empirical Deep Learning Dataset represents an empirical study of deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels each of L1 and L2 regularization. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) were recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.

  7. Synthetic dataset - Using data-driven ML towards improving diagnosis of ACS

    • healthdatagateway.org
    unknown
    Updated Oct 9, 2023
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2023). Synthetic dataset - Using data-driven ML towards improving diagnosis of ACS [Dataset]. https://healthdatagateway.org/dataset/138
    Explore at:
    Available download formats: unknown
    Dataset updated
    Oct 9, 2023
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    Background: Acute compartment syndrome (ACS) is an emergency orthopaedic condition wherein a rapid rise in compartmental pressure compromises blood perfusion to the tissues, leading to ischaemia and muscle necrosis. This serious condition is often misdiagnosed or associated with significant diagnostic delay, and can lead to limb amputations and death.

    The most common causes of ACS are high-impact trauma, especially fractures of the lower limbs, which account for 40% of ACS cases. ACS is a challenge to diagnose and treat effectively, with differing clinical thresholds being utilised, which can result in unnecessary fasciotomy. The highly granular synthetic data for over 900 patients with ACS provide the following key parameters to support critical research into this condition:

    1. Patient data (injury type, location, age, sex, pain levels, pre-injury status and comorbidities)
    2. Physiological parameters (intracompartmental pressure, pH, tissue oxygenation, compartment hardness)
    3. Muscle biomarkers (creatine kinase, myoglobin, lactate dehydrogenase)
    4. Blood vessel damage biomarkers (glycocalyx shedding markers, endothelial permeability markers)

    PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

    Scope: Enabling data-driven research and machine learning models towards improving the diagnosis of Acute compartment syndrome. Longitudinal & individually linked, so that the preceding & subsequent health journey can be mapped & healthcare utilisation prior to & after admission understood. The dataset includes highly granular patient demographics, physiological parameters, muscle biomarkers, blood biomarkers and co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to process of care (timings and admissions), presenting complaint, lab analysis results (eGFR, troponin, CRP, INR, ABG glucose), systolic and diastolic blood pressures, procedures and surgery details.

    Available supplementary data: ACS cohort, Matched controls; ambulance, OMOP data. Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.

  8. Idaho Groundwater Quality Dataset [Relational Database Table: SiteID]

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Idaho Groundwater Quality Dataset [Relational Database Table: SiteID] [Dataset]. https://catalog.data.gov/dataset/idaho-groundwater-quality-dataset-relational-database-table-siteid
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Idaho
    Description

    This dataset is a compilation of data obtained from the Idaho Department of Water Quality, the Idaho Department of Water Resources, and the Water Quality Portal. The 'SiteID' table catalogues organization-specific identification numbers assigned to each monitoring location.

  9. Data from: Wind Integration National Dataset (WIND) Toolkit

    • catalog.data.gov
    • data.openei.org
    • +1more
    Updated Mar 13, 2025
    + more versions
    Cite
    National Renewable Energy Laboratory (2025). Wind Integration National Dataset (WIND) Toolkit [Dataset]. https://catalog.data.gov/dataset/wind-integration-national-dataset-wind-toolkit
    Explore at:
    Dataset updated
    Mar 13, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Description

    Wind resource data for North America were produced using the Weather Research and Forecasting (WRF) model. The WRF model was initialized with the European Centre for Medium-Range Weather Forecasts Interim Reanalysis (ERA-Interim) data set with an initial grid spacing of 54 km. Three internal nested domains were used to refine the spatial resolution to 18, 6, and finally 2 km. The WRF model was run for years 2007 to 2014. While outputs were extracted from WRF at 5-minute time-steps, due to storage limitations instantaneous hourly time-steps are provided for all variables, while full 5-minute resolution data are provided for wind speed and wind direction only. The following variables were extracted from the WRF model data:

    • Wind Speed at 10, 40, 60, 80, 100, 120, 140, 160, 200 m
    • Wind Direction at 10, 40, 60, 80, 100, 120, 140, 160, 200 m
    • Temperature at 2, 10, 40, 60, 80, 100, 120, 140, 160, 200 m
    • Pressure at 0, 100, 200 m
    • Surface Precipitation Rate
    • Surface Relative Humidity
    • Inverse Monin-Obukhov Length
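
    As an illustration of working with the multi-height wind speeds, the sketch below extrapolates between the provided heights using the standard power-law profile; the shear exponent of 0.14 is a common neutral-stability default, not a value taken from the WIND Toolkit itself:

      def wind_speed_at(v_ref: float, h_ref: float, h: float, alpha: float = 0.14) -> float:
          """Power-law profile: v(h) = v_ref * (h / h_ref) ** alpha."""
          return v_ref * (h / h_ref) ** alpha

      # Estimate the 90 m wind speed from an 8.0 m/s reading at the 80 m level.
      print(wind_speed_at(v_ref=8.0, h_ref=80.0, h=90.0))  # ~8.13 m/s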

  10. Enterprise Dataset Inventory

    • catalog.data.gov
    • opendata.dc.gov
    • +1more
    Updated Feb 4, 2025
    + more versions
    Cite
    Office of the Chief Technology Officer (2025). Enterprise Dataset Inventory [Dataset]. https://catalog.data.gov/dataset/enterprise-dataset-inventory-062ac
    Explore at:
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    Office of the Chief Technology Officer
    Description

    Mayor's Order 2017-115 establishes a comprehensive data policy for the District government. The data created and managed by the District government are valuable assets and are independent of the information systems in which the data reside. As such, the District government shall: maintain an inventory of its enterprise datasets; classify enterprise datasets by level of sensitivity; regularly publish the inventory, including the classifications, as an open dataset; and strategically plan and manage its investment in data.

    The greatest value from the District’s investment in data can only be realized when enterprise datasets are freely shared among District agencies, with federal and regional governments, and with the public to the fullest extent consistent with safety, privacy, and security. For more information, please visit https://opendata.dc.gov/pages/edi-overview. Previous years of EDI can be found on Open Data.

  11. Industrial or commercial units

    • data.opendatascience.eu
    Updated Jan 2, 2021
    Cite
    (2021). ANV - Probability distribution for Quercus suber [Dataset]. https://data.opendatascience.eu/geonetwork/srv/search?type=dataset
    Explore at:
    Dataset updated
    Jan 2, 2021
    Description

    Overview: Class 121 comprises land units that are under industrial or commercial use or serve public service facilities.

    Traceability (lineage): This dataset was produced with a machine learning framework with several input datasets, specified in detail in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3).

    Scientific methodology: The single-class probability layers were generated with a spatiotemporal ensemble machine learning framework detailed in Witjes et al., 2022 (in review, preprint available at https://doi.org/10.21203/rs.3.rs-561383/v3). The single-class uncertainty layers were calculated by taking the standard deviation of the three single-class probabilities predicted by the three components of the ensemble. The HCL (hard class) layers represent the class with the highest probability as predicted by the ensemble.

    Usability: The HCL layers have a decreasing average accuracy (weighted F1-score) at each subsequent level in the CLC hierarchy: 0.83 at level 1 (5 classes), 0.63 at level 2 (14 classes), and 0.49 at level 3 (43 classes). This means that the hard-class maps are more reliable when aggregating classes to a higher level in the hierarchy (e.g. 'Discontinuous Urban Fabric' and 'Continuous Urban Fabric' to 'Urban Fabric'). Some single-class probabilities may more closely represent actual patterns for some classes that were overshadowed by unequal sample point distributions. Users are encouraged to set their own thresholds when postprocessing these datasets to optimize the accuracy for their specific use case.

    Uncertainty quantification: Uncertainty is quantified by taking the standard deviation of the probabilities predicted by the three components of the spatiotemporal ensemble model.

    Data validation approaches: The LULC classification was validated through spatial 5-fold cross-validation, as detailed in the accompanying publication.

    Completeness: The dataset has chunks of empty predictions in regions with complex coastlines (e.g. the Zeeland province in the Netherlands and the Mar da Palha bay area in Portugal). These are artifacts that will be avoided in subsequent versions of the LULC product.

    Consistency: The accuracy of the predictions was compared per year and per 30 km × 30 km tile across Europe to derive temporal and spatial consistency by calculating the standard deviation. The standard deviation of the annual weighted F1-score was 0.135, while the standard deviation of the weighted F1-score per tile was 0.150. This means the dataset is more consistent through time than through space: predictions are notably less accurate along the Mediterranean coast. The accompanying publication contains additional information and visualisations.

    Positional accuracy: The raster layers have a resolution of 30 m, identical to that of the Landsat data cube used as input features for the machine learning framework that predicted them.

    Temporal accuracy: The dataset contains predictions and uncertainty layers for each year between 2000 and 2019.

    Thematic accuracy: The maps reproduce the Corine Land Cover classification system, a hierarchical legend that consists of 5 classes at the highest level, 14 classes at the second level, and 44 classes at the third level. Class 523 (Oceans) was omitted due to computational constraints.
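
    As the usability note above suggests, a user-chosen threshold can be applied to a single-class probability layer; a minimal sketch (the GeoTIFF name is a hypothetical stand-in for the actual layer file):

      import numpy as np
      import rasterio  # pip install rasterio

      with rasterio.open("class121_probability_2019.tif") as src:  # hypothetical name
          prob = src.read(1).astype(np.float32)

      # Tune per use case: lower thresholds favour recall, higher favour precision.
      threshold = 0.5
      mask = prob >= threshold
      print("pixels mapped to class 121:", int(mask.sum()))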

  12. Intelligent Monitor

    • kaggle.com
    Updated Apr 12, 2024
    Cite
    ptdevsecops (2024). Intelligent Monitor [Dataset]. http://doi.org/10.34740/kaggle/ds/4383210
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 12, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ptdevsecops
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    IntelligentMonitor: Empowering DevOps Environments With Advanced Monitoring and Observability aims to improve monitoring and observability in complex, distributed DevOps environments by leveraging machine learning and data analytics. This repository contains a sample implementation of the IntelligentMonitor system proposed in the research paper, presented and published as part of the 11th International Conference on Information Technology (ICIT 2023).

    If you use this dataset and code or any herein modified part of it in any publication, please cite these papers:

    P. Thantharate, "IntelligentMonitor: Empowering DevOps Environments with Advanced Monitoring and Observability," 2023 International Conference on Information Technology (ICIT), Amman, Jordan, 2023, pp. 800-805, doi: 10.1109/ICIT58056.2023.10226123.

    For any questions and research queries - please reach out via Email.

    Abstract - In the dynamic field of software development, DevOps has become a critical tool for enhancing collaboration, streamlining processes, and accelerating delivery. However, monitoring and observability within DevOps environments pose significant challenges, often leading to delayed issue detection, inefficient troubleshooting, and compromised service quality. These issues stem from DevOps environments' complex and ever-changing nature, where traditional monitoring tools often fall short, creating blind spots that can conceal performance issues or system failures. This research addresses these challenges by proposing an innovative approach to improve monitoring and observability in DevOps environments. Our solution, Intelligent-Monitor, leverages realtime data collection, intelligent analytics, and automated anomaly detection powered by advanced technologies such as machine learning and artificial intelligence. The experimental results demonstrate that IntelligentMonitor effectively manages data overload, reduces alert fatigue, and improves system visibility, thereby enhancing performance and reliability. For instance, the average CPU usage across all components showed a decrease of 9.10%, indicating improved CPU efficiency. Similarly, memory utilization and network traffic showed an average increase of 7.33% and 0.49%, respectively, suggesting more efficient use of resources. By providing deep insights into system performance and facilitating rapid issue resolution, this research contributes to the DevOps community by offering a comprehensive solution to one of its most pressing challenges. This fosters more efficient, reliable, and resilient software development and delivery processes.

    Components

    The key components that would need to be implemented are:

    • Data Collection - Collect performance metrics and log data from the distributed system components. Could use technology like Kafka or telemetry libraries.
    • Data Processing - Preprocess and aggregate the collected data into an analyzable format. Could use Spark for distributed data processing.
    • Anomaly Detection - Apply machine learning algorithms to detect anomalies in the performance metrics. Could use isolation forest or LSTM models (see the sketch after this list).
    • Alerting - Generate alerts when anomalies are detected. It could integrate with tools like PagerDuty.
    • Visualization - Create dashboards to visualize system health and key metrics. Could use Grafana or Kibana.
    • Data Storage - Store the collected metrics and log data. Could use Elasticsearch or InfluxDB.
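
    A minimal sketch of the anomaly-detection component, assuming the isolation-forest option named above and scikit-learn (metric names and the CSV source are hypothetical stand-ins for the collected telemetry):

      import pandas as pd
      from sklearn.ensemble import IsolationForest

      metrics = pd.read_csv("system_metrics.csv")  # hypothetical telemetry export
      features = metrics[["cpu_usage", "memory_utilization", "network_traffic"]]

      # Fit an unsupervised detector; contamination sets the expected anomaly rate.
      model = IsolationForest(contamination=0.01, random_state=42)
      model.fit(features)

      # predict() returns -1 for anomalies; route those rows to the alerting component.
      metrics["anomaly"] = model.predict(features)
      print(metrics[metrics["anomaly"] == -1].head())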

    Implementation Details

    The core of the implementation would involve the following:

    • Setting up the data collection pipelines.
    • Building and training anomaly detection ML models on historical data.
    • Developing a real-time data processing pipeline.
    • Creating an alerting framework that ties into the ML models.
    • Building visualizations and dashboards.

    The code would need to handle scaled-out, distributed execution for production environments.

    Proper code documentation, logging, and testing would be added throughout the implementation.

    Usage Examples

    Usage examples could include:

    • Running the data collection agents on each system component.
    • Visualizing system metrics through Grafana dashboards.
    • Investigating anomalies detected by the ML models.
    • Tuning the alerting rules to minimize false positives.
    • Correlating metrics with log data to troubleshoot issues.

    References

    The implementation would follow the details provided in the original research paper: P. Thantharate, "IntelligentMonitor: Empowering DevOps Environments with Advanced Monitoring and Observability," 2023 International Conference on Information Technology (ICIT), Amman, Jordan, 2023, pp. 800-805, doi: 10.1109/ICIT58056.2023.10226123.

    Any additional external libraries or sources used would be properly cited.

    Tags - DevOps, Software Development, Collaboration, Streamlini...

  13. Data from: Fruit Recognition dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Zhuliang Chen (2020). Fruit Recognition dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1310164
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zhuliang Chen
    Israr Hussain
    Qianhua He
    Wei Xie
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The database used in this study comprises 44,406 fruit images, which we collected over a period of 6 months. The images were taken in our lab environment under the different scenarios mentioned below. We captured all images against a clear background at a resolution of 320 × 258 pixels, using an HD Logitech web camera. While collecting this database, we recreated the kinds of challenges that arise in real-world recognition scenarios in supermarkets and fruit shops, such as lighting, shadow, sunshine, and pose variation; to make a model robust, it must cope with illumination variation, camera capturing artifacts, specular reflection, shading, and shadows. We tested our model's robustness in all scenarios, and it performed quite well.

  14. Crash Reporting - Drivers Data

    • catalog.data.gov
    • data.montgomerycountymd.gov
    • +3more
    Updated May 24, 2025
    + more versions
    Cite
    data.montgomerycountymd.gov (2025). Crash Reporting - Drivers Data [Dataset]. https://catalog.data.gov/dataset/crash-reporting-drivers-data
    Explore at:
    Dataset updated
    May 24, 2025
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    This dataset provides information on motor vehicle operators (drivers) involved in traffic collisions occurring on county and local roadways. The dataset reports details of all traffic collisions occurring on county and local roadways within Montgomery County, as collected via the Automated Crash Reporting System (ACRS) of the Maryland State Police, and reported by the Montgomery County Police, Gaithersburg Police, Rockville Police, or the Maryland-National Capital Park Police. This dataset shows each recorded collision and the drivers involved. Please note that these collision reports are based on preliminary information supplied to the Police Department by the reporting parties. Therefore, the collision data available on this web page may reflect:

    • Information not yet verified by further investigation
    • Information that may include verified and unverified collision data
    • Preliminary collision classifications that may be changed at a later date based upon further investigation
    • Information that may include mechanical or human error

    This dataset can be joined with the other two Crash Reporting datasets by the State Report Number:

    • Crash Reporting - Incidents Data at https://data.montgomerycountymd.gov/Public-Safety/Crash-Reporting-Incidents-Data/bhju-22kf
    • Crash Reporting - Non-Motorists Data at https://data.montgomerycountymd.gov/Public-Safety/Crash-Reporting-Non-Motorists-Data/n7fk-dce5

    Update frequency: Weekly
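
    A minimal sketch of that join, assuming pandas and CSV exports of the portal datasets (the file names and the exact report-number column header are assumptions; verify them against the downloaded files):

      import pandas as pd

      drivers = pd.read_csv("crash_reporting_drivers_data.csv")
      incidents = pd.read_csv("crash_reporting_incidents_data.csv")

      # One incident can involve several drivers, so join incidents onto drivers.
      crashes = drivers.merge(incidents, on="Report Number", how="left",
                              suffixes=("_driver", "_incident"))
      print(crashes.shape)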

  15. Dataset Removals

    • data.cityofnewyork.us
    • catalog.data.gov
    application/rdfxml +5
    Updated Jan 22, 2018
    + more versions
    Cite
    Office of Technology and Innovation (OTI) (2018). Dataset Removals [Dataset]. https://data.cityofnewyork.us/dataset/Dataset-Removals/tm5c-buy3
    Explore at:
    Available download formats: tsv, application/rdfxml, xml, json, application/rssxml, csv
    Dataset updated
    Jan 22, 2018
    Dataset authored and provided by
    Office of Technology and Innovation (OTI)
    Description

    A listing of datasets removed from NYC Open Data and the reason for their removal.

  16. Cline Center Coup d’État Project Dataset

    • databank.illinois.edu
    Updated May 11, 2025
    + more versions
    Cite
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
    Explore at:
    Dataset updated
    May 11, 2025
    Authors
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.

    Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021; version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data; this update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:

    • Reconciling missing event data
    • Removing events with irreconcilable event dates
    • Removing events with insufficient sourcing (each event needs at least two sources)
    • Removing events that were inaccurately coded as coup events
    • Removing variables that fell below the threshold of inter-coder reliability required by the project
    • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
    • Extending the period covered from 1945-2005 to 1945-2019
    • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
    Items in this Dataset

    1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024.
    2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024.
    3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024.
    4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024.
    Citation Guidelines

    1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
    2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7

  17. CSIRO Sentinel-1 SAR image dataset of oil- and non-oil features for machine learning (Deep Learning)

    • data.csiro.au
    • researchdata.edu.au
    Updated Dec 15, 2022
    Cite
    David Blondeau-Patissier; Thomas Schroeder; Foivos Diakogiannis; Zhibin Li (2022). CSIRO Sentinel-1 SAR image dataset of oil- and non-oil features for machine learning (Deep Learning) [Dataset]. http://doi.org/10.25919/4v55-dn16
    Explore at:
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    David Blondeau-Patissier; Thomas Schroeder; Foivos Diakogiannis; Zhibin Li
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    May 1, 2015 - Aug 31, 2022
    Area covered
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    ESA
    Description

    What this collection is: a curated, binary-classified image dataset of grayscale (1-band), 400 × 400-pixel images, or image chips, in JPEG format, extracted from processed Sentinel-1 Synthetic Aperture Radar (SAR) satellite scenes acquired over various regions of the world, and featuring clear open-ocean chips, look-alikes (wind or biogenic features), and oil slick chips.

    This binary dataset contains chips labelled as:

    • "0" for chips not containing any oil features (look-alikes or clean seas)
    • "1" for chips containing oil features.

    This binary dataset is imbalanced, and biased towards "0" labelled chips (i.e., no oil features), which correspond to 66% of the dataset. Chips containing oil features, labelled "1", correspond to 34% of the dataset.

    Why: This dataset can be used for training, validation and/or testing of machine learning, including deep learning, algorithms for the detection of oil features in SAR imagery. Directly applicable for algorithm development for the European Space Agency Sentinel-1 SAR mission (https://sentinel.esa.int/web/sentinel/missions/sentinel-1 ), it may be suitable for the development of detection algorithms for other SAR satellite sensors.

    Overview of this dataset: the total number of chips (both classes) is N = 5,630.

    Class  Chips
    0      3,725
    1      1,905
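
    For training, the 66/34 imbalance can be countered with class weights; a minimal sketch using scikit-learn (only the class counts come from the overview above):

      import numpy as np
      from sklearn.utils.class_weight import compute_class_weight

      labels = np.array([0] * 3725 + [1] * 1905)  # N = 5,630 chips
      weights = compute_class_weight(class_weight="balanced",
                                     classes=np.array([0, 1]), y=labels)
      # Roughly {0: 0.76, 1: 1.48}: minority "oil" chips get about twice the weight.
      print(dict(zip([0, 1], weights)))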

    Further information and description are found in the ReadMe file provided (ReadMe_Sentinel1_SAR_OilNoOil_20221215.txt).

  18. Camer Data Dataset

    • universe.roboflow.com
    zip
    Updated Apr 26, 2023
    Cite
    Martino Pagano 2 (2023). Camer Data Dataset [Dataset]. https://universe.roboflow.com/martino-pagano-2/camer-data/dataset/1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 26, 2023
    Dataset authored and provided by
    Martino Pagano 2
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Fod Bounding Boxes
    Description

    Camer Data

    ## Overview
    
    Camer Data is a dataset for object detection tasks - it contains Fod annotations for 831 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
      
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  19. Data from: A Large-scale Dataset of (Open Source) License Text Variants

    • data.niaid.nih.gov
    Updated Mar 31, 2022
    + more versions
    Cite
    Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163
    Explore at:
    Dataset updated
    Mar 31, 2022
    Dataset authored and provided by
    Stefano Zacchiroli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.
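
    Since the metadata files reference license blobs via cryptographic checksums, a lookup might follow this sketch (the CSV name and column header are assumptions; see the dataset's README for the actual layout):

      import csv
      import hashlib

      # Checksum of a local license text to look up in the dataset.
      digest = hashlib.sha1(open("LICENSE", "rb").read()).hexdigest()

      with open("license_files_metadata.csv", newline="") as fh:  # hypothetical name
          for row in csv.DictReader(fh):
              if row.get("sha1") == digest:
                  print("matching blob metadata:", row)
                  break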

    For more details see the included README file and companion paper:

    Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In Proceedings of the 2022 Mining Software Repositories Conference (MSR 2022), 23-24 May 2022, Pittsburgh, Pennsylvania, United States. ACM 2022.

    If you use this dataset for research purposes, please acknowledge its use by citing the above paper.

  20. NeoVault platform and raw dataset

    • zenodo.org
    png
    Updated Feb 13, 2025
    Cite
    Janet Pigueiras del Real; Angel Ruiz Zafra; Lionel Cervera Gontard (2025). NeoVault platform and raw dataset [Dataset]. http://doi.org/10.5281/zenodo.14864505
    Explore at:
    Available download formats: png
    Dataset updated
    Feb 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Janet Pigueiras del Real; Angel Ruiz Zafra; Lionel Cervera Gontard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Repository Description

    This repository provides access to NeoVault, a structured data hub for postural, physiological, and medical data of neonates. NeoVault offers both a web interface and a raw dataset for retrieving movement data collected from preterm infants.

    The raw dataset includes positional data (x, y, z coordinates) and physiological parameters (heart rate, oxygen saturation) recorded from neonatal intensive care units (NICUs). These datasets are publicly available to support research in neonatal movement quantification and computational healthcare analysis.

    For more details on data access, please refer to the NeoVault documentation.
