100+ datasets found
  1. Intelligent Monitor

    • kaggle.com
    Updated Apr 12, 2024
    Cite
    ptdevsecops (2024). Intelligent Monitor [Dataset]. http://doi.org/10.34740/kaggle/ds/4383210
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 12, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ptdevsecops
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    IntelligentMonitor: Empowering DevOps Environments With Advanced Monitoring and Observability aims to improve monitoring and observability in complex, distributed DevOps environments by leveraging machine learning and data analytics. This repository contains a sample implementation of the IntelligentMonitor system proposed in the research paper, presented and published as part of the 11th International Conference on Information Technology (ICIT 2023).

    If you use this dataset and code, or any modified part of it, in any publication, please cite the following paper:

    P. Thantharate, "IntelligentMonitor: Empowering DevOps Environments with Advanced Monitoring and Observability," 2023 International Conference on Information Technology (ICIT), Amman, Jordan, 2023, pp. 800-805, doi: 10.1109/ICIT58056.2023.10226123.

    For any questions or research queries, please reach out via email.

    Abstract - In the dynamic field of software development, DevOps has become a critical tool for enhancing collaboration, streamlining processes, and accelerating delivery. However, monitoring and observability within DevOps environments pose significant challenges, often leading to delayed issue detection, inefficient troubleshooting, and compromised service quality. These issues stem from DevOps environments' complex and ever-changing nature, where traditional monitoring tools often fall short, creating blind spots that can conceal performance issues or system failures. This research addresses these challenges by proposing an innovative approach to improve monitoring and observability in DevOps environments. Our solution, Intelligent-Monitor, leverages realtime data collection, intelligent analytics, and automated anomaly detection powered by advanced technologies such as machine learning and artificial intelligence. The experimental results demonstrate that IntelligentMonitor effectively manages data overload, reduces alert fatigue, and improves system visibility, thereby enhancing performance and reliability. For instance, the average CPU usage across all components showed a decrease of 9.10%, indicating improved CPU efficiency. Similarly, memory utilization and network traffic showed an average increase of 7.33% and 0.49%, respectively, suggesting more efficient use of resources. By providing deep insights into system performance and facilitating rapid issue resolution, this research contributes to the DevOps community by offering a comprehensive solution to one of its most pressing challenges. This fosters more efficient, reliable, and resilient software development and delivery processes.

    Components. The key components that would need to be implemented are:

    • Data Collection - Collect performance metrics and log data from the distributed system components. Could use technology like Kafka or telemetry libraries.
    • Data Processing - Preprocess and aggregate the collected data into an analyzable format. Could use Spark for distributed data processing.
    • Anomaly Detection - Apply machine learning algorithms to detect anomalies in the performance metrics. Could use isolation forest or LSTM models.
    • Alerting - Generate alerts when anomalies are detected. Could integrate with tools like PagerDuty.
    • Visualization - Create dashboards to visualize system health and key metrics. Could use Grafana or Kibana.
    • Data Storage - Store the collected metrics and log data. Could use Elasticsearch or InfluxDB.

    Implementation details. The core of the implementation would involve the following (a minimal anomaly-detection sketch follows this list):

    • Setting up the data collection pipelines.
    • Building and training anomaly detection ML models on historical data.
    • Developing a real-time data processing pipeline.
    • Creating an alerting framework that ties into the ML models.
    • Building visualizations and dashboards.
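    The following is a minimal sketch of the anomaly-detection step (not the published implementation): it trains an isolation forest on historical metrics and flags outlying samples. The file name, column names and alert() helper are assumptions for illustration only.

    # Sketch of the anomaly-detection component described above.
    # "historical_metrics.csv" and its columns are hypothetical placeholders.
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    def alert(row):
        # Placeholder for an integration with an alerting tool such as PagerDuty.
        print(f"Anomaly detected: {row.to_dict()}")

    metrics = pd.read_csv("historical_metrics.csv")
    features = metrics[["cpu_usage", "memory_usage"]]   # hypothetical metric columns

    model = IsolationForest(contamination=0.01, random_state=0)
    model.fit(features)

    # predict() returns -1 for anomalous samples and 1 for normal ones.
    metrics["anomaly"] = model.predict(features)
    for _, row in metrics[metrics["anomaly"] == -1].iterrows():
        alert(row)

    In a production setting this scoring step would sit inside the real-time processing pipeline rather than run over a static CSV.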

    The code would need to handle scaled-out, distributed execution for production environments.

    Proper code documentation, logging, and testing would be added throughout the implementation.

    Usage examples could include:

    • Running the data collection agents on each system component.
    • Visualizing system metrics through Grafana dashboards.
    • Investigating anomalies detected by the ML models.
    • Tuning the alerting rules to minimize false positives.
    • Correlating metrics with log data to troubleshoot issues.

    References. The implementation would follow the details provided in the original research paper: P. Thantharate, "IntelligentMonitor: Empowering DevOps Environments with Advanced Monitoring and Observability," 2023 International Conference on Information Technology (ICIT), Amman, Jordan, 2023, pp. 800-805, doi: 10.1109/ICIT58056.2023.10226123.

    Any additional external libraries or sources used would be properly cited.

    Tags - DevOps, Software Development, Collaboration, Streamlini...

  2. Synthetic dataset - Using data-driven ML towards improving diagnosis of ACS

    • healthdatagateway.org
    unknown
    Updated Oct 9, 2023
    Cite
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158) (2023). Synthetic dataset - Using data-driven ML towards improving diagnosis of ACS [Dataset]. https://healthdatagateway.org/dataset/138
    Explore at:
    unknown (available download formats)
    Dataset updated
    Oct 9, 2023
    Dataset authored and provided by
    This publication uses data from PIONEER, an ethically approved database and analytical environment (East Midlands Derby Research Ethics 20/EM/0158)
    License

    https://www.pioneerdatahub.co.uk/data/data-request-process/

    Description

    Background Acute compartment syndrome (ACS) is an emergency orthopaedic condition wherein a rapid rise in compartmental pressure compromises blood perfusion to the tissues leading to ischaemia and muscle necrosis. This serious condition is often misdiagnosed or associated with significant diagnostic delay, and can lead to limb amputations and death.

    The most common causes of ACS are high-impact trauma, especially fractures of the lower limbs, which account for 40% of ACS cases. ACS is a challenge to diagnose and treat effectively, with differing clinical thresholds being utilised, which can result in unnecessary fasciotomy. The highly granular synthetic data for over 900 patients with ACS provide the following key parameters to support critical research into this condition:

    1. Patient data (injury type, location, age, sex, pain levels, pre-injury status and comorbidities)
    2. Physiological parameters (intracompartmental pressure, pH, tissue oxygenation, compartment hardness)
    3. Muscle biomarkers (creatine kinase, myoglobin, lactate dehydrogenase)
    4. Blood vessel damage biomarkers (glycocalyx shedding markers, endothelial permeability markers)

    PIONEER geography: The West Midlands (WM) has a population of 5.9 million & includes a diverse ethnic & socio-economic mix. UHB is one of the largest NHS Trusts in England, providing direct acute services & specialist care across four hospital sites, with 2.2 million patient episodes per year, 2750 beds & an expanded 250 ITU bed capacity during COVID. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary & secondary care record (Your Care Connected) & a patient portal “My Health”.

    Scope: Enabling data-driven research and machine learning models towards improving the diagnosis of Acute compartment syndrome. Longitudinal & individually linked, so that the preceding & subsequent health journey can be mapped & healthcare utilisation prior to & after admission understood. The dataset includes highly granular patient demographics, physiological parameters, muscle biomarkers, blood biomarkers and co-morbidities taken from ICD-10 & SNOMED-CT codes. Serial, structured data pertaining to process of care (timings and admissions), presenting complaint, lab analysis results (eGFR, troponin, CRP, INR, ABG glucose), systolic and diastolic blood pressures, procedures and surgery details.

    Available supplementary data: ACS cohort, Matched controls; ambulance, OMOP data. Available supplementary support: Analytics, Model build, validation & refinement; A.I.; Data partner support for ETL (extract, transform & load) process, Clinical expertise, Patient & end-user access, Purchaser access, Regulatory requirements, Data-driven trials, “fast screen” services.

  3. BUTTER - Empirical Deep Learning Dataset

    • data.openei.org
    • datasets.ai
    • +2more
    code, data, website
    Updated May 20, 2022
    Cite
    Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek (2022). BUTTER - Empirical Deep Learning Dataset [Dataset]. http://doi.org/10.25984/1872441
    Explore at:
    code, website, data (available download formats)
    Dataset updated
    May 20, 2022
    Dataset provided by
    Open Energy Data Initiative (OEDI)
    USDOE Office of Energy Efficiency and Renewable Energy (EERE), Multiple Programs (EE)
    National Renewable Energy Laboratory
    Authors
    Charles Tripp; Jordan Perr-Sauer; Lucas Hayne; Monte Lunacek
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The BUTTER Empirical Deep Learning Dataset represents an empirical study of deep learning phenomena on dense fully connected networks, scanning across thirteen datasets, eight network shapes, fourteen depths, twenty-three network sizes (number of trainable parameters), four learning rates, six minibatch sizes, four levels of label noise, and fourteen levels each of L1 and L2 regularization. Multiple repetitions (typically 30, sometimes 10) of each combination of hyperparameters were performed, and statistics including training and test loss (using an 80% / 20% shuffled train-test split) are recorded at the end of each training epoch. In total, this dataset covers 178 thousand distinct hyperparameter settings ("experiments"), 3.55 million individual training runs (an average of 20 repetitions of each experiment), and a total of 13.3 billion training epochs (three thousand epochs were covered by most runs). Accumulating this dataset consumed 5,448.4 CPU core-years, 17.8 GPU-years, and 111.2 node-years.

  4. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    figshare
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, contains no missing values, and the data were standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.
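    For readers who want to reproduce a comparable model outside Orange, the following is a rough scikit-learn sketch using settings analogous to those listed above. It is not the authors' code: scikit-learn has no gain-ratio criterion or 95% majority stopping rule, so "entropy" and default stopping are used instead, and the file name and column layout (11 feature columns followed by one target column) are assumptions.

    # Rough scikit-learn analogue of the Orange decision-tree settings above.
    import pandas as pd
    from sklearn.model_selection import StratifiedKFold, cross_val_predict
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    data = pd.read_excel("orange_dataset.xlsx")     # hypothetical file name
    X, y = data.iloc[:, :11], data.iloc[:, 11]      # 36 samples, 11 features, 1 target

    clf = DecisionTreeClassifier(
        criterion="entropy",    # Orange used gain ratio; entropy is the closest built-in option
        min_samples_leaf=2,     # minimum number of samples in leaves: 2
        min_samples_split=5,    # minimum samples required to split an internal node: 5
        random_state=0,
    )

    # Stratified cross-validation, as in the study (4 folds matches the 4 samples per class).
    cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    pred = cross_val_predict(clf, X, y, cv=cv)
    print("cross-validated accuracy:", accuracy_score(y, pred))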

  5. Data from: A Dataset and Machine Learning Approach to Classify and Augment...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 4, 2023
    Cite
    Lang, Florian (2023). A Dataset and Machine Learning Approach to Classify and Augment Interface Elements of Household Appliances to Support People with Visual Impairment [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7586106
    Explore at:
    Dataset updated
    Feb 4, 2023
    Dataset provided by
    Schmidt, Albrecht
    Machulla, Tonja-Katrin
    Lang, Florian
    Tschakert, Hanna
    Wieland, Markus
    Description

    Here, we provide a dataset of images of interfaces from household appliances, where all interface elements are labelled with one of five different types of interface elements. Further, we provide auxiliary materials to use and extend the dataset.

  6. Results of European Parliament elections

    • kaggle.com
    Updated Apr 3, 2019
    Cite
    European Parliament (2019). Results of European Parliament elections [Dataset]. https://www.kaggle.com/datasets/eu-parliament/results-of-european-parliament-elections
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 3, 2019
    Dataset provided by
    Kaggle
    Authors
    European Parliament
    Description

    Content

    More details about each file are in the individual file descriptions.

    Context

    This is a dataset from the European Parliament hosted by the EU Open Data Portal, which updates its information according to the amount of data that is brought in. Explore European Parliament data using Kaggle and all of the data sources available through the European Parliament organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using the EU ODP API and Kaggle's API.

    This dataset is distributed under the following licenses: Dataset License

    Cover photo by Tamara Menzi on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  7. LinkedIn Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Dec 17, 2021
    Cite
    Bright Data (2021). LinkedIn Datasets [Dataset]. https://brightdata.com/products/datasets/linkedin
    Explore at:
    .json, .csv, .xlsx (available download formats)
    Dataset updated
    Dec 17, 2021
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.

    Dataset Features

    • Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
    • Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
    • Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and the job market dynamics.

    Customizable Subsets for Specific Needs. Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.

    Popular Use Cases

    • Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
    • Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
    • Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
    • Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
    • AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.

    Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.

  8. ‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘How Every NFL Team’s Fans Lean Politically?’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-how-every-nfl-teams-fans-lean-politically-550a/f911ccf2/?iid=003-030&v=presentation
    Explore at:
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘How Every NFL Team’s Fans Lean Politically?’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/nfl-fandome on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Data behind the story How Every NFL Team’s Fans Lean Politically.

    Google Trends Data

    Google Trends data was derived from comparing 5-year search traffic for the 7 sports leagues we analyzed:

    https://g.co/trends/5P8aa

    Results are listed by designated market area (DMA).

    The percentages are the approximate percentage of major-sports searches that were conducted for each league.

    Trump's percentage is his share of the vote within the DMA in the 2016 presidential election.

    SurveyMonkey Data

    SurveyMonkey data was derived from a poll of American adults ages 18 and older, conducted between Sept. 1-7, 2017.

    Listed numbers are the raw totals for respondents who ranked a given NFL team among their three favorites, and how many identified with a given party (further broken down by race). We also list the percentages of the entire sample that identified with each party, and were of each race.

    The data is available under the Creative Commons Attribution 4.0 International License and the code is available under the MIT License. If you do find it useful, please let us know.

    Source: https://github.com/fivethirtyeight/data

    This dataset was created by FiveThirtyEight and contains around 0 samples along with Unnamed: 10, Unnamed: 4, technical information and other features such as: - Unnamed: 3 - Unnamed: 1 - and more.

    How to use this dataset

    • Analyze Unnamed: 13 in relation to Unnamed: 21
    • Study the influence of Unnamed: 7 on Unnamed: 12
    • More datasets

    Acknowledgements

    If you use this dataset in your research, please credit FiveThirtyEight


    --- Original source retains full ownership of the source dataset ---

  9. Instagram Dataset

    • brightdata.com
    .json, .csv, .xlsx
    Updated Apr 26, 2022
    + more versions
    Cite
    Bright Data (2022). Instagram Dataset [Dataset]. https://brightdata.com/products/datasets/instagram
    Explore at:
    .json, .csv, .xlsx (available download formats)
    Dataset updated
    Apr 26, 2022
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Access detailed insights with our Instagram datasets, featuring follower counts, verified status, account types, and engagement scores. Explore post information including URLs, descriptions, hashtags, comments, likes, media, posting dates, locations, and reel URLs. Perfect for understanding user engagement and content trends to drive informed decisions and optimize your social media strategies.

    • Over 750M records available
    • Price starts at $250/100K records
    • Data formats available: JSON, NDJSON, CSV, XLSX and Parquet
    • 100% ethical and compliant data collection

    Included datapoints:

    Account Fbid Id Followers Posts Count Is Business Account Is Professional Account Is Verified Avg Engagement External Url Biography Business Category Name Category Name Post Hashtags Following Posts Profile Image Link Profile URL Profile Name Highlights Count Highlights Full Name Is Private Bio Hashtags URL Is Joined Recently And much more

  10. Cline Center Coup d’État Project Dataset

    • databank.illinois.edu
    Updated May 11, 2025
    + more versions
    Cite
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
    Explore at:
    Dataset updated
    May 11, 2025
    Authors
    Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.

    Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy. Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022. Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021; version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup. Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event. Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:

    • Reconciling missing event data
    • Removing events with irreconcilable event dates
    • Removing events with insufficient sourcing (each event needs at least two sources)
    • Removing events that were inaccurately coded as coup events
    • Removing variables that fell below the threshold of inter-coder reliability required by the project
    • Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
    • Extending the period covered from 1945-2005 to 1945-2019
    • Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)

    Items in this dataset:

    1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024.
    2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024.
    3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024.
    4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024.

    Citation guidelines:

    1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset), please use the following citation: Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
    2. To cite data from the Cline Center Coup d’État Project Dataset, please use the following citation (filling in the correct date of access): Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7

  11. Randomized Battery Usage 1: Random Walk

    • data.nasa.gov
    • catalog.data.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). Randomized Battery Usage 1: Random Walk [Dataset]. https://data.nasa.gov/dataset/randomized-battery-usage-1-random-walk
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    This dataset is part of a series of datasets where batteries are continuously cycled with randomly generated current profiles. Reference charging and discharging cycles are also performed after a fixed interval of randomized usage to provide reference benchmarks for battery state of health. In this dataset, four 18650 Li-ion batteries (identified as RW9, RW10, RW11 and RW12) were continuously operated using a sequence of charging and discharging currents between -4.5A and 4.5A. This type of charging and discharging operation is referred to here as random walk (RW) operation. Each of the loading periods lasted 5 minutes, and after 1500 periods (about 5 days) a series of reference charging and discharging cycles were performed in order to provide reference benchmarks for battery state of health.

  12. Low Intensity Land-use Overlap Colorado Plateau

    • catalog.data.gov
    • search.dataone.org
    • +3more
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Low Intensity Land-use Overlap Colorado Plateau [Dataset]. https://catalog.data.gov/dataset/low-intensity-land-use-overlap-colorado-plateau
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Colorado Plateau
    Description

    This dataset represents the spatial overlap in areas with present (cropland, grazing, and recreation), potential (petroleum resources), or future (population) low-intensity land-use. The final areas designated as low-intensity land-use represent low intensity relative to the range of values for that variable in this region, either present or future, and do not represent the rate of change, recent or future, for that particular variable. We defined low-intensity areas as ≤25% quantile for that variable over the entire Colorado Plateau. We combined grazing and cropland low-intensity area into one “low agriculture” variable. Each band represents a unique overlap between the land-use types.

  13. Data from: Updated U.S. Low-Temperature Heating and Cooling Demand by County...

    • catalog.data.gov
    • gdr.openei.org
    • +2more
    Updated Jan 20, 2025
    + more versions
    Cite
    National Renewable Energy Laboratory (2025). Updated U.S. Low-Temperature Heating and Cooling Demand by County and Sector [Dataset]. https://catalog.data.gov/dataset/updated-u-s-low-temperature-heating-and-cooling-demand-by-county-and-sector-5981c
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    National Renewable Energy Laboratory
    Area covered
    United States
    Description

    This dataset includes U.S. low-temperature heating and cooling demand at the county level in major end-use sectors: residential, commercial, manufacturing, agricultural, and data centers. Census division-level end-use energy consumption, expenditure, and commissioned power databases were disaggregated to the county level. The county-level database was combined with climate zone, numbers of housing units and farms, farm size, and coefficient of performance (COP) for the heating and cooling demand analysis. This dataset also includes a paper containing a full explanation of the methodologies used and maps. Residential data were updated from the latest Residential Energy Consumption Survey (RECS) dataset (2015) using 2020 census data. Commercial data were baselined off the latest Commercial Building Energy Consumption Survey (CBECS) dataset (2012). Manufacturing data were baselined off the latest Manufacturing Energy Consumption Survey (MECS) dataset (2021).

  14. Dataset relating a study on Geospatial Open Data usage and metadata quality

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jun 19, 2023
    + more versions
    Cite
    Alfonso Quarati; Monica De Martino (2023). Dataset relating a study on Geospatial Open Data usage and metadata quality [Dataset]. http://doi.org/10.5281/zenodo.4280594
    Explore at:
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alfonso Quarati; Monica De Martino
    Description

    Thanks to the presence of thousands of geo-referenced datasets containing spatial information, Open Government Data (OGD) portals are of great interest for any analysis or process relating to the territory. For this to happen, users must be able to access these datasets and reuse them. An element often considered to hinder the full dissemination of OGD data is the quality of their metadata. Starting from an experimental investigation conducted on over 160,000 geospatial datasets belonging to six national and international OGD portals, this work has as its first objective to provide an overview of the usage of these portals, measured in terms of dataset views and downloads. Furthermore, to assess the possible influence of metadata quality on the use of geospatial datasets, an assessment of the metadata for each dataset was carried out, and the correlation between these two variables was measured. The results showed a significant underutilization of geospatial datasets and a generally poor quality of their metadata. In addition, only a weak correlation was found between the use and the quality of the metadata, not such as to assert with certainty that the latter is a determining factor of the former.

    The dataset consists of six zipped CSV files, containing the collected datasets' usage data, full metadata, and computed quality values, for about 160,000 geospatial datasets belonging to the three national and three international portals considered in the study, i.e. US (catalog.data.gov), Colombia (datos.gov.co), Ireland (data.gov.ie), HDX (data.humdata.org), EUODP (data.europa.eu), and NASA (data.nasa.gov).

    Data collection occurred in the period: 2019-12-19 -- 2019-12-23.

    The header for each CSV file is:

    [ ,portalid,id,downloaddate,metadata,overallq,qvalues,assessdate,dviews,downloads,engine,admindomain]

    where for each row (a portal's dataset) the following fields are defined as follows:

    • portalid: portal identifier
    • id: dataset identifier
    • downloaddate: date of data collection
    • metadata: the overall dataset's metadata downloaded via API from the portal according to the supporting platform schema
    • overallq: overall quality values computed by applying the methodology presented in [1]
    • qvalues: JSON object containing the quality values computed for the 17 metrics presented in [1]
    • assessdate: date of quality assessment
    • dviews: number of total views for the dataset
    • downloads: number of total downloads for the dataset (made available only by the Colombia, HDX, and NASA portals)
    • engine: identifier of the supporting portal platform: 1(CKAN), 2 (Socrata)
    • admindomain: 1 (national), 2 (international)

    [1] Neumaier, S.; Umbrich, J.; Polleres, A. Automated Quality Assessment of Metadata Across Open Data Portals. J. Data and Information Quality 2016, 8, 2:1–2:29. doi:10.1145/2964909
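    As a usage sketch (not part of the published material), one of the unzipped CSV files could be loaded as follows to reproduce the views-versus-quality correlation discussed above; the file name is a placeholder and the column names follow the header listed above.

    # Sketch: correlate dataset views with overall metadata quality for one portal.
    # "catalog_data_gov.csv" is a placeholder for one of the unzipped CSV files.
    import pandas as pd
    from scipy.stats import spearmanr

    df = pd.read_csv("catalog_data_gov.csv")

    # Spearman rank correlation between usage (views) and overall metadata quality.
    rho, p = spearmanr(df["dviews"], df["overallq"], nan_policy="omit")
    print(f"Spearman rho = {rho:.3f} (p = {p:.3g})")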

  15. Data from: A Large-scale Dataset of (Open Source) License Text Variants

    • data.niaid.nih.gov
    Updated Mar 31, 2022
    + more versions
    Cite
    Stefano Zacchiroli (2022). A Large-scale Dataset of (Open Source) License Text Variants [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6379163
    Explore at:
    Dataset updated
    Mar 31, 2022
    Dataset authored and provided by
    Stefano Zacchiroli
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We introduce a large-scale dataset of the complete texts of free/open source software (FOSS) license variants. To assemble it we have collected from the Software Heritage archive—the largest publicly available archive of FOSS source code with accompanying development history—all versions of files whose names are commonly used to convey licensing terms to software users and developers. The dataset consists of 6.5 million unique license files that can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. Additional metadata about shipped license files are also provided, making the dataset ready to use in various contexts; they include: file length measures, detected MIME type, detected SPDX license (using ScanCode), example origin (e.g., GitHub repository), oldest public commit in which the license appeared. The dataset is released as open data as an archive file containing all deduplicated license blobs, plus several portable CSV files for metadata, referencing blobs via cryptographic checksums.

    For more details see the included README file and companion paper:

    Stefano Zacchiroli. A Large-scale Dataset of (Open Source) License Text Variants. In Proceedings of the 2022 Mining Software Repositories Conference (MSR 2022), 23-24 May 2022, Pittsburgh, Pennsylvania, United States. ACM, 2022.

    If you use this dataset for research purposes, please acknowledge its use by citing the above paper.

  16. Raw Data for ConfLab: A Data Collection Concept, Dataset, and Benchmark for...

    • data.4tu.nl
    Updated Jun 7, 2022
    Cite
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung (2022). Raw Data for ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild [Dataset]. http://doi.org/10.4121/20017748.v2
    Explore at:
    Dataset updated
    Jun 7, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung
    License

    https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Description

    This file contains raw data for the cameras and wearables of the ConfLab dataset.

    ./cameras contains the overhead video recordings for 9 cameras (cam2-10) in MP4 files. These cameras cover the whole interaction floor, with camera 2 capturing the bottom of the scene layout and camera 10 capturing the top of the scene layout. Note that cam5 ran out of battery before the other cameras and thus its recordings are cut short; however, cam4 and cam6 overlap significantly with cam5, so any information needed can be reconstructed from them.

    Note that the annotations are made and provided in 2-minute segments. The annotated portions of the video include the last 3 min 38 sec of the x2xxx.MP4 video files and the first 12 min of the x3xxx.MP4 files for cameras 2, 4, 6, 8 and 10, with "x" being the placeholder character in the MP4 file names. If one wishes to separate the video into 2-minute segments as we did, the "video-splitting.sh" script is provided.

    ./camera-calibration contains the camera intrinsic files obtained from https://github.com/idiap/multicamera-calibration. Camera extrinsic parameters can be calculated using the existing intrinsic parameters and the instructions in the multicamera-calibration repo. The coordinates in the image are provided by the crosses marked on the floor, which are visible in the video recordings. The crosses are 1 m apart (= 100 cm).

    ./wearables includes the IMU, proximity and audio data from each participant at the ConfLab event (48 in total). In the directory numbered by participant ID, the following data are included:

    1. raw audio file
    2. proximity (Bluetooth) pings (RSSI) file (raw and csv) and a visualization
    3. tri-axial accelerometer data (raw and csv) and a visualization
    4. tri-axial gyroscope data (raw and csv) and a visualization
    5. tri-axial magnetometer data (raw and csv) and a visualization
    6. game rotation vector (raw and csv), recorded in quaternions

    All files are timestamped. The sampling frequencies are:

    • audio: 1250 Hz
    • all other sensors: around 50 Hz; however, the sample rate is not fixed, so the timestamps should be used instead.

    For rotation, the game rotation vector's output frequency is limited by the actual sampling frequency of the magnetometer. For more information, please refer to https://invensense.tdk.com/wp-content/uploads/2016/06/DS-000189-ICM-20948-v1.3.pdf

    Audio files in this folder are in raw binary form. The following can be used to convert them to WAV files (1250 Hz):

    ffmpeg -f s16le -ar 1250 -ac 1 -i /path/to/audio/file /path/to/output.wav

    Synchronization of cameras and wearables data: raw videos contain timecode information which matches the timestamps of the data in the "wearables" folder. The starting timecode of a video can be read as:

    ffprobe -hide_banner -show_streams -i /path/to/video

    ./audio contains:
    • ./sync: wav files for each subject
    • ./sync_files: auxiliary csv files used to sync the audio; these can be used to improve the synchronization.

    The code used for syncing the audio can be found at https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/audio
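    Because the wearable streams are not sampled at a fixed rate, a common preprocessing step is to resample them onto a uniform grid using the provided timestamps. Below is a minimal Python sketch of that step; the file path, the column names ("timestamp", "x", "y", "z") and the millisecond epoch encoding are assumptions and should be checked against the actual CSV files.

    # Sketch: resample a ConfLab accelerometer CSV onto a uniform 50 Hz grid.
    # Path, column names and timestamp units are assumptions, not part of the dataset docs.
    import pandas as pd

    acc = pd.read_csv("wearables/12/accelerometer.csv")             # hypothetical path
    acc["timestamp"] = pd.to_datetime(acc["timestamp"], unit="ms")  # assumed epoch milliseconds
    acc = acc.set_index("timestamp").sort_index()

    # Interpolate onto a regular 20 ms (50 Hz) grid.
    uniform = acc[["x", "y", "z"]].resample("20ms").mean().interpolate(method="time")
    print(uniform.head())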

  17. Data from: USDA-ARS Colorado Maize Water Productivity Dataset 2008-2011

    • catalog.data.gov
    • datadiscoverystudio.org
    • +3more
    Updated Jun 5, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). USDA-ARS Colorado Maize Water Productivity Dataset 2008-2011 [Dataset]. https://catalog.data.gov/dataset/usda-ars-colorado-maize-water-productivity-dataset-2008-2011-5460b
    Explore at:
    Dataset updated
    Jun 5, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    The USDA-Agricultural Research Service carried out a water productivity field trial for irrigated maize (Zea mays L.) at the Limited Irrigation Research Farm (LIRF) facility in northeastern Colorado in 2008 through 2011. The dataset includes daily measurements of irrigation, precipitation, soil water storage, and plant growth; daily estimates of crop evapotranspiration; and seasonal measurement of crop water use and crop yield. Soil parameters and hourly and daily weather data are also provided. The dataset can be useful to validate and refine maize crop models. The data are presented in spreadsheet format. The primary data files are the four annual LIRF Maize 20xx.xlsx files that include the daily water balance and phenology, final yield and biomass data, and crop management logs. Annual LIRF Weather 20xx.xlsx files provide hourly and daily weather parameters including reference evapotranspiration. The LIRF Soils.xlsx file gives soil parameters. Each spreadsheet contains a Data Descriptions worksheet that provides worksheet- or column-specific information. Comments are embedded in cells with specific information. A LIRF photos.pdf file provides images of the experimental area, measurement processes and crop conditions. Photo credit Peggy Greb, ARS; copyright-free, public domain copyright policy.

    Resources in this dataset:

    • LIRF Weather 2008 (LIRF Weather 2008.xlsx) - LIRF hourly and daily weather data for 2008
    • LIRF Weather 2009 (LIRF Weather 2009.xlsx) - LIRF hourly and daily weather data for 2009
    • LIRF Weather 2010 (LIRF Weather 2010.xlsx) - LIRF hourly and daily weather data for 2010
    • LIRF Weather 2011 (LIRF Weather 2011.xlsx) - LIRF hourly and daily weather data for 2011
    • LIRF Soils (LIRF Soils.xlsx) - LIRF soil maps, soil texture, moisture retention, and chemical constituents
    • LIRF Photo Log (LIRF Photo Log.pdf) - Photos of the LIRF Water Productivity field trials and instrumentation
    • Data Dictionaries (DataDictionary r1.xlsx) - Data descriptions of all the data resources (also included in their respective data files)
    • LIRF Methodology (LIRF Methodology r1.pdf) - Description of data files, data, and data collection methodology for the LIRF 2008-2011 Water Productivity field trials
    • LIRF Maize 2008 (LIRF Maize 2008 r1.xlsx) - Water balance and yield data for the 2008 LIRF field trial
    • LIRF Maize 2009 (LIRF Maize 2009 r1.xlsx) - Water balance and yield data for the 2009 LIRF field trial
    • LIRF Maize 2010 (LIRF Maize 2010 r1.xlsx) - Water balance and yield data for the 2010 LIRF field trial
    • LIRF Maize 2011 (LIRF Maize 2011 r1.xlsx) - Water balance and yield data for the 2011 LIRF field trial

  18. CSIRO Sentinel-1 SAR image dataset of oil- and non-oil features for machine...

    • data.csiro.au
    • researchdata.edu.au
    Updated Dec 15, 2022
    Cite
    David Blondeau-Patissier; Thomas Schroeder; Foivos Diakogiannis; Zhibin Li (2022). CSIRO Sentinel-1 SAR image dataset of oil- and non-oil features for machine learning ( Deep Learning ) [Dataset]. http://doi.org/10.25919/4v55-dn16
    Explore at:
    Dataset updated
    Dec 15, 2022
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    David Blondeau-Patissier; Thomas Schroeder; Foivos Diakogiannis; Zhibin Li
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    May 1, 2015 - Aug 31, 2022
    Area covered
    Dataset funded by
    CSIRO (http://www.csiro.au/)
    ESA
    Description

    What this collection is: a curated, binary-classified dataset of grayscale (1-band), 400 x 400-pixel images ("image chips") in JPEG format, extracted from processed Sentinel-1 Synthetic Aperture Radar (SAR) satellite scenes acquired over various regions of the world, and featuring clear open-ocean chips, look-alikes (wind or biogenic features) and oil slick chips.

    This binary dataset contains chips labelled as:
    • "0" for chips not containing any oil features (look-alikes or clean seas)
    • "1" for those containing oil features.

    This binary dataset is imbalanced, and biased towards "0" labelled chips (i.e., no oil features), which correspond to 66% of the dataset. Chips containing oil features, labelled "1", correspond to 34% of the dataset.

    Why: This dataset can be used for training, validation and/or testing of machine learning, including deep learning, algorithms for the detection of oil features in SAR imagery. Directly applicable for algorithm development for the European Space Agency Sentinel-1 SAR mission (https://sentinel.esa.int/web/sentinel/missions/sentinel-1 ), it may be suitable for the development of detection algorithms for other SAR satellite sensors.

    Overview of this dataset: the total number of chips (both classes) is N = 5,630.
    • Class 0 (no oil features): 3,725 chips
    • Class 1 (oil features): 1,905 chips

    Further information and a full description are provided in the ReadMe file (ReadMe_Sentinel1_SAR_OilNoOil_20221215.txt).
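    As a usage sketch (not part of the dataset documentation), the chips could be loaded for training with class weighting to counter the 66/34 imbalance. The directory layout chips/0/ and chips/1/ is an assumption, as is the use of TensorFlow/Keras.

    # Sketch: load the 400 x 400 grayscale chips for binary training with class weights.
    # Assumes the JPEG chips have been sorted into chips/0/ and chips/1/ subfolders.
    import tensorflow as tf

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "chips",                      # hypothetical root folder
        labels="inferred",
        label_mode="binary",
        color_mode="grayscale",
        image_size=(400, 400),
        batch_size=32,
    )

    # Weight classes inversely to their frequency (3,725 "no oil" vs 1,905 "oil" chips).
    n0, n1 = 3725, 1905
    total = n0 + n1
    class_weight = {0: total / (2 * n0), 1: total / (2 * n1)}
    print(class_weight)
    # model.fit(train_ds, epochs=10, class_weight=class_weight)  # model defined elsewhere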

  19. student-performance-data

    • kaggle.com
    Updated Jun 14, 2025
    Cite
    Muhammad Azam (2025). student-performance-data [Dataset]. http://doi.org/10.34740/kaggle/dsv/12160820
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muhammad Azam
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Student Performance Data

    This dataset provides insights into various factors influencing the academic performance of students. It is curated for use in educational research, data analytics projects, and predictive modeling. The data reflects a combination of personal, familial, and academic-related variables gathered through observation or survey.

    The dataset includes a diverse range of students and captures key characteristics such as study habits, family background, school attendance, and overall performance. It is well-suited for exploring correlations, visualizing trends, and training machine learning models related to academic outcomes.

    Highlights:

    • Clean, structured format suitable for immediate use
    • Designed for beginner to intermediate-level data analysis
    • Valuable for classification, regression, and data storytelling projects

    File Format:

    • Type: CSV (Comma-Separated Values)
    • Encoding: UTF-8
    • Structure: Each row represents a student record

    Applications

    • Student performance prediction
    • Educational policy planning
    • Identification of performance gaps and influencing factors
    • Exploratory data analysis and visualization

  20. Soil Data Grevena

    • kaggle.com
    • data.mendeley.com
    Updated Sep 4, 2023
    Cite
    Jocelyn Dumlao (2023). Soil Data Grevena [Dataset]. https://www.kaggle.com/datasets/jocelyndumlao/soil-data-grevena
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 4, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jocelyn Dumlao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Grevena
    Description

    Description

    This dataset contains soil analyses with general properties such as pH, organic matter (OM) and salinity (EC), together with major elements (N, P, K, Mg) and several microelements (Fe, Zn, Mn, Cu, B) that have a significant impact on plant nutrition.

    Categories

    Agricultural Soil

    Acknowledgements & Source

    Panagiotis Tziachris

