57 datasets found
  1. Data from: An Experimental Investigation on the Innate Relationship between...

    • figshare.com
    zip
    Updated May 10, 2016
    Cite
    Gabriele Bavota; Andrea De Lucia; Massimiliano Di Penta; Rocco Oliveto; Fabio Palomba (2016). An Experimental Investigation on the Innate Relationship between Quality and Refactoring [Dataset]. http://doi.org/10.6084/m9.figshare.1207916.v5
    Available download formats: zip
    Dataset updated
    May 10, 2016
    Dataset provided by
    figshare
    Authors
    Gabriele Bavota; Andrea De Lucia; Massimiliano Di Penta; Rocco Oliveto; Fabio Palomba
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Previous studies have investigated the reasons behind refactoring operations performed by developers, and proposed methods and tools to recommend refactorings based on quality metric profiles, or on the presence of poor design and implementation choices, i.e., code smells. Nevertheless, the existing literature lacks observations about the relations between metrics/code smells and refactoring operations performed by developers. In other words, the characteristics of code components pushing developers to refactor them are still unknown. This paper aims at bridging this gap by analyzing which code characteristics trigger developers' refactoring attention. Specifically, we mined the evolution history of three Java open source projects to investigate whether developers' refactoring activities occur on code components for which certain indicators—such as quality metrics or the presence of smells as detected by tools—suggest there might be need for refactoring operations. Results indicate that, more often than not, quality metrics do not show a clear relationship with refactoring. In other words, refactoring operations performed by developers are generally focused on code components for which quality metrics do not suggest there might be need for refactoring operations. Finally, 42% of refactoring operations are performed on code entities affected by code smells. However, only 7% of the performed operations actually remove the code smells from the affected class.

  2. Data from: Where do engineering students really get their information? :...

    • researchdata.edu.au
    • opal.latrobe.edu.au
    Updated Aug 10, 2020
    Cite
    Clayton Bolitho (2020). Where do engineering students really get their information? : using reference list analysis to improve information literacy programs [Dataset]. http://doi.org/10.4225/22/59D45F4B696E4
    Explore at:
    Dataset updated
    Aug 10, 2020
    Dataset provided by
    La Trobe University
    Authors
    Clayton Bolitho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background
    An understanding of the resources which engineering students use to write their academic papers provides information about student behaviour as well as the effectiveness of information literacy programs designed for engineering students. One of the most informative sources of information which can be used to determine the nature of the material that students use is the bibliography at the end of the students’ papers. While reference list analysis has been utilised in other disciplines, few studies have focussed on engineering students or used the results to improve the effectiveness of information literacy programs. Gadd, Baldwin and Norris (2010) found that civil engineering students undertaking a final-year research project cited journal articles more than other types of material, followed by books and reports, with web sites ranked fourth. Several studies, however, have shown that in their first year at least, most students prefer to use Internet search engines (Ellis & Salisbury, 2004; Wilkes & Gurney, 2009).

    PURPOSE
    The aim of this study was to find out exactly what resources undergraduate students studying civil engineering at La Trobe University were using, and in particular, the extent to which students were utilising the scholarly resources paid for by the library. A secondary purpose of the research was to ascertain whether information literacy sessions delivered to those students had any influence on the resources used, and to investigate ways in which the information literacy component of the unit can be improved to encourage students to make better use of the resources purchased by the Library to support their research.

    DESIGN/METHOD
    The study examined student bibliographies for three civil engineering group projects at the Bendigo Campus of La Trobe University over a two-year period, including two first-year units (CIV1EP – Engineering Practice) and one second-year unit (CIV2GR – Engineering Group Research). All units included a mandatory library session at the start of the project where student groups were required to meet with the relevant faculty librarian for guidance. In each case, the Faculty Librarian highlighted specific resources relevant to the topic, including books, e-books, video recordings, websites and internet documents. The students were also shown tips for searching the Library catalogue, Google Scholar, LibSearch (the LTU Library’s research and discovery tool) and ProQuest Central. Subject-specific databases for civil engineering and science were also referred to. After the final reports for each project had been submitted and assessed, the Faculty Librarian contacted the lecturer responsible for the unit, requesting copies of the student bibliographies for each group. References for each bibliography were then entered into EndNote. The Faculty Librarian grouped them according to various facets, including the name of the unit and the group within the unit; the material type of the item being referenced; and whether the item required a Library subscription to access it. A total of 58 references were collated for the 2010 CIV1EP unit; 237 references for the 2010 CIV2GR unit; and 225 references for the 2011 CIV1EP unit.

    INTERIM FINDINGS
    The initial findings showed that student bibliographies for the three group projects were primarily made up of freely available internet resources which required no library subscription. For the 2010 CIV1EP unit, all 58 resources used were freely available on the Internet. For the 2011 CIV1EP unit, 28 of the 225 resources used (12.44%) required a Library subscription or purchase for access, while the second-year students (CIV2GR) used a greater variety of resources, with 71 of the 237 resources used (29.96%) requiring a Library subscription or purchase for access. The results suggest that the library sessions had little or no influence on the 2010 CIV1EP group, but the sessions may have assisted students in the 2011 CIV1EP and 2010 CIV2GR groups to find books, journal articles and conference papers, which were all represented in their bibliographies.
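    The percentages in these findings follow directly from the reference counts given above; a minimal arithmetic check:

```python
# Recomputing the subscription-access shares reported in the interim
# findings from the reference counts (a minimal arithmetic check).
counts = {
    "2010 CIV1EP": (0, 58),    # (subscription-only references, total)
    "2011 CIV1EP": (28, 225),
    "2010 CIV2GR": (71, 237),
}

shares = {unit: 100 * sub / total for unit, (sub, total) in counts.items()}
for unit, share in shares.items():
    print(f"{unit}: {share:.2f}% required a Library subscription")
```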

    FURTHER RESEARCH
    The next step in the research is to investigate ways to increase the representation of scholarly references (found by resources other than Google) in student bibliographies. It is anticipated that such a change would lead to an overall improvement in the quality of the student papers. One way of achieving this would be to make it mandatory for students to include a specified number of journal articles, conference papers, or scholarly books in their bibliographies. It is also anticipated that embedding La Trobe University’s Inquiry/Research Quiz (IRQ) using a constructively aligned approach will further enhance the students’ research skills and increase their ability to find suitable scholarly material which relates to their topic. This has already been done successfully (Salisbury, Yager, & Kirkman, 2012).

    CONCLUSIONS & CHALLENGES
    The study shows that most students rely heavily on the free Internet for information. Students don’t naturally use Library databases or scholarly resources such as Google Scholar to find information, without encouragement from their teachers, tutors and/or librarians. It is acknowledged that the use of scholarly resources doesn’t automatically lead to a high quality paper. Resources must be used appropriately and students also need to have the skills to identify and synthesise key findings in the existing literature and relate these to their own paper. Ideally, students should be able to see the benefit of using scholarly resources in their papers, and continue to seek these out even when it’s not a specific assessment requirement, though it can’t be assumed that this will be the outcome.

    REFERENCES

    Ellis, J., & Salisbury, F. (2004). Information literacy milestones: building upon the prior knowledge of first-year students. Australian Library Journal, 53(4), 383-396.

    Gadd, E., Baldwin, A., & Norris, M. (2010). The citation behaviour of civil engineering students. Journal of Information Literacy, 4(2), 37-49.

    Salisbury, F., Yager, Z., & Kirkman, L. (2012). Embedding Inquiry/Research: Moving from a minimalist model to constructive alignment. Paper presented at the 15th International First Year in Higher Education Conference, Brisbane. Retrieved from http://www.fyhe.com.au/past_papers/papers12/Papers/11A.pdf

    Wilkes, J., & Gurney, L. J. (2009). Perceptions and applications of information literacy by first year applied science students. Australian Academic & Research Libraries, 40(3), 159-171.

  3. Big Data Engineering Services Market - Size, Share & Companies

    • mordorintelligence.com
    pdf,excel,csv,ppt
    Cite
    Mordor Intelligence, Big Data Engineering Services Market - Size, Share & Companies [Dataset]. https://www.mordorintelligence.com/industry-reports/big-data-engineering-services-market
    Available download formats: pdf, excel, csv, ppt
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2019 - 2030
    Area covered
    Global
    Description

    The Big Data Engineering Services Market Report is Segmented by Type (Data Modelling, Data Quality, and Analytics), Business Function (Marketing and Sales, Finance, and HR), Organization Size (Small and Medium Enterprises and Large Enterprises), End-User Industry (BFSI, Manufacturing, and Government), and Geography (North America, Europe, Asia-Pacific, Latin America, and Middle East & Africa). The Market Sizes and Forecasts are Provided in Terms of Value (USD) for all the Above Segments.

  4. Quality assurance data to evaluate the vertical accuracy of the bathymetric...

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jul 6, 2024
    Cite
    U.S. Geological Survey (2024). Quality assurance data to evaluate the vertical accuracy of the bathymetric data for Beaver Lake near Rogers, Arkansas, 2018 [Dataset]. https://catalog.data.gov/dataset/quality-assurance-data-to-evaluate-the-vertical-accuracy-of-the-bathymetric-data-for-beave
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Arkansas, Rogers, Beaver Lake
    Description

    Beaver Lake was constructed in 1966 on the White River in the northwest corner of Arkansas for flood control, hydroelectric power, public water supply, and recreation. The surface area of Beaver Lake is about 27,900 acres and approximately 449 miles of shoreline are at the conservation pool level (1,120 feet above the North American Vertical Datum of 1988). Sedimentation in reservoirs can result in reduced water storage capacity and a reduction in usable aquatic habitat. Therefore, accurate and up-to-date estimates of reservoir water capacity are important for managing pool levels, power generation, water supply, recreation, and downstream aquatic habitat. Many of the lakes operated by the U.S. Army Corps of Engineers are periodically surveyed to monitor bathymetric changes that affect water capacity. In October 2018, the U.S. Geological Survey, in cooperation with the U.S. Army Corps of Engineers, completed one such survey of Beaver Lake using a multibeam echosounder. The echosounder data was combined with light detection and ranging (lidar) data to prepare a bathymetric map and a surface area and capacity table. Bathymetric quality-assurance data contained in this dataset (BeaverLake2018_QA.zip) were collected to evaluate the vertical accuracy of the gridded bathymetric point data (BeaverLake2018_bathy.zip) used for creation of mapping contours and the area-capacity table.
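    The description does not state how the vertical-accuracy evaluation is computed; a common approach is to compare each quality-assurance check point against the corresponding gridded bathymetry value and summarise the error. A hypothetical sketch, with all numbers invented:

```python
import math

# Hypothetical illustration of a vertical-accuracy check: compare QA
# check-point elevations against corresponding gridded values and report
# bias and RMSE. All elevation pairs here are invented.
qa_pairs = [  # (QA check-point elevation, gridded elevation), feet
    (1102.4, 1102.1),
    (1098.7, 1099.0),
    (1105.2, 1105.5),
]

errors = [qa - grid for qa, grid in qa_pairs]
mean_error = sum(errors) / len(errors)                        # vertical bias
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))    # spread
print(f"mean error: {mean_error:+.2f} ft, RMSE: {rmse:.2f} ft")
```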

  5. Experiment Data - 952 Assessments of 8 Vision Videos Regarding Overall Video...

    • explore.openaire.eu
    • zenodo.org
    Updated Nov 21, 2019
    Cite
    Oliver Karras; Kurt Schneider; Samuel A. Fricker (2019). Experiment Data - 952 Assessments of 8 Vision Videos Regarding Overall Video Quality and 15 Individual Quality Characteristics [Dataset]. http://doi.org/10.5281/zenodo.3549435
    Explore at:
    Dataset updated
    Nov 21, 2019
    Authors
    Oliver Karras; Kurt Schneider; Samuel A. Fricker
    Description

    In 2018, we conducted a within-subjects experiment to investigate how individual quality characteristics of vision videos relate to the overall quality of vision videos from a developer's point of view. 139 undergraduate students who had the role of a developer and actively developed software in projects with real customers at the time of the experiment participated. The subjects can be considered developers due to their experience at the time of the experiment. The subjects were put in the situation that they join an ongoing project in their familiar role as a developer. In this context, we showed the 8 vision videos (one after the other), always with the intent to share the vision of the particular project with the subjects. The undergraduate students subjectively assessed the overall quality and 15 individual quality characteristics of the 8 vision videos by completing an assessment form for each video. After data cleaning, the final data set contains 952 complete assessments of 119 subjects for the 8 vision videos.

    Each entry of the data set consists of:
    Entry ID: The ID of the entry in the dataset.
    Subject ID: The ID of the subject.
    Video ID: The ID of the vision video assessed.
    Overall quality: The subject's assessment of the overall quality of the vision video.
    Image quality: The subject's assessment of the visual quality of the image of the vision video.
    Sound quality: The subject's assessment of the auditory quality of the sound of the vision video.
    Video length [s]: The duration of the vision video in seconds.
    Focus: The subject's assessment of the compact representation of the vision which is presented in the vision video.
    Plot: The subject's assessment of the structured presentation of the content of the vision video.
    Prior knowledge: The subject's assessment of the presupposed prior knowledge to understand the content of the vision video.
    Clarity: The subject's assessment of the intelligibility of the aspired goals of the vision which is presented in the vision video.
    Essence: The subject's assessment of the amount of important core elements, e.g., persons, locations, and entities, which are to be presented in the vision video.
    Clutter: The subject's assessment of the amount of disrupting and distracting elements, e.g., background actions or noises, that can be inadvertently recorded in the vision video.
    Completeness: The subject's assessment of the coverage of the three contents of a vision which is presented in the vision video, i.e., the considered problem, the proposed solution, and the improvement of the problem due to the solution.
    Pleasure: The subject's assessment of the enjoyment of watching the vision video.
    Intention: The subject's assessment of how well the vision video is suitable for the intended purpose of the given scenario.
    Sense of responsibility: The subject's assessment of the compliance of the vision video with legal regulations.
    Support: The subject's assessment of his or her level of acceptance of the vision which is presented in the vision video.
    Stability: The subject's assessment of the consistency of the vision which is presented in the vision video.

    This dataset includes the following files:
    "Dataset_Assessments.xlsx" contains the anonymized 952 assessments of the 119 subjects for the 8 vision videos.
    "Assessment_form.docx" contains the assessment form which was used to assess each of the 8 vision videos.
    "Assessment_form.pdf" contains the same assessment form in PDF format.

    The 8 vision videos are not included in this dataset since we do not have the explicit consent of the actors to distribute them. This experiment was designed, conducted, and analyzed by Oliver Karras (@KarrasOliver), Kurt Schneider, and Samuel A. Fricker (@samuelfricker).
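    As a minimal sketch of working with this per-assessment structure, one could group the assessments by video and average the overall quality. The rows and column names below are invented stand-ins for the real "Dataset_Assessments.xlsx" schema:

```python
from statistics import mean

# Minimal sketch of summarising the assessment structure described above.
# Rows are invented stand-ins for "Dataset_Assessments.xlsx" (952 rows,
# 119 subjects x 8 videos); column names are assumptions, not the real schema.
rows = [
    {"subject_id": 1, "video_id": "V1", "overall_quality": 4},
    {"subject_id": 2, "video_id": "V1", "overall_quality": 5},
    {"subject_id": 1, "video_id": "V2", "overall_quality": 2},
    {"subject_id": 2, "video_id": "V2", "overall_quality": 3},
]

# Group the per-subject assessments by video and average the overall quality.
by_video = {}
for row in rows:
    by_video.setdefault(row["video_id"], []).append(row["overall_quality"])

for video, scores in sorted(by_video.items()):
    print(video, mean(scores))  # mean overall quality per vision video
```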

  6. Data from: Quality Assurance of a German COVID-19 Question Answering Systems...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Andreas Both; Paul Heinze; Aleksandr Perevalov; Johannes Richard Bartsch; Rostislav Iudin; Johannes Rudolf Herkner; Tim Schrader; Jonas Wunsch; René Gürth; Ann Kristin Falkenhain (2023). Quality Assurance of a German COVID-19 Question Answering Systems using Component-based Microbenchmarking [Dataset]. http://doi.org/10.6084/m9.figshare.17833028.v1
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Andreas Both; Paul Heinze; Aleksandr Perevalov; Johannes Richard Bartsch; Rostislav Iudin; Johannes Rudolf Herkner; Tim Schrader; Jonas Wunsch; René Gürth; Ann Kristin Falkenhain
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Supplementary data for the paper "Quality Assurance of a German COVID-19 Question Answering Systems using Component-based Microbenchmarking" at the 15th ACM International WSDM Conference (WSDM 2022).

    Abstract: Question Answering (QA) has become an often used method to retrieve data as part of chatbots and other natural-language user interfaces. In particular, QA systems of official institutions have high expectations regarding the answers computed by the system, as the provided information might be critical. In this demonstration, we use the official COVID-19 QA system that was developed together with the German Federal government to provide German citizens access to data regarding incident values, number of deaths, etc. To ensure high quality, a component-based approach was used that enables exchanging data between QA components using RDF and validating the functionality of the QA system using SPARQL. Here, we will demonstrate how our solution enables developers of QA systems to use a descriptive approach to validate the quality of their implementation before the system's deployment and also within a live environment.

  7. AIRS/Aqua L1B Near Real Time (NRT) Infrared (IR) quality assurance subset...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • +3more
    application/rdfxml +5
    Updated Sep 20, 2019
    + more versions
    Cite
    (2019). AIRS/Aqua L1B Near Real Time (NRT) Infrared (IR) quality assurance subset V005 (AIRIBQAP_NRT) at GES DISC [Dataset]. https://data.nasa.gov/dataset/AIRS-Aqua-L1B-Near-Real-Time-NRT-Infrared-IR-quali/ca82-gt9p
    Available download formats: application/rssxml, csv, tsv, application/rdfxml, xml, json
    Dataset updated
    Sep 20, 2019
    Description

    The AIRS Level 1B Near Real Time (NRT) product (AIRIBQAP_NRT_005) differs from the routine product (AIRIBQAP_005) in two ways to meet the three-hour latency requirements of the Land Atmosphere NRT Capability Earth Observing System (LANCE): (1) the NRT granules are produced without previous or subsequent granules if those granules are not available within 5 minutes, and (2) the predictive ephemeris/attitude data are used rather than the definitive ephemeris/attitude. The consequences of these differences are described in the AIRS Near Real Time (NRT) data products document. The Atmospheric Infrared Sounder (AIRS) is a facility instrument aboard the second Earth Observing System (EOS) polar-orbiting platform, EOS Aqua. In combination with the Advanced Microwave Sounding Unit (AMSU) and the Humidity Sounder for Brazil (HSB), AIRS constitutes an innovative atmospheric sounding group of visible, infrared, and microwave sensors. AIRS data are generated continuously. Global coverage is obtained twice daily (day and night) on a 1:30pm sun-synchronous orbit from a 705-km altitude. The AIRS IR Level 1B QA Subset contains Quality Assurance (QA) parameters that a user may use to filter AIRS IR Level 1B radiance data before analysis. QA parameters indicate quality per granule-per-channel, scan-per-channel, field of view, and channel, and should be checked before any data analysis. The subset also contains "glintlat", "glintlon", and "sun_glint_distant", which users can use to check for possible solar glint contamination.

  8. 33kV Circuit Operational Data Half Hourly - Eastern Power Networks (EPN)

    • ukpowernetworks.opendatasoft.com
    Updated Mar 20, 2025
    + more versions
    Cite
    (2025). 33kV Circuit Operational Data Half Hourly - Eastern Power Networks (EPN) [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ukpn-33kv-circuit-operational-data-half-hourly-epn/
    Explore at:
    Dataset updated
    Mar 20, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction
    UK Power Networks maintains the 132kV voltage level network and below. An important part of the distribution network is distributing this electricity across our regions through circuits. Electricity enters our network through Super Grid Transformers at substations shared with National Grid, which we call Grid Supply Points. It is then sent across our 132 kV circuits towards our grid substations and primary substations. From there, electricity is distributed along the 33 kV circuits to bring it closer to the home. These circuits can be viewed on the single line diagrams in our Long-Term Development Statements (LTDS) and the underlying data is then found in the LTDS tables.

    This dataset provides half-hourly current and power flow data across these named circuits from 2021 through to the previous month across our Eastern Power Networks (EPN) license area. The data is aligned with the same naming convention as the LTDS for improved interoperability.

    Care is taken to protect the private affairs of companies connected to the 33 kV network, resulting in the redaction of certain circuits. Where redacted, we provide monthly statistics to continue to add value where possible. Where monthly statistics exist but half-hourly is absent, this data has been redacted.

    To find which circuit you are looking for, use the ‘ltds_line_name’ that can be cross referenced in the 33kV Circuits Monthly Data, which describes by month what circuits were triaged, if they could be made public, and what the monthly statistics are of that site.

    If you want to download all this data, it is perhaps more convenient from our public sharepoint: Sharepoint

    This dataset is part of a larger endeavour to share more operational data on UK Power Networks assets. Please visit our Network Operational Data Dashboard for more operational datasets.

    Methodological Approach
    The dataset is not derived; it is the measurements from our network stored in our historian. The measurements are taken from current transformers attached to the cable at the circuit breaker, and power is derived by combining this with the data from voltage transformers physically attached to the busbar. The historian stores datasets based on a report-by-exception process, such that a certain deviation from the present value must be reached before logging a point measurement to the historian. We extract the data following a 30-min time-weighted averaging method to get half-hourly values. Where there are no measurements logged in the period, the data provided is blank; due to the report-by-exception process, it may be appropriate to forward fill this data for shorter gaps.

    We developed a data redaction process to protect the privacy of companies according to the Utilities Act 2000 section 105.1.b, which requires UK Power Networks not to disclose information relating to the affairs of a business. For this reason, where the demand of a private customer is derivable from our data and that data is not already public information (e.g., data provided via Elexon on the Balancing Mechanism), we redact the half-hourly time series and provide only the monthly averages. This redaction process considers the correlation of all the data, of only corresponding periods where the customer is active, the first-order difference of all the data, and the first-order difference of only corresponding periods where the customer is active. Should any of these four tests show a high linear correlation, the data is redacted. This process is applied not only to the circuit of the customer, but also to the surrounding circuits that would reveal the signal of that customer.

    The directionality of the data is not consistent within this dataset. Where directionality was ascertainable, we arrange the power data in the direction of the LTDS "from node" to the LTDS "to node". Measurements of current do not indicate directionality and are instead positive regardless of direction. In some circumstances, the polarity can be negative, and depends on the data commissioner's decision on what the operators in the control room might find most helpful in ensuring reliable and secure network operation.

    Quality Control Statement
    The data is provided "as is". In the design and delivery process adopted by the DSO, customer feedback and guidance is considered at each phase of the project. One of the earliest steers was that raw data was preferable. This means that we do not perform prior quality-control screening on our raw network data. The result of this decision is that network rearrangements and other periods of non-intact running of the network are present throughout the dataset, which has the potential to misconstrue the true utilisation of the network, which is determined regulatorily by considering only intact running arrangements. Therefore, taking the maximum or minimum of these measurements is not a reliable method of correctly ascertaining the true utilisation. This does have the intended added benefit of giving a realistic view of how the network was operated. The critical feedback was that our customers have a desire to understand what would have been the impact to them under real operational conditions. As such, this dataset offers unique insight into that.

    Assurance Statement
    Creating this dataset involved a lot of human data imputation. At UK Power Networks, we have differing software to run the network operationally (ADMS) and to plan and study the network (PowerFactory). The measurement devices are intended primarily to inform the network operators of the real-time condition of the network, and importantly, the network drawings visible in the LTDS are a planning view, which differs from the operational one. To compile this dataset, we made the union between the two modes of operating manually. A team of data scientists, data engineers, and power system engineers manually identified the LTDS circuit from the single line diagram, identified the line name from LTDS Table 2a/b, then identified the same circuit in ADMS to identify the measurement data tags. This was then manually inputted to a spreadsheet. Any influential customers to that circuit were noted using ADMS and the single line diagrams. From there, a Python code is used to perform the triage and compilation of the datasets. There is potential for human error during the manual data processing. These issues can include missing circuits, incorrectly labelled circuits, incorrectly identified measurement data tags, and incorrectly interpreted directionality. Whilst care has been taken to minimise the risk of these issues, they may persist in the provided dataset. Any uncertain behaviour observed by using this data should be reported to allow us to correct it as fast as possible.

    Additional Information
    Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary. Download dataset information: Metadata (JSON). We would be grateful if you find this dataset useful to submit a reuse case study to tell us what you did and how you used it. This enables us to drive our direction and gain a better understanding of how we improve our data offering in the future. Click here for more information: Open Data Portal Reuses — UK Power Networks

  9. The signal quality of grades across academic fields (replication data) -...

    • b2find.dkrz.de
    Updated Oct 24, 2023
    + more versions
    Cite
    (2023). The signal quality of grades across academic fields (replication data) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/71511bfa-daf0-5387-aa9d-9e87ba2e33f7
    Explore at:
    Dataset updated
    Oct 24, 2023
    Description

    I use transcript data from Duke University and a correlated learning model to measure the signal quality of grades across academic fields. I find science, engineering, and economics grades are significantly more informative than humanities and social science grades. The correlated learning structure allows grades in one field to signal abilities in all fields. This sometimes generates information spillovers so powerful that science, engineering, and economics grades inform humanities and social science beliefs more than humanities and social science grades. I show that grade compression reduces signal quality but cannot explain the differences in signal quality across academic fields.

  10. 132kV Circuit Operational Data Half Hourly

    • ukpowernetworks.opendatasoft.com
    Updated Mar 21, 2025
    Cite
    (2025). 132kV Circuit Operational Data Half Hourly [Dataset]. https://ukpowernetworks.opendatasoft.com/explore/dataset/ukpn-132kv-circuit-operational-data-half-hourly/
    Explore at:
    Dataset updated
    Mar 21, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    UK Power Networks maintains the network at the 132kV voltage level and below. An important part of the distribution network is carrying this electricity across our regions through circuits. Electricity enters our network through Super Grid Transformers at substations shared with National Grid, which we call Grid Supply Points. It is then sent across our 132kV circuits towards our grid substations and primary substations. These circuits can be viewed on the single line diagrams in our Long-Term Development Statements (LTDS), and the underlying data is found in the LTDS tables.

    This dataset provides half-hourly current and power flow data across these named circuits from 2021 through to the previous month across our license areas. The data are aligned with the same naming convention as the LTDS for improved interoperability.

    Care is taken to protect the private affairs of companies connected to the 132kV network, resulting in the redaction of certain circuits. Where a circuit is redacted, we provide monthly statistics to continue to add value where possible. Where monthly statistics exist but half-hourly data is absent, that data has been redacted.

    To find the circuit you are looking for, use the ‘ltds_line_name’ field, which can be cross-referenced in the 132kV Circuits Monthly Data; that dataset describes, month by month, which circuits were triaged, whether they could be made public, and the monthly statistics for each site.
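The cross-referencing described above can be sketched with pandas. Everything here except the ‘ltds_line_name’ column name (which the text specifies) is hypothetical, including the circuit names, values, and the other column names:

```python
import pandas as pd

# Hypothetical stand-ins for the two portal exports; only the
# 'ltds_line_name' column name comes from the dataset description.
half_hourly = pd.DataFrame({
    "ltds_line_name": ["CIRCUIT A", "CIRCUIT A", "CIRCUIT B"],
    "timestamp": pd.to_datetime(
        ["2023-01-01 00:00", "2023-01-01 00:30", "2023-01-01 00:00"]),
    "current_a": [120.0, 118.5, None],  # redacted circuits appear blank
})
monthly = pd.DataFrame({
    "ltds_line_name": ["CIRCUIT A", "CIRCUIT B"],
    "redacted": [False, True],
    "monthly_mean_current_a": [119.2, 87.4],
})

# Keep half-hourly rows only for circuits the monthly table marks as public.
public = half_hourly.merge(
    monthly.loc[~monthly["redacted"], ["ltds_line_name"]],
    on="ltds_line_name", how="inner")
```

An inner merge on the shared name column is the natural join here, since redacted circuits should simply drop out of the half-hourly view.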

    If you want to download all of this data, it may be more convenient to use our public SharePoint: Sharepoint

    This dataset is part of a larger endeavour to share more operational data on UK Power Networks assets. Please visit our Network Operational Data Dashboard for more operational datasets.

    Methodological Approach

    The dataset is not derived; it consists of the measurements from our network stored in our historian.

    Current measurements are taken from current transformers attached to the cable at the circuit breaker, and power is derived by combining these with data from voltage transformers physically attached to the busbar. The historian stores data on a report-by-exception basis: a measurement must deviate by a certain amount from the last stored value before a new point is logged. We extract the data using a 30-minute time-weighted averaging method to obtain half-hourly values. Where no measurements were logged in a period, the value is left blank; because of the report-by-exception process, it may be appropriate to forward-fill such data across shorter gaps.
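As a rough illustration of the extraction step, the following pandas sketch approximates a 30-minute time-weighted average of report-by-exception data by holding each logged value until the next report (a step function), then averaging over a fine grid. The timestamps and values are invented, and the one-minute grid is an assumption about resolution:

```python
import pandas as pd

# Irregular report-by-exception points (hypothetical values): the sensor
# logs only when the value deviates enough from the last stored point.
raw = pd.Series(
    [100.0, 140.0, 120.0],
    index=pd.to_datetime(
        ["2023-01-01 00:05", "2023-01-01 00:20", "2023-01-01 00:50"]),
)

# Treat the signal as a step function (value holds until the next report),
# sample it on a fine grid, then average per half hour -- a simple
# approximation of the 30-minute time-weighted mean described above.
grid = raw.reindex(
    pd.date_range("2023-01-01 00:00", "2023-01-01 01:00", freq="1min")
).ffill()
half_hourly = grid.resample("30min").mean()

# Per the note above, shorter gaps may reasonably be forward-filled.
half_hourly = half_hourly.ffill()
```

With these invented points, the first half hour averages the 100 and 140 steps weighted by how long each was held.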

    We developed a data redaction process to protect the privacy of companies in accordance with the Utilities Act 2000 section 105.1.b, which requires UK Power Networks not to disclose information relating to the affairs of a business. For this reason, where the demand of a private customer is derivable from our data and is not already public information (e.g., data provided via Elexon on the Balancing Mechanism), we redact the half-hourly time series and provide only the monthly averages. The redaction process considers four correlations: of all the data; of only the periods where the customer is active; of the first-order difference of all the data; and of the first-order difference of only the periods where the customer is active. Should any of these four tests show a high linear correlation, the data is redacted. The process is applied not only to the customer's own circuit but also to the surrounding circuits that would otherwise reveal that customer's signal.
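The four correlation tests can be sketched as follows. All data, the helper name, and the threshold are hypothetical, since the text does not state what counts as a "high" linear correlation:

```python
import numpy as np
import pandas as pd

def should_redact(circuit, customer, active=None, threshold=0.95):
    """Hypothetical sketch of the four-test redaction rule described above.

    circuit, customer: aligned half-hourly series; active: boolean mask of
    periods when the private customer is operating. The 0.95 threshold is
    an assumption, not UK Power Networks' actual criterion.
    """
    if active is None:
        active = pd.Series(True, index=circuit.index)
    dc, du = circuit.diff(), customer.diff()
    tests = [
        circuit.corr(customer),              # all data
        circuit[active].corr(customer[active]),  # active periods only
        dc.corr(du),                         # first-order differences
        dc[active].corr(du[active]),         # active-period differences
    ]
    return any(abs(r) >= threshold for r in tests if pd.notna(r))

rng = np.random.default_rng(0)
customer = pd.Series(rng.normal(50, 5, 200))
circuit = customer + pd.Series(rng.normal(0, 0.1, 200))  # flow reveals customer
```

Here the circuit flow is almost entirely the customer's demand plus small noise, so the first test alone already trips the threshold and the series would be redacted.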

    The directionality of the data is not consistent within this dataset. Where directionality could be ascertained, we arrange the power data in the direction from the LTDS "from node" to the LTDS "to node". Measurements of current do not indicate directionality and are positive regardless of direction. In some circumstances the polarity may be inverted, depending on the data commissioner's decision about what the operators in the control room would find most helpful in ensuring reliable and secure network operation.

    Quality Control Statement

    The data is provided "as is".

    In the design and delivery process adopted by the DSO, customer feedback and guidance are considered at each phase of the project. One of the earliest steers was that raw data was preferable, so we do not apply prior quality-control screening to our raw network data. As a result, network rearrangements and other periods of non-intact running are present throughout the dataset, which has the potential to misrepresent the true utilisation of the network; for regulatory purposes, utilisation is determined by considering only intact running arrangements. Taking the maximum or minimum of these measurements is therefore not a reliable method of ascertaining true utilisation. The intended benefit of this decision is a realistic view of how the network was actually operated: the critical feedback was that our customers want to understand what the impact on them would have been under real operational conditions, and this dataset offers unique insight into that.

    Assurance Statement

    Creating this dataset involved substantial manual data entry. At UK Power Networks, we use different software to run the network operationally (ADMS) and to plan and study the network (PowerFactory). The measurement devices primarily exist to inform network operators of the real-time condition of the network, and the network drawings visible in the LTDS follow a planning view, which differs from the operational one. To compile this dataset, we made the union between the two modes of operation manually: a team of data scientists, data engineers, and power system engineers identified each LTDS circuit from the single line diagram, identified the line name from LTDS Table 2a/b, then identified the same circuit in ADMS to find the measurement data tags. This was then manually entered into a spreadsheet, and any customers with influence on that circuit were noted using ADMS and the single line diagrams. From there, a Python script performs the triage and compilation of the datasets.

    There is potential for human error in this manual processing: missing circuits, incorrectly labelled circuits, incorrectly identified measurement data tags, or incorrectly interpreted directionality. While care has been taken to minimise these risks, such issues may persist in the provided dataset. Any unexpected behaviour observed when using this data should be reported so that we can correct it as quickly as possible.

    Additional Information

    Definitions of key terms related to this dataset can be found in the Open Data Portal Glossary.

    Download dataset information: Metadata (JSON)

  11. nps tracts

    • hub.arcgis.com
    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    • +7more
    Updated Sep 26, 2023
    + more versions
    Cite
    Esri U.S. Federal Datasets (2023). nps tracts [Dataset]. https://hub.arcgis.com/maps/fedmaps::nps-tracts
    Explore at:
    Dataset updated
    Sep 26, 2023
    Dataset provided by
    Esrihttp://esri.com/
    Authors
    Esri U.S. Federal Datasets
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This service depicts National Park Service tract and boundary data that was created by the Land Resources Division. NPS Director's Order #25 states: "Land status maps will be prepared to identify the ownership of the lands within the authorized boundaries of the park unit. These maps, showing ownership and acreage, are the 'official record' of the acreage of Federal and non-federal lands within the park boundaries. While these maps are the official record of the lands and acreage within the unit's authorized boundaries, they are not of survey quality and not intended to be used for survey purposes." As such this data is intended for use as a tool for GIS analysis. It is in no way intended for engineering or legal purposes. The data accuracy is checked against best available sources which may be dated and vary by location. NPS assumes no liability for use of this data. The boundary polygons represent the current legislated boundary of a given NPS unit. NPS does not necessarily have full fee ownership or hold another interest (easement, right of way, etc...) in all parcels contained within this boundary. Equivalently NPS may own or have an interest in parcels outside the legislated boundary of a given unit. In order to obtain complete information about current NPS interests both inside and outside a unit’s legislated boundary tract level polygons are also created by NPS Land Resources Division and should be used in conjunction with this boundary data. To download this data directly from the NPS go to https://irma.nps.gov Property ownership data is compiled from deeds, plats, surveys, and other source data. These are not engineering quality drawings and should be used for administrative purposes only. The National Park Service (NPS) shall not be held liable for improper or incorrect use of the data described and/or contained herein. These data and related graphics are not legal documents and are not intended to be used as such. 
The information contained in these data is dynamic and may change over time. The data are not better than the original sources from which they were derived. It is the responsibility of the data user to use the data appropriately and consistently within the limitations of geospatial data in general and these data in particular. The related graphics are intended to aid the data user in acquiring relevant data; it is not appropriate to use the related graphics as data. The National Park Service gives no warranty, expressed or implied, as to the accuracy, reliability, or completeness of these data. It is strongly recommended that these data are directly acquired from an NPS server and not indirectly through other sources which may have changed the data in some way. Although these data have been processed successfully on a computer system at the National Park Service, no warranty expressed or implied is made regarding the utility of the data on another system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty. This disclaimer applies both to individual use of the data and aggregate use with other data.

  12. nps boundary centroids

    • public-nps.opendata.arcgis.com
    • gis.data.mass.gov
    • +2more
    Updated Dec 12, 2019
    Cite
    National Park Service (2019). nps boundary centroids [Dataset]. https://public-nps.opendata.arcgis.com/maps/nps::nps-boundary-centroids-2
    Explore at:
    Dataset updated
    Dec 12, 2019
    Dataset authored and provided by
    National Park Servicehttp://www.nps.gov/
    Area covered
    North Pacific Ocean, Pacific Ocean
    Description

    This service depicts National Park Service tract and boundary data that was created by the Land Resources Division. NPS Director's Order #25 states: "Land status maps will be prepared to identify the ownership of the lands within the authorized boundaries of the park unit. These maps, showing ownership and acreage, are the 'official record' of the acreage of Federal and non-federal lands within the park boundaries. While these maps are the official record of the lands and acreage within the unit's authorized boundaries, they are not of survey quality and not intended to be used for survey purposes." As such this data is intended for use as a tool for GIS analysis. It is in no way intended for engineering or legal purposes. The data accuracy is checked against best available sources which may be dated and vary by location. NPS assumes no liability for use of this data. The boundary polygons represent the current legislated boundary of a given NPS unit. NPS does not necessarily have full fee ownership or hold another interest (easement, right of way, etc...) in all parcels contained within this boundary. Equivalently NPS may own or have an interest in parcels outside the legislated boundary of a given unit. In order to obtain complete information about current NPS interests both inside and outside a unit’s legislated boundary tract level polygons are also created by NPS Land Resources Division and should be used in conjunction with this boundary data. To download this data directly from the NPS go to https://irma.nps.gov Property ownership data is compiled from deeds, plats, surveys, and other source data. These are not engineering quality drawings and should be used for administrative purposes only. The National Park Service (NPS) shall not be held liable for improper or incorrect use of the data described and/or contained herein. These data and related graphics are not legal documents and are not intended to be used as such. 
The information contained in these data is dynamic and may change over time. The data are not better than the original sources from which they were derived. It is the responsibility of the data user to use the data appropriately and consistently within the limitations of geospatial data in general and these data in particular. The related graphics are intended to aid the data user in acquiring relevant data; it is not appropriate to use the related graphics as data. The National Park Service gives no warranty, expressed or implied, as to the accuracy, reliability, or completeness of these data. It is strongly recommended that these data are directly acquired from an NPS server and not indirectly through other sources which may have changed the data in some way. Although these data have been processed successfully on a computer system at the National Park Service, no warranty expressed or implied is made regarding the utility of the data on another system or for general or scientific purposes, nor shall the act of distribution constitute any such warranty. This disclaimer applies both to individual use of the data and aggregate use with other data.

  13. Prospect Data | Manufacturing Sector in North America | Comprehensive...

    • datarade.ai
    + more versions
    Cite
    Success.ai, Prospect Data | Manufacturing Sector in North America | Comprehensive Firmographic Insights | Best Price Guaranteed [Dataset]. https://datarade.ai/data-products/prospect-data-manufacturing-sector-in-north-america-compr-success-ai
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset provided by
    Area covered
    Costa Rica, Mexico, Bermuda, Guatemala, Greenland, Nicaragua, Panama, United States of America, Canada, El Salvador, North America
    Description

    Success.ai’s Prospect Data for the Manufacturing Sector in North America provides businesses with a powerful dataset to connect with manufacturers and industry leaders across the United States, Canada, and Mexico. This dataset offers verified contact details, detailed firmographic insights, and business location data for companies in a wide range of manufacturing sectors, including automotive, electronics, consumer goods, industrial equipment, and more.

    With access to over 170 million verified professional profiles and 30 million company profiles, Success.ai ensures that your outreach, market research, and business development efforts are powered by accurate, continuously updated, and AI-validated data.

    Backed by our Best Price Guarantee, this solution is ideal for businesses looking to succeed in the dynamic North American manufacturing industry.

    Why Choose Success.ai’s Manufacturing Prospect Data?

    1. Verified Contact Data for Effective Outreach

      • Access verified work emails, phone numbers, and LinkedIn profiles of manufacturing executives, plant managers, procurement officers, and engineers.
      • AI-driven validation ensures 99% accuracy, reducing bounce rates and optimizing communication efficiency.
    2. Regional Focus on North American Manufacturing

      • Includes profiles of manufacturers across key markets such as the U.S., Canada, and Mexico, covering diverse sectors and specialties.
      • Gain insights into regional industry trends, operational practices, and supply chain dynamics unique to North America.
    3. Continuously Updated Datasets

      • Real-time updates reflect leadership changes, production expansions, market shifts, and operational improvements.
      • Stay aligned with the rapidly evolving manufacturing sector to identify opportunities and maintain relevance.
    4. Ethical and Compliant

      • Adheres to GDPR, CCPA, and other global privacy regulations, ensuring responsible and compliant use of data for your campaigns.

    Data Highlights:

    • 170M+ Verified Professional Profiles: Engage with decision-makers, engineers, and operational leaders driving manufacturing innovation across North America.
    • 30M Company Profiles: Access firmographic data, including revenue ranges, workforce sizes, and geographic locations.
    • Verified Leadership Contacts: Connect directly with CEOs, COOs, plant managers, and procurement leads shaping manufacturing operations.
    • Business Insights: Understand production capacities, supply chain networks, and technology adoption rates.

    Key Features of the Dataset:

    1. Manufacturing Decision-Maker Profiles

      • Identify and connect with leaders responsible for vendor selection, process optimization, and technology integration.
      • Target professionals making decisions on resource allocation, production planning, and quality control.
    2. Firmographic and Geographic Data

      • Access detailed business information, including company sizes, production capacities, and geographic footprints.
      • Pinpoint manufacturing hubs, regional facilities, and distribution centers to enhance supply chain strategies.
    3. Advanced Filters for Precision Targeting

      • Filter companies by industry segment (automotive, electronics, consumer goods), geographic location, company size, or revenue range.
      • Align campaigns with specific manufacturing needs, such as sustainability, cost optimization, or digital transformation.
    4. AI-Driven Enrichment

      • Profiles enriched with actionable data allow you to craft personalized messaging, highlight unique value propositions, and improve engagement with manufacturing stakeholders.

    Strategic Use Cases:

    1. Sales and Lead Generation

      • Present products, services, or technologies designed to enhance manufacturing efficiency, cost savings, or production quality.
      • Build relationships with procurement managers, operations directors, and plant supervisors.
    2. Market Research and Competitive Analysis

      • Analyze trends in North American manufacturing to identify growth opportunities, emerging markets, and industry challenges.
      • Benchmark against competitors to refine product offerings, pricing models, and go-to-market strategies.
    3. Supply Chain and Vendor Development

      • Engage with manufacturers seeking reliable suppliers, logistics partners, or raw material sources.
      • Position your company as a strategic partner for supply chain optimization and sustainability initiatives.
    4. Technology Integration and Innovation

      • Target R&D professionals and operations leaders exploring robotics, AI, IoT, or automation tools.
      • Offer solutions to support digital transformation, improve production scalability, or enhance workforce productivity.

    Why Choose Success.ai?

    1. Best Price Guarantee
      • Access premium-quality manufacturing data at competitive prices, ensuring maximum ROI for your outreach and strategic initiatives....
  14. JOSSE: A Software Development Effort Dataset Annotated with Expert Estimates...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Aug 29, 2022
    Cite
    Mohammed Alhamed; Tim Storer; Mohammed Alhamed; Tim Storer (2022). JOSSE: A Software Development Effort Dataset Annotated with Expert Estimates [Dataset]. http://doi.org/10.5281/zenodo.7022735
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 29, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Mohammed Alhamed; Tim Storer; Mohammed Alhamed; Tim Storer
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The JIRA Open-Source Software Effort (JOSSE) dataset consists of software development and maintenance tasks collected from the JIRA issue tracking system for the Apache, JBoss, and Spring open-source projects. All issues are annotated with actual effort, and 19% of them are annotated with expert estimates. JOSSE is a task-based dataset with a textual attribute, the task description, for each data point. The accompanying paper explains how the data were collected and details six data-quality refinement procedures applied to the data points.

  15. Final Continuous Quality Improvement of Current Efficiency Research Data

    • narcis.nl
    • data.mendeley.com
    Updated Feb 4, 2020
    Cite
    Moongo, T (via Mendeley Data) (2020). Final Continuous Quality Improvement of Current Efficiency Research Data [Dataset]. http://doi.org/10.17632/r3hf2n9tf9.1
    Explore at:
    Dataset updated
    Feb 4, 2020
    Dataset provided by
    Data Archiving and Networked Services (DANS)
    Authors
    Moongo, T (via Mendeley Data)
    Description

    An Excel spreadsheet containing the raw data used for a Master of Industrial Engineering thesis titled "Designing a Continuous Quality Improvement Framework for Improving Electrowinning Current Efficiency", written by Thomas Moongo in 2020.

  16. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts to filter those projects and curate ML projects of high quality. The limited availability of such high-quality datasets poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidence of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide the "NICHE.csv" file, which contains the list of project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices followed in high-quality ML projects, and it can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE
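A minimal sketch of using the labels as a benchmark filter with only the standard library. The column names and row contents below are hypothetical; the summary above only guarantees that project names, labels, and basic statistics are present in NICHE.csv:

```python
import csv
import io

# Hypothetical excerpt standing in for NICHE.csv; real column names
# and values may differ from this illustration.
sample = io.StringIO(
    "project,label,stars,commits\n"
    "owner/ml-project-a,engineered,1200,3400\n"
    "owner/ml-project-b,non-engineered,15,60\n"
)
rows = list(csv.DictReader(sample))

# Keep only projects labelled as engineered, e.g. to build a
# positive class for an engineered-project classifier.
engineered = [r["project"] for r in rows if r["label"] == "engineered"]
```

Replacing the `io.StringIO` stand-in with `open("NICHE.csv")` would apply the same filter to the real file, once the actual column names are confirmed.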

  17. AIRS/Aqua L1B Infrared (IR) quality assurance subset V005 (AIRIBQAP) at GES...

    • cmr.earthdata.nasa.gov
    • data.nasa.gov
    • +3more
    Updated Feb 20, 2020
    + more versions
    Cite
    (2020). AIRS/Aqua L1B Infrared (IR) quality assurance subset V005 (AIRIBQAP) at GES DISC [Dataset]. http://doi.org/10.5067/VV622VHH2LXV
    Explore at:
    Dataset updated
    Feb 20, 2020
    Time period covered
    Aug 30, 2002 - Present
    Area covered
    Earth
    Description

    The Atmospheric Infrared Sounder (AIRS) is a grating spectrometer (R = 1200) aboard the second Earth Observing System (EOS) polar-orbiting platform, EOS Aqua. In combination with the Advanced Microwave Sounding Unit (AMSU) and the Humidity Sounder for Brazil (HSB), AIRS constitutes an innovative atmospheric sounding group of visible, infrared, and microwave sensors. The AIRS IR Level 1B QA Subset contains Quality Assurance (QA) parameters that users can apply to filter AIRS IR Level 1B radiance data into a subset for analysis. The QA parameters indicate quality per granule and channel, per scan and channel, per field of view, and per channel, and should be consulted before any data analysis. The subset also contains "glintlat", "glintlon", and "sun_glint_distant", which users can use to check for possible solar glint contamination.
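A sketch of the recommended screening step, using invented numpy arrays in place of a real granule. Only the parameter name "sun_glint_distant" comes from the description above; the array shapes, the meaning of flag value 0, and the distance threshold are all assumptions for illustration:

```python
import numpy as np

# Hypothetical stand-ins for one granule: radiances indexed as
# (scan, footprint, channel) and a matching QA flag array where
# 0 is assumed to mean "good".
radiances = np.random.default_rng(1).normal(80.0, 5.0, (45, 30, 4))
qa_flags = np.zeros((45, 30, 4), dtype=int)
qa_flags[0, :, :] = 1                 # mark the first scan line as bad
sun_glint_distant = np.full((45, 30), 500.0)
sun_glint_distant[10, 5] = 20.0       # one footprint near the glint spot

# Screen before analysis, as recommended: keep only good-QA radiances
# that are far enough from the solar glint (threshold assumed).
good = (qa_flags == 0) & (sun_glint_distant[..., np.newaxis] > 100.0)
screened = np.where(good, radiances, np.nan)
```

Masking to NaN rather than dropping values keeps the (scan, footprint, channel) geometry intact for downstream gridding.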

  18. Data from: Hypermedia-based software architecture enables test-driven...

    • search.dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Nov 29, 2023
    Cite
    Andrew Post; Nancy Ho; Erik Rasmussen; Ivan Post; Aika Cho; John Hofer; Arthur Maness; Timothy Parnell; David Nix (2023). Hypermedia-based software architecture enables test-driven development [Dataset]. http://doi.org/10.5061/dryad.pvmcvdnrv
    Explore at:
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Dryad Digital Repository
    Authors
    Andrew Post; Nancy Ho; Erik Rasmussen; Ivan Post; Aika Cho; John Hofer; Arthur Maness; Timothy Parnell; David Nix
    Time period covered
    Jan 1, 2023
    Description

    Objectives: Using agile software development practices, develop and evaluate a software architecture and implementation for reliable management of bioinformatic data that is stored in the cloud. Materials and Methods: CORE (Comprehensive Oncology Research Environment) Browser is a new open-source web application for cancer researchers to manage sequencing data organized in a flexible format in Amazon Simple Storage Service (S3) buckets. It has a microservices- and hypermedia-based architecture, which we integrated with Test-Driven Development (TDD), the iterative writing of computable specifications for how software should work prior to development. Optimal testing completeness is a tradeoff between code coverage and software development costs. We hypothesized this architecture would permit developing tests that can be executed repeatedly for all microservices, maximizing code coverage while minimizing effort. Results: After one-and-a-half years of development, the CORE Browser backe...

  19. Data from: Machine Learning for Software Engineering: A Tertiary Study

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 16, 2022
    + more versions
    Cite
    Galanopoulou, Rafaila (2022). Machine Learning for Software Engineering: A Tertiary Study [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5715474
    Explore at:
    Dataset updated
    Sep 16, 2022
    Dataset provided by
    Galanopoulou, Rafaila
    Kotti, Zoe
    Spinellis, Diomidis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset of the research paper: Machine Learning for Software Engineering: A Tertiary Study

    Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities. We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009 and 2022, covering 6,117 primary studies. The SE areas most tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML. We propose a number of ML-for-SE research challenges and actions, including: conducting further empirical validation and industrial studies on ML; reconsidering deficient SE methods; documenting and automating data collection and pipeline processes; reexamining how industrial practitioners distribute their proprietary data; and implementing incremental ML approaches.

    The following data and source files are included.

    review-protocol.md: The protocol employed in this tertiary study

    data/

    dl-search/

    input/
    

    acm_comput_surveys_overviews.bib: Surveys of ACM Computing Surveys journal

    acm_comput_surveys_overviews_titles.txt: Titles of surveys

    acm_comput_ml_surveys.bib: Machine learning (ML)-related surveys of ACM Computing Surveys journal

    acm_comput_ml_surveys_titles.txt: Titles of ML-related surveys

    dl_search_queries.txt: Search queries applied to IEEE Xplore, ACM Digital Library, and Elsevier Scopus

    ml_keywords.txt: ML-related keywords extracted from ML-related survey titles and used in the search queries

    se_keywords.txt: Software Engineering (SE)-related keywords derived from the 15 SWEBOK Knowledge Areas (KAs—except for Computing Foundations, Mathematical Foundations, and Engineering Foundations) and used in the search queries

    secondary_studies_keywords.txt: Survey-related keywords composed of the 15 keywords introduced in the tertiary study on SLRs in SE by Kitchenham et al. (2010), and the survey titles, and used in the search queries

    output/
    

    acm/

    acm{1–9}.bib: Search results from ACM Digital Library

    ieee.csv: Search results from IEEE Xplore

    scopus_analyze_year.csv: Yearly distribution of ML and SE documents extracted from Scopus's Analyze search results page

    scopus.csv: Search results from Scopus

    study-selection/

    backward_snowballing.csv: Additional secondary studies found through the backward snowballing process

    backward_snowballing_references.csv: References of quality-accepted secondary studies

    cohen_kappa_agreement.csv: Inter-rater reliability of reviewers in study selection

    dl_search_results.csv: Aggregated search results of all three digital libraries

    forward_snowballing_reviewer_{1,2}.csv: Divided forward snowballing citations of quality-accepted studies assessed by reviewer 1 and 2, correspondingly, based on IC/EC

    study_selection_reviewer_{1,2}.csv: Divided search results assessed by reviewer 1 and 2, correspondingly, based on IC/EC

    quality-assessment/

    dare_assessment.csv: Quality assessment (QA) of selected secondary studies based on the Database of Abstracts of Reviews of Effects (DARE) criteria by York University, Centre for Reviews and Dissemination

    quality_accepted_studies.csv: Details of quality-accepted studies

    studies_for_review.bib: Bibliography details and QA scores of selected secondary studies
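A hedged sketch of how the per-criterion DARE answers in dare_assessment.csv might be aggregated into the QA scores stored in studies_for_review.bib, assuming the common yes/partly/no = 1/0.5/0 scoring; the actual criteria wording, scoring, and cut-off used in the study are those recorded in the CSV:

```python
# Assumed DARE-style scoring: yes = 1, partly = 0.5, no = 0 per criterion.
SCORE = {"yes": 1.0, "partly": 0.5, "no": 0.0}

def qa_score(answers):
    """Sum the per-criterion scores for one secondary study."""
    return sum(SCORE[a.lower()] for a in answers)

def quality_accepted(answers, threshold=2.0):
    # threshold is an illustrative cut-off, not the one used in the study
    return qa_score(answers) >= threshold

score = qa_score(["yes", "partly", "no", "yes"])
```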

    data-extraction/

    further_research.csv: Recommendations for further research extracted from the quality-accepted studies

    further_research_general.csv: The complete list of associated studies for each general recommendation

    knowledge_areas.csv: Classification of quality-accepted studies using the SWEBOK KAs and subareas

    ml_techniques.csv: Classification of the quality-accepted studies based on a four-axis ML classification scheme, along with extracted ML techniques employed in the studies

    primary_studies.csv: Details of the primary studies reviewed by the quality-accepted secondary studies

    research_methods.csv: Citations of the research methods employed by the quality-accepted studies

    research_types_methods.csv: Research types and methods employed by the quality-accepted studies

    src/

    data-analysis.ipynb: Analysis of data extraction results (data preprocessing, top authors and institutions, study types, yearly distribution of publishers, QA scores, and SWEBOK KAs) and creation of all figures included in the study

    scopus-year-analysis.ipynb: Yearly distribution of ML and SE publications retrieved from Elsevier Scopus

    study-selection-preprocessing.ipynb: Processing of digital library search results to conduct the inter-rater reliability estimation and study selection process
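An illustrative sketch of the kind of preprocessing the last notebook performs before study selection: merging the per-library exports into the aggregate dl_search_results.csv and dropping duplicates. The record layout and the DOI-then-title deduplication key are assumptions, not the notebook's actual logic:

```python
# Merge per-library search results, deduplicating by DOI where present
# and by normalised title otherwise (field names are assumed).

def merge_results(*result_lists):
    seen, merged = set(), []
    for results in result_lists:
        for record in results:
            key = record.get("doi") or record["title"].lower()
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged

# Illustrative records, not data from this package:
acm = [{"doi": "10.1145/1", "title": "A"}, {"doi": "10.1145/2", "title": "B"}]
ieee = [{"doi": "10.1145/2", "title": "B"}, {"doi": None, "title": "C"}]
merged = merge_results(acm, ieee)
```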

  20. Data from: Dataset for "Indoor environment quality and work performance in 'green' office buildings in the Middle East"

    • search.datacite.org
    • researchdata.bath.ac.uk
    Updated 2020
    Rana Elnaklah (2020). Dataset for "Indoor environment quality and work performance in ‘green’ office buildings in the Middle East" [Dataset]. http://doi.org/10.15125/bath-00863
    Explore at:
    Dataset updated
    2020
    Dataset provided by
    DataCite (https://www.datacite.org/)
    University of Bath
    Authors
    Rana Elnaklah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    Al-Ahliyya Amman University
    Engineering and Physical Sciences Research Council (EPSRC)
    Description

    This data was collected from 13 office buildings in Amman, Jordan: five green (LEED-certified) buildings and eight conventional buildings. The dataset includes: 1. Objective data on four thermal-condition indicators (air temperature, mean radiant temperature, relative humidity, and air speed) and one indoor air quality indicator (carbon dioxide concentration level). 2. Subjective data, including post-occupancy evaluation (POE) responses and absenteeism and presenteeism data. 3. Thermal comfort indicators, including thermal sensation votes, thermal preference votes, and the predicted mean vote (PMV).
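Thermal sensation votes like those in the subjective data are typically summarised as a mean vote per building or survey round. A small sketch, assuming the ASHRAE 7-point sensation scale (-3 cold to +3 hot); the actual column coding in the dataset may differ:

```python
# Assumed ASHRAE 7-point thermal sensation scale.
ASHRAE_SCALE = {
    "cold": -3, "cool": -2, "slightly cool": -1, "neutral": 0,
    "slightly warm": 1, "warm": 2, "hot": 3,
}

def mean_sensation_vote(votes):
    """Average thermal sensation vote across respondents."""
    values = [ASHRAE_SCALE[v] for v in votes]
    return sum(values) / len(values)

# Illustrative votes, not data from this dataset:
tsv = mean_sensation_vote(["neutral", "slightly warm", "warm", "neutral"])
```

Comparing such mean votes against the model-based PMV is a standard way to contrast occupants' actual comfort with the predicted value for each building type.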

