This data set contains several forecast image products from the 20-member Canadian ensemble forecast system over North America. The products are available from the 00 and 12 UTC runs every 12 hours out to 204 hours. The products include the 12-hour precipitation mean, 500 mb spaghetti plots of the 534 and 594 dam heights, 500 mb spaghetti plots of the 546 dam height, 500 mb spaghetti plots of the 558 dam height, and mean SLP and SLP centers. The imagery was developed by Environment Canada.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1 Code smell dataset

To create a high-quality code smell dataset, we merged five different datasets. These datasets are among the largest and most accurate used in our paper “Predicting Code Quality Attributes Based on Code Smells”. Various software projects were analyzed automatically and manually to collect these labels. Table 1 shows the dataset details.

Table 1. Merged datasets and their characteristics.

| Dataset | Samples | Projects | Code smells |
| Palomba (2018) [1] | 40888 | 395 versions of 30 open-source projects | Large class, complex class, class data should be private, inappropriate intimacy, lazy class, middle man, refused bequest, spaghetti code, speculative generality, comments, long method, long parameter list, feature envy, message chains |
| Madeyski [2] | 3291 | 523 open-source and industrial projects | Blob, data class |
| Khomh [3] | _ | 54 versions of 4 open-source projects | Anti-singleton, swiss army knife |
| Pecorelli [4] | 341 | 9 open-source projects | Blob |
| Palomba (2017) [5] | _ | 6 open-source projects | Dispersed coupling, shotgun surgery |

Code smell datasets have been prepared at two levels: class and method. The class level has 15 different smells as labels and 81 software metrics as features. The method level has five smells and 31 metrics. This dataset contains samples of Java classes and methods. A sample can be identified by its longname, which contains the project-name, package-name, JavaFile-name, class-name, and method-name. The quantity of each smell ranges from about 40 to about 11000. The total number of smelly samples is 37517, while the number of non-smells is nearly 3 million. As a result, our dataset is the largest in the study. You can see the details in Table 2.

Table 2. The number of smells and non-smells at class and method levels.

| Level | Metrics | Smell | Samples | Total |
| Class | 81 | Complex class | 1265 | 23438 |
| | | Class data should be private | 1839 | |
| | | Inappropriate intimacy | 780 | |
| | | Large class | 990 | |
| | | Lazy class | 774 | |
| | | Middle man | 193 | |
| | | Refused bequest | 1985 | |
| | | Spaghetti code | 3203 | |
| | | Speculative generality | 2723 | |
| | | Blob | 988 | |
| | | Data class | 938 | |
| | | Anti-singleton | 2993 | |
| | | Swiss army knife | 4601 | |
| | | Dispersed coupling | 41 | |
| | | Shotgun surgery | 125 | |
| | | Non-smell | 40506 [3] + 8334 [5] + 296854 [1] + 43862 [2] + 55214 [4] | 444770 |
| Method | 31 | Comments | 107 | 14079 |
| | | Feature envy | 525 | |
| | | Long method | 11366 | |
| | | Long parameter list | 1983 | |
| | | Message chains | 98 | |
| | | Non-smell | 2469176 | 2469176 |

2 Quality dataset

This dataset contains over 1000 Java project instances. For each instance, the relative frequency of 20 code smells has been extracted along with the value of eight software quality attributes. The code quality dataset contains 20 smells as features and 8 quality attributes as labels: coverageability, extendability, effectiveness, flexibility, functionality, reusability, testability, and understandability. The samples are Java projects identified by their name and version. Features are the ratio of smelly and non-smelly classes or methods in a software project. The quality attributes are normalized scores calculated by QMOOD metrics [6] and models extracted by [7], [8]. This dataset includes 1014 samples of small and large open-source and industrial projects.

The data samples are used to train machine learning models that predict software quality attributes based on code smells.

References

[1] F. Palomba, G. Bavota, M. Di Penta, F. Fasano, R. Oliveto, and A. De Lucia, “A large-scale empirical study on the lifecycle of code smell co-occurrences,” Inf Softw Technol, vol. 99, pp. 1–10, Jul. 2018, doi: 10.1016/J.INFSOF.2018.02.004.
[2] L. Madeyski and T. Lewowski, “MLCQ: Industry-Relevant Code Smell Data Set,” in ACM International Conference Proceeding Series, Association for Computing Machinery, Apr. 2020, pp. 342–347. doi: 10.1145/3383219.3383264.
[3] F. Khomh, M. Di Penta, Y. G. Guéhéneuc, and G. Antoniol, “An exploratory study of the impact of antipatterns on class change- and fault-proneness,” Empir Softw Eng, vol. 17, no. 3, pp. 243–275, Jun. 2012, doi: 10.1007/s10664-011-9171-y.
[4] F. Pecorelli, F. Palomba, F. Khomh, and A. De Lucia, “Developer-Driven Code Smell Prioritization,” Proceedings - 2020 IEEE/ACM 17th International Conference on Mining Software Repositories, MSR 2020, pp. 220–231, 2020, doi: 10.1145/3379597.3387457.
[5] F. Palomba, M. Zanoni, F. A. Fontana, A. De Lucia, and R. Oliveto, “Smells like teen spirit: Improving bug prediction performance using the intensity of code smells,” in Proceedings - 2016 IEEE International Conference on Software Maintenance and Evolution, ICSME 2016, Institute of Electrical and Electronics Engineers Inc., Jan. 2017, pp. 244–255. doi: 10.1109/ICSME.2016.27.
[6] J. Bansiya and C. G. Davis, “A hierarchical model for object-oriented design quality assessment,” IEEE Transactions on Software Engineering, vol. 28, no. 1, pp. 4–17, Jan. 2002, doi: 10.1109/32.979986.
[7] M. Zakeri-Nasrabadi and S. Parsa, “Learning to predict test effectiveness,” International Journal of Intelligent Systems, 2021, doi: 10.1002/INT.22722.
[8] M. Zakeri-Nasrabadi and S. Parsa, “Testability Prediction Dataset,” Mar. 2021, doi: 10.5281/ZENODO.4650228.
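
The quality dataset describes its features as the relative frequency of each code smell within a project. A minimal sketch of that computation follows, assuming the frequency is the number of smelly classes (or methods) divided by the total number of classes (or methods) in the project. The function name, input layout, and sample labels are hypothetical and are not taken from the dataset's own tooling.

```python
from collections import Counter


def smell_relative_frequencies(entity_smells, smell_names):
    """Return, for each smell, the fraction of entities that exhibit it.

    entity_smells: one collection of smell labels per class or method
                   (an empty collection means the entity is non-smelly).
    smell_names:   the smells to report, in feature order.
    Assumption: frequency = smelly entities / all entities in the project.
    """
    total = len(entity_smells)
    if total == 0:
        # A project with no entities has no smells to count.
        return {name: 0.0 for name in smell_names}
    # set() guards against a label being listed twice for the same entity.
    counts = Counter(smell for smells in entity_smells for smell in set(smells))
    return {name: counts[name] / total for name in smell_names}


# Hypothetical project with four classes, two of which are smelly.
classes = [{"blob"}, set(), {"blob", "lazy class"}, set()]
features = smell_relative_frequencies(classes, ["blob", "lazy class", "middle man"])
print(features)  # {'blob': 0.5, 'lazy class': 0.25, 'middle man': 0.0}
```

Each project would yield one such feature vector, which can then be paired with its quality attribute scores as labels.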