Being relatively new to the field, electromechanical actuators (EMAs) in aerospace applications lack the accumulated knowledge base of other actuator types, especially for fault detection and characterization. The lack of health-monitoring data from fielded systems and the prohibitive cost of real flight tests motivate building system models and designing affordable but realistic experimental setups. This paper presents our approach to building a comprehensive test environment equipped with fault-injection and data-collection capabilities. Efforts also include the development of multiple models of EMA operation, under both nominal and fault conditions, that can be used with measurement data to generate effective diagnostic and prognostic estimates. We describe in detail how various failure modes are injected in the test environment and how the corresponding data are collected to verify the physics-based models developed in parallel under those failure modes. A design-of-experiments study outlines the details of the experimental data collection. Furthermore, we present ideas on how experimental results can be extended to real flight environments through actual flight tests and the use of real flight data. Finally, we discuss the roadmap leading from this effort toward successful prognostic algorithms for electromechanical actuators.
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This dataset contains the digitized treatments in Plazi based on the original journal article Mondal, Sonia, Ganesh, S. R., Sethy, P. G. S., Raghunathan, C., Raha, Sujoy, Sarkar, Sagnik (2022): Redescriptions of the type specimens of synonymous nominal taxa of sea snakes (Serpentes: Elapidae: Hydrophis, Laticauda) at the Zoological Survey of India. Zootaxa 5169 (4): 301-321, DOI: 10.11646/zootaxa.5169.4.1
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An EDA comprises a set of statistical and data mining procedures to describe data. We ran an EDA to provide statistical facts and inform conclusions. The mined facts support arguments that inform the Systematic Literature Review of DL4SE.
The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.
Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD involves five stages:
Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organize the data into 35 features or attributes that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.
Preprocessing. Preprocessing consisted of casting the features to the correct (nominal) type, removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to recover information lost in the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”, where “Other Metrics” refers to unconventional metrics found during extraction. The same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of each paper by the data mining tasks or methods.
Transformation. In this stage, we applied no data transformation except for the clustering analysis, where we performed a Principal Component Analysis (PCA) to reduce the 35 features to 2 components for visualization purposes. PCA also allowed us to identify the number of clusters exhibiting the maximum reduction in variance; in other words, it helped us choose the number of clusters when tuning the explainable models.
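As a sketch of this transformation step, a 2-component PCA can be computed with a small SVD-based routine. The matrix below is a random stand-in for the one-hot-encoded paper features, not the real SLR data:

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)              # center each feature
    # SVD of the centered data: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:2]                  # top-2 principal axes
    explained = (S ** 2) / (S ** 2).sum()  # variance ratio per component
    return Xc @ components.T, explained[:2]

# Toy stand-in for a one-hot-encoded 35-feature paper matrix (20 papers)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(20, 35)).astype(float)
coords, ratio = pca_2d(X)
print(coords.shape)  # (20, 2)
```

The two columns of `coords` are the coordinates plotted in the cluster visualization; `ratio` indicates how much variance the 2-D view preserves.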
Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be to uncover hidden relationships among the extracted features (correlations and association rules) and to categorize the DL4SE papers for a better segmentation of the state-of-the-art (clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.
Interpretation/Evaluation. We used the knowledge discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by a reasoning process over the data mining outcomes, which produces an argument support analysis (see this link).
We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.
Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.
Support = the number of occurrences in which the statement is true, divided by the total number of statements.
Confidence = the support of the statement divided by the number of occurrences of the premise.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Korea Retail Sales Index: Nominal: Department Store data was reported at 140.300 2010=100 in Dec 2017. This records an increase from the previous number of 140.000 2010=100 for Nov 2017. Korea Retail Sales Index: Nominal: Department Store data is updated monthly, averaging 78.950 2010=100 from Jan 1995 (Median) to Dec 2017, with 276 observations. The data reached an all-time high of 147.400 2010=100 in Nov 2013 and a record low of 28.800 2010=100 in Feb 1995. Korea Retail Sales Index: Nominal: Department Store data remains active status in CEIC and is reported by Statistics Korea. The data is categorized under Global Database’s South Korea – Table KR.H014: Retail Sales Index: 2010=100: By Business Type.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The survey dataset for identifying the Shiraz old silo's new use includes four components:
1. The survey instrument used to collect the data, "SurveyInstrument_table.pdf". The survey instrument contains 18 main closed-ended questions in a table format. Two of these concern information on the Silo's decision-makers and the proposed new use, following a short introduction to the questionnaire; the other 16 (each identifying 3 variables) concern the level of appropriate opinions for ideal intervention in the façade, openings, materials and floor heights of the building across four values: feasibility, reversibility, compatibility and social benefits.
2. The raw survey data, "SurveyData.rar". This file contains an Excel .xlsx and an SPSS .sav file. The survey data file contains 50 variables (12 for each of the four values, separated by colour) and data from each of the 632 respondents. Answering each question in the survey was mandatory, therefore there are no blanks or non-responses in the dataset. In the .sav file, all variables were assigned a numeric type and nominal measurement level. More details about each variable can be found in the Variable View tab of this file. Additional variables were created by grouping or consolidating categories within each survey question for simpler analysis; these are listed in the last columns of the .xlsx file.
3. The analysed survey data, "AnalysedData.rar". This file contains 6 SPSS Statistics Output Documents demonstrating statistical tests and analyses such as mean, correlation, automatic linear regression, reliability, frequencies, and descriptives.
4. The codebook, "Codebook.rar". The detailed SPSS "Codebook.pdf", alongside the simplified codebook "VariableInformation_table.pdf", provides a comprehensive guide to all 50 variables in the survey data, including numerical codes for survey questions and response options.
They serve as valuable resources for understanding the dataset, presenting dictionary information, and providing descriptive statistics, such as counts and percentages for categorical variables.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
As the field of human-computer interaction continues to evolve, there is a growing need for robust datasets that can enable the development of gesture recognition systems that operate reliably in diverse real-world scenarios. We present a radar-based gesture dataset, recorded using the BGT60TR13C XENSIV™ 60GHz Frequency Modulated Continuous Radar sensor to address this need. This dataset includes both nominal gestures and anomalous gestures, providing a diverse and challenging benchmark for understanding and improving gesture recognition systems.
The dataset contains a total of 49,000 gesture recordings, with 25,000 nominal gestures and 24,000 anomalous gestures. Each recording consists of 100 frames of raw radar data, accompanied by a label file that provides annotations for every individual frame in each gesture sequence. This frame-based annotation allows for high-resolution temporal analysis and evaluation.
The nominal gestures represent standard, correctly performed gestures. These gestures were collected to serve as the baseline for gesture recognition tasks. The details of the nominal data are as follows:
Gesture Types: The dataset includes five nominal gesture types:
Total Samples: 25,000 nominal gestures.
Participants: The nominal gestures were performed by 12 participants (p1 through p12).
Each nominal gesture has a corresponding label file that annotates every frame with the nominal gesture type, providing a detailed temporal profile for training and evaluation purposes.
The anomalous gestures represent deviations from the nominal gestures. These anomalies were designed to simulate real-world conditions in which gestures might be performed incorrectly, under varying speeds, or with modified execution patterns. The anomalous data introduces additional challenges for gesture recognition models, testing their ability to generalize and handle edge cases effectively.
Total Samples: 24,000 anomalous gestures.
Anomaly Types: The anomalous gestures include three distinct types of anomalies: fast execution, slow execution, and wrist-only execution (defined under the anomaly labels below).
Participants: The anomalous gestures involved contributions from eight participants, including p1, p2, p6, p7, p9, p10, p11, and p12.
Locations: All anomalous gestures were collected in location e1 (a closed-space meeting room).
The radar system was configured with an operational frequency range spanning from 58.5 GHz to 62.5 GHz. This configuration provides a range resolution of 37.5 mm and the ability to resolve targets at a maximum range of 1.2 meters. For signal transmission, the radar employed a burst configuration comprising 32 chirps per burst with a frame rate of 33 Hz and a pulse repetition time of 300 µs.
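The stated figures are consistent with the standard FMCW relations; the quick check below derives them from the numbers above (textbook formulas, not taken from the sensor datasheet):

```python
# Sanity-check of the stated radar parameters using standard FMCW relations.
C = 299_792_458.0                    # speed of light, m/s
bandwidth = 62.5e9 - 58.5e9          # sweep bandwidth B = 4 GHz
n_samples = 64                       # ADC samples per chirp

range_resolution = C / (2 * bandwidth)          # delta_R = c / (2B)
# In standard FMCW processing, half of the ADC samples map to usable range bins
max_range = (n_samples / 2) * range_resolution

print(f"{range_resolution * 1e3:.1f} mm")  # 37.5 mm
print(f"{max_range:.2f} m")                # 1.20 m
```

Both values match the 37.5 mm resolution and 1.2 m maximum range quoted above.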
The data for each user, categorized by location and anomaly type, is saved in compressed .npz files. Each .npz file contains key-value pairs for the data and its corresponding labels. The file naming convention is: UserLabel_EnvironmentLabel(_AnomalyLabel).npz. For nominal gestures, the anomaly label is omitted.
The .npz file contains two primary keys:
inputs: the raw radar data.
targets: the corresponding label vector for the raw data.
The raw radar data (inputs) is stored as a NumPy array with 5 dimensions, structured as n_recordings x n_frames x n_antennas x n_chirps x n_samples, where:
n_recordings: the number of gesture sequence instances (i.e., recordings).
n_frames: the frame length of each gesture (100 frames per gesture).
n_antennas: the number of virtual antennas (3 antennas).
n_chirps: the number of chirps per frame (32 chirps).
n_samples: the number of samples per chirp (64 samples).
The labels (targets) are stored as a NumPy array with 2 dimensions, structured as n_recordings x n_frames, where:
n_recordings: the number of gesture sequence instances (i.e., recordings).
n_frames: the frame length of each gesture (100 frames per gesture).
Each entry in the targets matrix corresponds to the frame-level label for the associated raw radar data in inputs.
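A minimal loading sketch with NumPy, assuming the layout described above. The file written here is a tiny synthetic stand-in (zeros, two recordings) so the snippet runs without the real dataset:

```python
import numpy as np

# Synthetic archive following the documented layout and naming convention
# (UserLabel_EnvironmentLabel_AnomalyLabel); NOT part of the real dataset.
demo = "p1_e1_fast.npz"
np.savez_compressed(
    demo,
    inputs=np.zeros((2, 100, 3, 32, 64), dtype=np.float32),   # radar cube
    targets=np.zeros((2, 100), dtype=np.int64),               # frame labels
)

with np.load(demo) as archive:
    inputs = archive["inputs"]    # n_recordings x n_frames x n_antennas x n_chirps x n_samples
    targets = archive["targets"]  # n_recordings x n_frames

print(inputs.shape)   # (2, 100, 3, 32, 64)
print(targets.shape)  # (2, 100)
```

Each row of `targets` gives the per-frame label sequence for the matching recording in `inputs`.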
The total size of the dataset is approximately 48.1 GB, provided as a compressed file named radar_dataset.zip.
The user labels are defined as follows:
p1: Male
p2: Female
p3: Female
p4: Male
p5: Male
p6: Male
p7: Male
p8: Male
p9: Male
p10: Female
p11: Male
p12: Male
The environmental labels included in the dataset are defined as follows:
e1: Closed-space meeting room
e2: Open-space office room
e3: Library
e4: Kitchen
e5: Exercise room
e6: Bedroom
The anomaly labels included in the dataset are defined as follows:
fast: Fast gesture execution
slow: Slow gesture execution
wrist: Wrist gesture execution
This dataset represents a robust and diverse set of radar-based gesture data, enabling researchers and developers to explore novel models and evaluate their robustness in a variety of scenarios. The inclusion of frame-based labeling provides an additional level of detail to facilitate the design of advanced gesture recognition systems that can operate with high temporal resolution.
This dataset builds upon the version previously published on IEEE DataExplorer (https://ieee-dataport.org/documents/60-ghz-fmcw-radar-gesture-dataset), which included only one label per recording. In contrast, this version includes frame-based labels, providing individual annotations for each frame of the recorded gestures. By offering more granular labeling, this dataset further supports the development and evaluation of gesture recognition models with enhanced temporal precision. However, the raw radar data remains unchanged compared to the dataset available on IEEE DataExplorer.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Objective: To provide practical guidance for the analysis of N-of-1 trials by comparing four commonly used models.
Methods: The four models (paired t-test, mixed effects model of difference, mixed effects model, and meta-analysis of summary data) were compared using a simulation study. The assumed 3-cycle and 4-cycle N-of-1 trials were set with sample sizes of 1, 3, 5, 10, 20 and 30 under a normality assumption. The data were generated from a variance-covariance matrix assuming (i) a compound symmetry or first-order autoregressive structure, and (ii) no carryover effect or a 20% carryover effect. Type I error, power, bias (mean error), and mean square error (MSE) of the effect difference between two groups were used to evaluate the performance of the four models.
Results: The 3-cycle and 4-cycle N-of-1 trials were comparable with respect to type I error, power, bias and MSE. The paired t-test yielded a type I error near the nominal level, higher power, comparable bias and small MSE, with or without a carryover effect. Compared with the paired t-test, the mixed effects model produced a similar type I error and smaller bias, but lower power and larger MSE. The mixed effects model of difference and meta-analysis of summary data yielded type I errors far from the nominal level, low power, and large bias and MSE, irrespective of the presence or absence of a carryover effect.
Conclusion: We recommend the paired t-test for normally distributed data from N-of-1 trials because of its optimal statistical performance. In the presence of carryover effects, the mixed effects model could be used as an alternative.
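As an illustration of the recommended analysis, the paired t statistic can be computed directly on matched cycle-wise outcomes. The measurements below are invented for the example, not taken from the simulation study:

```python
import math

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched samples."""
    d = [a - b for a, b in zip(x, y)]          # within-cycle differences
    n = len(d)
    mean = sum(d) / n
    var = sum((di - mean) ** 2 for di in d) / (n - 1)  # sample variance of d
    t = mean / math.sqrt(var / n)
    return t, n - 1

# Hypothetical outcomes from one 4-cycle N-of-1 trial (treatment vs. control)
treatment = [7.1, 6.8, 7.4, 6.9]
control   = [5.9, 6.1, 6.3, 5.8]
t, df = paired_t(treatment, control)
print(round(t, 2), df)
```

For these made-up numbers the statistic is about 9.25 on 3 degrees of freedom; the p-value would then come from the t distribution with df degrees of freedom.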
MIT License (https://opensource.org/licenses/MIT)
This dataset is essentially the metadata of 164 datasets. Each of its rows concerns one dataset, from which 22 features have been extracted; these are used to classify each dataset into one of the categories 0-Unmanaged, 2-INV, 3-SI, or 4-NOA (DatasetType).
The dataset consists of 164 rows, each being the metadata of another dataset. The target column, datasetType, has four values indicating the dataset type. These are:
2 - Invoice detail (INV): This dataset type is a special report (usually called a Detailed Sales Statement) produced by company accounting or Enterprise Resource Planning (ERP) software. Using an INV-type dataset directly for ARM is extremely convenient for users, as it relieves them of the tedious work of transforming data into another, more suitable form. INV-type data input typically includes a header, but only two of its attributes are essential for data mining. The first attribute serves as the grouping identifier creating a unique transaction (e.g., Invoice ID, Order Number), while the second contains the items used for data mining (e.g., Product Code, Product Name, Product ID).
3 - Sparse Item (SI): This type is widespread in Association Rules Mining (ARM). It involves a header and a fixed number of columns. Each item corresponds to a column. Each row represents a transaction. The typical cell stores a value, usually one character in length, that depicts the presence or absence of the item in the corresponding transaction. The absence character must be identified or declared before the Association Rules Mining process takes place.
4 - Nominal Attributes (NOA): This type is commonly used in Machine Learning and Data Mining tasks. It involves a fixed number of columns. Each column registers nominal/categorical values. The presence of a header row is optional. However, in cases where no header is provided, there is a risk of extracting incorrect rules if similar values exist in different attributes of the dataset. The potential values for each attribute can vary.
0 - Unmanaged for ARM: On the other hand, not all datasets are suitable for extracting useful association rules or frequent item sets. For instance, datasets characterized predominantly by numerical features with arbitrary values, or datasets that involve fragmented or mixed types of data types. For such types of datasets, ARM processing becomes possible only by introducing a data discretization stage which in turn introduces information loss. Such types of datasets are not considered in the present treatise and they are termed (0) Unmanaged in the sequel.
The dataset type is crucial to determine for ARM, and the current dataset is used to classify the dataset's type using a Supervised Machine Learning Model.
There is also another dataset type, 1 - Market Basket List (MBL), where each dataset row is a transaction involving a variable number of items. Because of this characteristic, such datasets can be easily categorized using procedural programming, and DoD does not include instances of them. For more details about dataset types, please refer to the article "WebApriori: a web application for association rules mining": https://link.springer.com/chapter/10.1007/978-3-030-49663-0_44
DIC Data (0% Nominal Waviness): DIC data for the 0% nominal waviness specimens, exported from Istra-4D in a standardized file type. DIC_Data_0_nw.tar
DIC Data (10% Nominal Waviness): DIC data for the 10% nominal waviness specimens, exported from Istra-4D in a standardized file type. DIC_Data_10_nw.tar
DIC Data (15% and 17.5% Nominal Waviness): DIC data for the 15% and 17.5% nominal waviness specimens, exported from Istra-4D in a standardized file type. DIC_Data_15_and_17.5_nw.tar
DIC Data (20% and 25% Nominal Waviness): DIC data for the 20% and 25% nominal waviness specimens, exported from Istra-4D in a standardized file type. DIC_Data_20_and_25_nw.tar
Load Data, Microscopy Images, and Ultrasound Images: Archive containing the load-displacement data for all specimens (recorded during load to failure), the raw microscopy images captured for sectioned specimens, and the ultrasound images for all specimens. All file types are standardized. Load_Microscopy_Ultrasound.tar
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This study examined the effect of model size on the chi-square test statistics obtained from ordinal factor analysis models. The performance of six robust chi-square test statistics was compared across various conditions, including number of observed variables (p), number of factors, sample size, model (mis)specification, number of categories, and threshold distribution. Results showed that the unweighted least squares (ULS) robust chi-square statistics generally outperform the diagonally weighted least squares (DWLS) robust chi-square statistics. The ULSM estimator performed the best overall. However, when fitting ordinal factor analysis models with a large number of observed variables and small sample size, the ULSM-based chi-square tests may yield empirical variances that are noticeably larger than the theoretical values and inflated Type I error rates. On the other hand, when the number of observed variables is very large, the mean- and variance-corrected chi-square test statistics (e.g., based on ULSMV and WLSMV) could produce empirical variances conspicuously smaller than the theoretical values and Type I error rates lower than the nominal level, and demonstrate lower power to reject misspecified models. Recommendations for applied researchers and future empirical studies involving large models are provided.
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
Overview
This dataset provides ready-to-use door-to-door public transport travel time estimates between each of the 2011 Census lower super output area (LSOA) and data zone (DZ) units in Great Britain (42,000 LSOA/DZ units in total) and every other unit reachable within 150 minutes, estimated for the year 2023. This information comprises an all-to-all travel time matrix (TTM) at the national level. The TTMs are estimated for public transport, bicycle, and walking. Public transport estimates are provided for two departure times, during the morning peak and at night. Altogether, these TTMs present a range of opportunities for researchers and practitioners, such as developing accessibility measures, studying spatial connectivity, and evaluating public transport service changes throughout the day.
A full data descriptor is available in 'technical_note.html' file as part of the records of this repository.
Data records
The TTMs follow a long format, where each row represents a unique origin-destination pair. They are offered as a set of sequentially named .parquet files (more information about the Parquet format at: https://parquet.apache.org/). The structure contains one directory per mode: ‘bike’, ‘pt’, and ‘walk’ correspond to bicycle, public transport, and walking, respectively.
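A typical query against one of these matrices filters an origin's reachable destinations. The sketch below builds a toy pandas frame shaped like the public transport TTM; with the real files one would instead call pd.read_parquet on the 'pt' directory (which requires pyarrow), and the geo-codes shown are illustrative:

```python
import pandas as pd

# Toy rows shaped like the public transport TTM; real data would come from
# pd.read_parquet("pt/...") . The LSOA geo-codes here are placeholders.
ttm = pd.DataFrame({
    "from_id": ["E01000001"] * 3,
    "to_id": ["E01000002", "E01000003", "E01000004"],
    "travel_time_p050": [25, 95, 160],
    "time_of_day": ["am", "am", "am"],
})

# Destinations reachable from one origin within 150 minutes in the AM peak
reachable = ttm[(ttm["from_id"] == "E01000001")
                & (ttm["time_of_day"] == "am")
                & (ttm["travel_time_p050"] <= 150)]
print(len(reachable))  # 2
```

Counting such rows per origin is the basis of the cumulative-opportunity accessibility measures mentioned above.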
Walking
The walking TTM contains 13.3 million rows and three columns. The table below offers a description of the columns.
Table 1: Walking travel time matrix codebook.
| Variable | Type | Description |
|---|---|---|
| from_id | nominal | 2011 LSOA/DZ geo-code of origin |
| to_id | nominal | 2011 LSOA/DZ geo-code of destination |
| travel_time_p050 | numeric | Travel time walking in minutes |
Bicycle
The bicycle TTM includes 40 million rows and four columns which are described in the table below.
Table 2: Bicycle travel time matrix codebook.
| Variable | Type | Description |
|---|---|---|
| from_id | nominal | 2011 LSOA/DZ geo-code of origin |
| to_id | nominal | 2011 LSOA/DZ geo-code of destination |
| travel_time_p050 | numeric | Travel time by bicycle in minutes |
| travel_time_adj | numeric | Adjusted travel time by bicycle in minutes. This adds 5 minutes for locking to the unadjusted estimate. |
Public transport
The LSOA/DZ TTM consists of six columns and 265 million rows. The internal structure of the records is displayed in the table below:
Table 3: Public transport LSOA/DZ travel time matrix codebook.
| Variable | Type | Description |
|---|---|---|
| from_id | nominal | 2011 LSOA/DZ geo-code of origin |
| to_id | nominal | 2011 LSOA/DZ geo-code of destination |
| travel_time_p025 | numeric | 25th percentile travel time by public transport in minutes |
| travel_time_p050 | numeric | 50th percentile travel time by public transport in minutes |
| travel_time_p075 | numeric | 75th percentile travel time by public transport in minutes |
| time_of_day | nominal | A discrete value indicating the time of departure used. Levels: ‘am’ = 7 a.m.; ‘pm’ = 9 p.m. |
This data contains raw information about the use of the "Marvel Cinematic Universe" keyword in the literature from the main publishers (IEEE, ACM, MDPI, Springer, Science Direct, PubMed, Taylor & Francis) up to 9/20/2023. The data has the following six attributes with their data types:
Paper Title: qualitative
Publisher: nominal
Domain: nominal
Application Area: nominal
Purpose: qualitative
Outcome: qualitative
Market basket analysis with the Apriori algorithm
The retailer wants to target customers with suggestions for itemsets they are most likely to purchase. The given dataset contains a retailer's transaction data recorded over a period of time. The retailer will use the results to grow the business: suggesting itemsets to customers can increase engagement, improve the customer experience, and reveal customer behavior. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association Rules are most useful when you plan to build associations among different objects in a set, i.e., to find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought computer mouse => bought mouse mat": - support = P(mouse & mat) = 8/100 = 0.08 - confidence = support / P(computer mouse) = 0.08/0.10 = 0.80 - lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9 This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
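Recomputing the example's quantities from the standard definitions (confidence = support of the rule divided by support of the premise; lift = confidence divided by support of the conclusion):

```python
# 100 customers: 10 bought a computer mouse, 9 a mouse mat, 8 bought both.
n = 100
mouse, mat, both = 10, 9, 8

support = both / n                  # P(mouse & mat)
confidence = support / (mouse / n)  # P(mat | mouse)
lift = confidence / (mat / n)       # confidence vs. baseline rate of mats

print(support, round(confidence, 2), round(lift, 2))  # 0.08 0.8 8.89
```

A lift well above 1 indicates that buying a mouse makes buying a mat far more likely than its baseline rate.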
Number of Attributes: 7
First, we load the required libraries; each is described briefly below.
Next, we load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Then we clean the data frame by removing missing values.
To apply Association Rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice will be in ...
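Whatever the language (the tutorial itself works in R), this reshaping step is just a group-by of items per invoice. A sketch in Python with toy invoice lines (the invoice numbers and item names are illustrative):

```python
from collections import defaultdict

# Toy (InvoiceNo, Item) lines; the tutorial's real data comes from
# Assignment-1_Data.xlsx. Grouping by invoice yields one basket per invoice.
rows = [
    ("536365", "WHITE HANGING HEART T-LIGHT HOLDER"),
    ("536365", "WHITE METAL LANTERN"),
    ("536366", "HAND WARMER UNION JACK"),
]

transactions = defaultdict(list)
for invoice, item in rows:
    transactions[invoice].append(item)

print(len(transactions))            # 2 baskets
print(len(transactions["536365"]))  # 2 items in the first basket
```

Each resulting basket (list of items per invoice) is one transaction for the Apriori algorithm.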
New York Energy Prices presents retail energy price data. Energy prices are provided by fuel type in nominal dollars per specified physical unit for the residential, commercial, industrial, and transportation sectors. This section includes a column in the price table displaying gross domestic product (GDP) price deflators for converting nominal (current year) dollars to constant (real) dollars. To convert nominal to constant dollars, divide the nominal energy price by the GDP price deflator for that particular year. Historical petroleum, electricity, coal, and natural gas prices were compiled primarily from the Energy Information Administration.
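The conversion described above is a single division; a minimal sketch with illustrative values (not figures from the dataset):

```python
# Converting a nominal price to constant (real) dollars with a GDP deflator:
# real = nominal / deflator. The numbers below are made up for illustration.
nominal_price = 3.50   # dollars per physical unit in the data year
gdp_deflator = 1.25    # deflator for that year (base year = 1.0)

real_price = nominal_price / gdp_deflator
print(round(real_price, 2))  # 2.8
```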
How does your organization use this dataset? What other NYSERDA or energy-related datasets would you like to see on Open NY? Let us know by emailing OpenNY@nyserda.ny.gov.
Overview
This public data was provided by Divvy, a Chicago-based bike share company, through the Google Data Analytics Professional Certificate at Coursera. It includes data between August 2022 and July 2023. The Divvy Data License Agreement can be found here. The following CSV files are found in this dataset:
Data Dictionary
| Variable | Type | Definition |
|---|---|---|
| ride_id | text | unique trip identifier, not the customer identifier |
| rideable_type | text | type of equipment used |
| started_at | timestamp without timezone | date and time when service began |
| ended_at | timestamp without timezone | date and time when service terminated |
| start_station_name | text | nominal start location |
| start_station_id | text | start location identifier |
| end_station_name | text | nominal end location |
| end_station_id | text | end location identifier |
| start_lat | numeric | start location latitude |
| start_lng | numeric | start location longitude |
| end_lat | numeric | end location latitude |
| end_lng | numeric | end location longitude |
| member_casual | text | annual member or casual rider |
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Korea RSI: Nominal: Specialized Stores data was reported at 104.400 2015=100 in May 2018. This records an increase from the previous number of 101.600 2015=100 for Apr 2018. Korea RSI: Nominal: Specialized Stores data is updated monthly, averaging 96.600 2015=100 from Jan 2005 (Median) to May 2018, with 161 observations. The data reached an all-time high of 110.900 2015=100 in Nov 2011 and a record low of 71.600 2015=100 in Aug 2005. Korea RSI: Nominal: Specialized Stores data remains active status in CEIC and is reported by Statistics Korea. The data is categorized under Global Database’s Korea – Table KR.H013: Retail Sales Index: 2015=100: By Business Type.
By UCI [source]
Welcome to the Restaurant and Consumer Data for Context-Aware dataset! This dataset was obtained from a recommender system prototype, where the task was to generate a top-n list of restaurants based on user preferences. The data represented here consists of two different approaches used: one through collaborative filtering techniques and another with a contextual approach.
In total, this dataset contains nine files instances, attributes, and associated information. The restaurants attributes comprise of chefmozaccepts.csv, chefmozcuisine.csv, chefmozhours4.csv and chefmozparking csv only while geoplaces2 csv has more additional attribute like alcohol served etc., Consumers attributes are usercuisine csv, userpayment csv ,userprofile ,and the final is rating_final which contains userID ,placeID ,rating ,food_rating & service_rating. It also details attribute information such as place ID (nominal), payment methods (nominal - cash/visa/mastercard/etc.), cuisine types (Nominal - Afghan/African/American etc.), hours of operation(nominal range :00:00-23:30) & days it operates (mon - sun). Further features include latitude n longitude for geospatial data representation; whether alcohol served or not; smoking area permission details /dress code stipulations; accessibility issues for disabled personel & website URLs for each restaurant along with their respective ratings . Consumer information includes smoker status (True/False ), dress preference(informal); marital status ; temperature recognition ; birth year ; interests foci plus budgeting expenditure restraints he must be aware off while selecting between various restaurant options available as per his individual desires .It is easy to discern interesting trends within this dataset that can provide useful insights in order to develop a contextaware recommender system accurately predicting customer choices efficiently unearthing optimal menu solutions tailored exclusively towards prospective diners & devouring aficinados globally!
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
The dataset consists of 9 different files with various attributes related to each restaurant, consumer, and user/item/rating information. Each file is described in detail below:
- chefmozaccepts.csv: This file contains placeID and payment information for each restaurant in the dataset.
- chefmozcuisine.csv: This file contains place ID and cuisine type for each restaurant in the dataset.
- chefmozhours4.csv:This file contains the hours and days of operation for each restaurant in the dataset
- chefmozparking.csv: This file contains placeID along with parking lot information for each restaurant in the dataset 5) geoplaces2.csv: This file provides location data (latitude, longitude, etc.) for all restaurants in the dataset 6) rating_final.csvThis files contains userID along with ratings given by users on food quality, service quality, etc., at various places 7) usercuisinexsvThis files consists of user ID along with cuisine type preferences 8 )userpaymentxsvThis files provides payment method accepted by users at different places 9 )userprofilexsvThis files consistss of profile data like smoker status, activity level , budget ranges etc., coupled with users's IDs
Now that you have a better understanding of this our data set let’s go over some simple steps you can take to use it effectively :
Clean & Organize Your Data : Before using this data make sure you first clean it up by removing any duplicates or inconsistencies .You'll also want to parse it into a format that makes sense like creating a CSV /Json document or table so that when you run your algorithms they are easy to understand .
Analyze Your Data : Once you have had a chance to organize your data its time too analyze what insights we may be able find from our set . Use methods like clustering , statistical tests as well as machine learning techniques such as linear regression & random forest models too understand your datasets better . Step 3 - Generate Recommendations : With all these techniques now mastered its time too generate recommendationa system over our datasets which is exactly what this tutorial was created too do ! Utilize Collaborative Filtering & Content
- Analyzing restaurant customer preferences by creating a model that predicts the type of cuisine a customer is likely to order.
- Developing an algorithm to ide...
Facebook
TwitterOpen Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Median nominal and real salaries by different demographics time series 2007 - 2022(By gender, age group, and graduate type)
Facebook
Twitterhttps://data.gov.sg/open-data-licencehttps://data.gov.sg/open-data-licence
Dataset from Singapore Department of Statistics. For more information, visit https://data.gov.sg/datasets/d_7875d532ad3307840a47163da0d7233b/view
Facebook
TwitterAbalone is common name for any group of small to very large sea snails, commonly found along the coasts across the world, and used as delicacy in cusinies and it's leftover shell is fashioned into jewelery due to it's iridescent luster. Due to it's demand and economic value it's often harvested in farms, and as such the need to predict the age of abalone from physical measurements. Traditional approach to determine it's age is by cutting the shell through the cone, staining it, and counting the number of rings through a microscope -- a boring and time-consuming task.
From the original data examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values have been scaled for use with an ANN (by dividing by 200).
Number of instances: 4177
Number of attributes: 8
Features: Sex, Length, Diameter, Height, Whole weight, Shucked weight, Viscera weight, and Shell weight
Target: Rings
Note: Number of rings is the value to predict: either as a continuous value or it can be converted to classification problem.
Given below is attribute name, type, measurement, and brief description.
Name Data Type Meas. Description
----- --------- ----- -----------
Sex nominal M, F, and I (infant)
Length continuous mm Longest shell measurement
Diameter continuous mm perpendicular to length
Height continuous mm with meat in shell
Whole weight continuous grams whole abalone
Shucked weight continuous grams weight of meat
Viscera weight continuous grams gut weight (after bleeding)
Shell weight continuous grams after being dried
Rings integer +1.5 gives the age in years
NoneDataset comes from UCI Machine Learning repository: https://archive.ics.uci.edu/ml/datasets/Abalone
Facebook
TwitterBeing relatively new to the field, electromechanical actuators in aerospace applications lack the knowledge base compared to ones accumulated for the other actuator types, especially when it comes to fault detection and characterization. Lack of health monitoring data from fielded systems and prohibitive costs of carrying out real flight tests push for the need of building system models and designing affordable but realistic experimental setups. This paper presents our approach to accomplish a comprehensive test environment equipped with fault injection and data collection capabilities. Efforts also include development of multiple models for EMA operations, both in nominal and fault conditions that can be used along with measurement data to generate effective diagnostic and prognostic estimates. A detailed description has been provided about how various failure modes are inserted in the test environment and corresponding data is collected to verify the physics based models under these failure modes that have been developed in parallel. A design of experiment study has been included to outline the details of experimental data collection. Furthermore, some ideas about how experimental results can be extended to real flight environments through actual flight tests and using real flight data have been presented. Finally, the roadmap leading from this effort towards developing successful prognostic algorithms for electromechanical actuators is discussed.*