MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study investigating the impact of different types of automated hints while learning a formal specification language, both in terms of immediate performance and learning retention, and in terms of the students' emotional response. This research artifact provides all the material required to replicate the study (except for the proprietary questionnaires used to assess the emotional response and user experience), as well as the collected data and the data analysis scripts used for the discussion in the paper.
Dataset
The artifact contains the resources described below.
Experiment resources
The resources needed for replicating the experiment, namely in directory experiment:
alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was provided in Portuguese due to the population of the experiment.
alloy_sheet_en.pdf: an English translation of the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment.
docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.
api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.
Experiment data
The task database used in our application of the experiment, namely in directory data/experiment:
Model.json, Instance.json, and Link.json: JSON files used to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.
identifiers.txt: the list of all 104 participant identifiers available for the experiment.
Collected data
Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared in the form of JSON and CSV files (the latter with a header row), namely in directory data/results:
data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).
data_socio.csv: data collected from the socio-demographic questionnaire in the 1st session of the experiment, namely:
participant identification: participant's unique identifier (ID);
socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclose, and other, respectively), and average academic grade (GRADE, from 0 to 20; NA denotes a preference not to disclose).
data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment (an aggregation sketch is given after this list), namely:
participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);
detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.
data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID);
user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).
participants.txt: the list of participant identifiers that have registered for the experiment.
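For illustration only, the sketch below shows one way to aggregate the detailed differentials in data_emo.csv into a single positive and a single negative score per participant and session. The use of a row mean as the aggregate is an assumption made here for the example; it is not necessarily the aggregation used to produce data_emo_aggregate.csv.

# Illustrative aggregation of data_emo.csv in R; not the artifact's own script.
emo <- read.csv("data/results/data_emo.csv")

positive <- c("Admiration", "Desire", "Hope", "Fascination", "Joy", "Satisfaction", "Pride")
negative <- c("Anger", "Boredom", "Contempt", "Disgust", "Fear", "Sadness", "Shame")

for (s in 1:2) {
  # Mean differential over the 7 positive and 7 negative emotions of session s.
  emo[[paste0("POS", s)]] <- rowMeans(emo[, paste0(positive, s)], na.rm = TRUE)
  emo[[paste0("NEG", s)]] <- rowMeans(emo[, paste0(negative, s)], na.rm = TRUE)
}

head(emo[, c("ID", "HINT", "POS1", "NEG1", "POS2", "NEG2")])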
Analysis scripts
The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:
analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself (an illustrative example of this style of analysis follows this list).
requirements.r: An R script to install the required libraries for the analysis script.
normalize_task.r: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.
normalize_emo.r: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.
Dockerfile: a Docker script to automate running the analysis scripts on the collected data.
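The concrete statistical procedures are documented inside analysis.r itself; the snippet below is only a minimal illustration of the kind of comparison the data supports. It assumes a normalized CSV (file name hypothetical) with one row per participant containing the HINT group and the productivity variables described above, and the Kruskal-Wallis test shown is an arbitrary choice for the example, not necessarily the test reported in the paper.

# Illustrative only -- not the analysis performed in analysis.r.
tasks <- read.csv("data_tasks.csv")   # hypothetical output of the normalization step
tasks$HINT <- factor(tasks$HINT, levels = c("N", "L", "E", "D"))

# Mean 1st-session productivity (0-12 solved tasks) per hint group.
aggregate(PROD1 ~ HINT, data = tasks, FUN = mean)

# A non-parametric comparison across the four groups, shown as an example.
kruskal.test(PROD1 ~ HINT, data = tasks)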
Setup
To replicate the experiment and the analysis of the results, only Docker is required.
If you wish to manually replicate the experiment and collect your own data, you'll need to install:
A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.
If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:
Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.
R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.
Usage
Experiment replication
This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.
To launch the Alloy4Fun platform populated with tasks for each session, just run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch; wait for the "Started your app" message to show.
cd experiment
docker-compose up
This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.
In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would just access http://localhost:3000/0CAN. The next task is available after a correct submission to the current task or when a time-out occurs (5 minutes). Each participant was assigned to a different treatment group, so depending on the permalink different kinds of hints are provided. Below are 4 permalinks, one for each hint group:
Group N (no hints): http://localhost:3000/0CAN
Group L (error locations): http://localhost:3000/CA0L
Group E (counter-example): http://localhost:3000/350E
Group D (error description): http://localhost:3000/27AD
In the 2nd session, as in the 1st, each permalink gave access to 12 sequential tasks, and the next task became available after a correct submission or a time-out (5 minutes). The permalink is constructed by prefixing the participant's identifier with P-, so participant 0CAN would access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints, so the permalinks from different groups are undifferentiated.
Before the 1st session the participants should answer the socio-demographic questionnaire, which should collect the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.
Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons licence (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier and for the degree of attachment to each of the 14 depicted emotions, expressed on a 5-point Likert scale.
After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the participant's unique identifier and answers to the 4 standard questions on a 7-point Likert scale. For the questions themselves, how to administer the questionnaire, and how to compute the usability metric (a score ranging from 0 to 100) from the answers, please see the original paper (a scoring sketch is given after the reference):
Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
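As a reference for implementers, the sketch below follows the standard UMUX scoring described by Finstad: odd-numbered items are positively worded and even-numbered items negatively worded, and the four adjusted item scores are rescaled to 0-100. The function and argument names are ours, not part of the artifact.

# Standard UMUX scoring (Finstad, 2010); q1..q4 are the four 7-point responses.
umux_score <- function(q1, q2, q3, q4) {
  # Positive items contribute (response - 1); negative items contribute (7 - response).
  # The sum ranges from 0 to 24 and is rescaled to a 0-100 usability metric.
  ((q1 - 1) + (7 - q2) + (q3 - 1) + (7 - q4)) / 24 * 100
}

umux_score(6, 2, 7, 1)  # an almost fully positive response scores roughly 92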
Analysis of other applications of the experiment
This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.
The analysis script expects data in 4 CSV files,
https://www.marketresearchforecast.com/privacy-policy
The Data Analytics Market size was valued at USD 41.05 billion in 2023 and is projected to reach USD 222.39 billion by 2032, exhibiting a CAGR of 27.3% during the forecast period. Data Analytics can be defined as the rigorous process of using tools and techniques within a computational framework to analyze various forms of data for the purpose of decision-making by the concerned organization. It is used in almost all fields, such as healthcare, finance, marketing, and transportation, to manage businesses, foresee upcoming events, and improve customer satisfaction. The principal forms of data analytics are descriptive, diagnostic, predictive, and prescriptive analytics. Data gathering, data manipulation, analysis, and data representation are the major subtopics under this area. Data analytics offers many advantages, the most prominent being better decision making, higher productivity, cost savings, and the identification of relationships and trends that would otherwise go unnoticed. Recent trends identified in the market include the application of AI and ML technologies, the use of big data, an increased focus on real-time data processing, and concerns for data privacy. These developments are shaping and propelling the advancement and proliferation of data analysis functions and uses. Key drivers for this market are: Rising Demand for Edge Computing Likely to Boost Market Growth. Potential restraints include: Data Security Concerns to Impede the Market Progress. Notable trends are: Metadata-Driven Data Fabric Solutions to Expand Market Growth.
In this project, we aim to analyze and gain insights into the performance of students based on various factors that influence their academic achievements. We have collected data related to students' demographic information, family background, and their exam scores in different subjects.
Key Objectives:
Performance Evaluation: Evaluate and understand the academic performance of students by analyzing their scores in various subjects.
Identifying Underlying Factors: Investigate factors that might contribute to variations in student performance, such as parental education, family size, and student attendance.
Visualizing Insights: Create data visualizations to present the findings effectively and intuitively.
Dataset Details:
Analysis Highlights:
We will perform a comprehensive analysis of the dataset, including data cleaning, exploration, and visualization to gain insights into various aspects of student performance.
By employing statistical methods and machine learning techniques, we will determine the significant factors that affect student performance.
Why This Matters:
Understanding the factors that influence student performance is crucial for educators, policymakers, and parents. This analysis can help in making informed decisions to improve educational outcomes and provide support where it is most needed.
Acknowledgments:
We would like to express our gratitude to [mention any data sources or collaborators] for making this dataset available.
Please Note:
This project is meant for educational and analytical purposes. The dataset used is fictitious and does not represent any specific educational institution or individuals.
https://dataintelo.com/privacy-and-policy
The global data analytics in financial market size was valued at approximately USD 10.5 billion in 2023 and is projected to reach around USD 34.8 billion by 2032, growing at a robust CAGR of 14.4% during the forecast period. This remarkable growth is driven by the increasing adoption of advanced analytics technologies, the need for real-time data-driven decision-making, and the rising incidence of financial fraud.
One of the primary growth factors for the data analytics in the financial market is the burgeoning volume of data generated from diverse sources such as transactions, social media, and online banking. Financial institutions are increasingly leveraging data analytics to process and analyze this vast amount of data to gain actionable insights. Additionally, technological advancements in artificial intelligence (AI) and machine learning (ML) are significantly enhancing the capabilities of data analytics tools, enabling more accurate predictions and efficient risk management.
Another driving factor is the heightened focus on regulatory compliance and security management. In the wake of stringent regulations imposed by financial authorities globally, organizations are compelled to adopt robust analytics solutions to ensure compliance and mitigate risks. Moreover, with the growing threat of cyber-attacks and financial fraud, there is a heightened demand for sophisticated analytics tools capable of detecting and preventing fraudulent activities in real-time.
Furthermore, the increasing emphasis on customer-centric strategies in the financial sector is fueling the adoption of data analytics. Financial institutions are utilizing analytics to understand customer behavior, preferences, and needs more accurately. This enables them to offer personalized services, improve customer satisfaction, and drive revenue growth. The integration of advanced analytics in customer management processes helps in enhancing customer engagement and loyalty, which is crucial in the competitive financial landscape.
Regionally, North America has been the dominant player in the data analytics in financial market, owing to the presence of major market players, technological advancements, and a high adoption rate of analytics solutions. However, the Asia Pacific region is anticipated to witness the highest growth during the forecast period, driven by the rapid digitalization of financial services, increasing investments in analytics technologies, and the growing focus on enhancing customer experience in emerging economies like China and India.
In the data analytics in financial market, the components segment is divided into software and services. The software segment encompasses various analytics tools and platforms designed to process and analyze financial data. This segment holds a significant share in the market owing to the continuous advancements in software capabilities and the growing need for real-time analytics. Financial institutions are increasingly investing in sophisticated software solutions to enhance their data processing and analytical capabilities. The software segment is also being propelled by the integration of AI and ML technologies, which offer enhanced predictive analytics and automation features.
On the other hand, the services segment includes consulting, implementation, and maintenance services provided by vendors to help financial institutions effectively deploy and manage analytics solutions. With the rising complexity of financial data and analytics tools, the demand for professional services is on the rise. Organizations are seeking expert guidance to seamlessly integrate analytics solutions into their existing systems and optimize their use. The services segment is expected to grow significantly as more institutions recognize the value of professional support in maximizing the benefits of their analytics investments.
The software segment is further categorized into various types of analytics tools such as descriptive analytics, predictive analytics, and prescriptive analytics. Descriptive analytics tools are used to summarize historical data to identify patterns and trends. Predictive analytics tools leverage historical data to forecast future outcomes, which is crucial for risk management and fraud detection. Prescriptive analytics tools provide actionable recommendations based on predictive analysis, aiding in decision-making processes. The growing need for advanced predictive and prescriptive analytics is driving the demand for specialized software solut
This dataset provides geospatial location data and scripts used to analyze the relationship between MODIS-derived NDVI and solar and sensor angles in a pinyon-juniper ecosystem in Grand Canyon National Park. The data are provided in support of the following publication: "Solar and sensor geometry, not vegetation response, drive satellite NDVI phenology in widespread ecosystems of the western United States". The data and scripts allow users to replicate, test, or further explore results. The file GrcaScpnModisCellCenters.csv contains locations (latitude-longitude) of all the 250-m MODIS (MOD09GQ) cell centers associated with the Grand Canyon pinyon-juniper ecosystem that the Southern Colorado Plateau Network (SCPN) is monitoring through its land surface phenology and integrated upland monitoring programs. The file SolarSensorAngles.csv contains MODIS angle measurements for the pixel at the phenocam location plus a random 100 point subset of pixels within the GRCA-PJ ecosystem. The script files (folder: 'Code') consist of 1) a Google Earth Engine (GEE) script used to download MODIS data through the GEE javascript interface, and 2) a script used to calculate derived variables and to test relationships between solar and sensor angles and NDVI using the statistical software package 'R'. The file Fig_8_NdviSolarSensor.JPG shows NDVI dependence on solar and sensor geometry demonstrated for both a single pixel/year and for multiple pixels over time. (Left) MODIS NDVI versus solar-to-sensor angle for the Grand Canyon phenocam location in 2018, the year for which there is corresponding phenocam data. (Right) Modeled r-squared values by year for 100 randomly selected MODIS pixels in the SCPN-monitored Grand Canyon pinyon-juniper ecosystem. The model for forward-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle. The model for back-scatter MODIS-NDVI is log(NDVI) ~ solar-to-sensor angle + sensor zenith angle. Boxplots show interquartile ranges; whiskers extend to 10th and 90th percentiles. The horizontal line marking the average median value for forward-scatter r-squared (0.835) is nearly indistinguishable from the back-scatter line (0.833). The dataset folder also includes supplemental R-project and packrat files that allow the user to apply the workflow by opening a project that will use the same package versions used in this study (eg, .folders Rproj.user, and packrat, and files .RData, and PhenocamPR.Rproj). The empty folder GEE_DataAngles is included so that the user can save the data files from the Google Earth Engine scripts to this location, where they can then be incorporated into the r-processing scripts without needing to change folder names. To successfully use the packrat information to replicate the exact processing steps that were used, the user should refer to packrat documentation available at https://cran.r-project.org/web/packages/packrat/index.html and at https://www.rdocumentation.org/packages/packrat/versions/0.5.0. Alternatively, the user may also use the descriptive documentation phenopix package documentation, and description/references provided in the associated journal article to process the data to achieve the same results using newer packages or other software programs.
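The two regression forms quoted above can be reproduced in R with ordinary lm() fits; the sketch below assumes hypothetical column names for the NDVI, angle, and scatter-direction fields in SolarSensorAngles.csv (the published workflow is in the dataset's 'Code' folder).

# Sketch of the two NDVI ~ geometry models; column names are assumptions.
angles <- read.csv("SolarSensorAngles.csv")

fwd <- subset(angles, scatter_direction == "forward")
bck <- subset(angles, scatter_direction == "back")

m_fwd <- lm(log(NDVI) ~ solar_to_sensor_angle, data = fwd)
m_bck <- lm(log(NDVI) ~ solar_to_sensor_angle + sensor_zenith_angle, data = bck)

summary(m_fwd)$r.squared  # compare with the reported forward-scatter fits
summary(m_bck)$r.squared  # compare with the reported back-scatter fits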
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Excel based tool was developed to analyze means-end chain data. The tool consists of a user manual, a data input file to correctly organise your MEC data, a calculator file to analyse your data, and instructional videos. The purpose of this tool is to aggregate laddering data into hierarchical value maps showing means-end chains. The summarized results consist of (1) a summary overview, (2) a matrix, and (3) output for copy/pasting into NodeXL to generate hierarchal value maps (HVMs). To use this tool, you must have collected data via laddering interviews. Ladders are codes linked together consisting of attributes, consequences and values (ACVs).
https://www.archivemarketresearch.com/privacy-policy
The global marketing data analysis software market is projected to grow from XXX million in 2023 to XXX million by 2033, with a CAGR of XX% during the forecast period. The growth of the market is attributed to the increasing adoption of data-driven marketing strategies by businesses to improve their customer engagement and sales performance. Additionally, the growing popularity of cloud-based software solutions and the availability of advanced analytical tools are driving the market growth. The market is segmented based on application, type, company, and region. The retail and e-commerce segment holds the largest market share due to the high demand for data analysis in the industry. The website analysis software segment is expected to witness significant growth during the forecast period due to the increasing need for businesses to track and analyze website traffic and behavior. The North American region dominates the market, followed by Europe and Asia Pacific. The key players in the market are HubSpot, Semrush, Looker Data Sciences (Google), Insider, LeadsRx, SharpSpring, OWOX BI, and Whatagraph BV.
https://dataintelo.com/privacy-and-policy
The global market size for Big Data Analysis Platforms is projected to grow from USD 35.5 billion in 2023 to an impressive USD 110.7 billion by 2032, reflecting a CAGR of 13.5%. This substantial growth can be attributed to the increasing adoption of data-driven decision-making processes across various industries, the rapid proliferation of IoT devices, and the ever-growing volumes of data generated globally.
One of the primary growth factors for the Big Data Analysis Platform market is the escalating need for businesses to derive actionable insights from complex and voluminous datasets. With the advent of technologies such as artificial intelligence and machine learning, organizations are increasingly leveraging big data analytics to enhance their operational efficiency, customer experience, and competitiveness. The ability to process vast amounts of data quickly and accurately is proving to be a game-changer, enabling businesses to make more informed decisions, predict market trends, and optimize their supply chains.
Another significant driver is the rise of digital transformation initiatives across various sectors. Companies are increasingly adopting digital technologies to improve their business processes and meet changing customer expectations. Big Data Analysis Platforms are central to these initiatives, providing the necessary tools to analyze and interpret data from diverse sources, including social media, customer transactions, and sensor data. This trend is particularly pronounced in sectors such as retail, healthcare, and BFSI (banking, financial services, and insurance), where data analytics is crucial for personalizing customer experiences, managing risks, and improving operational efficiencies.
Moreover, the growing adoption of cloud computing is significantly influencing the market. Cloud-based Big Data Analysis Platforms offer several advantages over traditional on-premises solutions, including scalability, flexibility, and cost-effectiveness. Businesses of all sizes are increasingly turning to cloud-based analytics solutions to handle their data processing needs. The ability to scale up or down based on demand, coupled with reduced infrastructure costs, makes cloud-based solutions particularly appealing to small and medium-sized enterprises (SMEs) that may not have the resources to invest in extensive on-premises infrastructure.
Data Science and Machine-Learning Platforms play a pivotal role in the evolution of Big Data Analysis Platforms. These platforms provide the necessary tools and frameworks for processing and analyzing vast datasets, enabling organizations to uncover hidden patterns and insights. By integrating data science techniques with machine learning algorithms, businesses can automate the analysis process, leading to more accurate predictions and efficient decision-making. This integration is particularly beneficial in sectors such as finance and healthcare, where the ability to quickly analyze complex data can lead to significant competitive advantages. As the demand for data-driven insights continues to grow, the role of data science and machine-learning platforms in enhancing big data analytics capabilities is becoming increasingly critical.
From a regional perspective, North America currently holds the largest market share, driven by the presence of major technology companies, high adoption rates of advanced technologies, and substantial investments in data analytics infrastructure. Europe and the Asia Pacific regions are also experiencing significant growth, fueled by increasing digitalization efforts and the rising importance of data analytics in business strategy. The Asia Pacific region, in particular, is expected to witness the highest CAGR during the forecast period, propelled by rapid economic growth, a burgeoning middle class, and increasing internet and smartphone penetration.
The Big Data Analysis Platform market can be broadly categorized into three components: Software, Hardware, and Services. The software segment includes analytics software, data management software, and visualization tools, which are crucial for analyzing and interpreting large datasets. This segment is expected to dominate the market due to the continuous advancements in analytics software and the increasing need for sophisticated data analysis tools. Analytics software enables organizations to process and analyze data from multiple sources,
analyze the health and retirement study (hrs) with r
the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
this new github repository contains five scripts:
1992 - 2010 download HRS microdata.R: loop through every year and every file, download, then unzip everything in one big party
import longitudinal RAND contributed files.R: create a SQLite database (.db) on the local disk, then load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
longitudinal RAND - analysis examples.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create two database-backed complex sample survey objects using a taylor-series linearization design, and perform a mountain of analysis examples with wave weights from two different points in the panel
import example HRS file.R: load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html), parse through the IF block at the bottom of the sas importation script, blank out a number of variables, and save the file as an R data file (.rda) for fast loading later
replicate 2002 regression.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create a database-backed complex sample survey object using a taylor-series linearization design, and exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.
click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, and a running list of publications using hrs. notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
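For readers unfamiliar with the survey-object step mentioned above, the sketch below shows the general shape of a Taylor-series linearization design in the R 'survey' package. It is not one of the repository's scripts, and the file and column names (cluster, stratum, wave weight) are placeholders.

# Generic sketch of a complex sample survey design (names are hypothetical).
library(survey)

rand_hrs <- read.csv("rand_hrs_extract.csv")   # placeholder extract of the rand hrs file

hrs_design <- svydesign(
  ids     = ~psu_id,        # primary sampling unit
  strata  = ~stratum_id,    # sampling stratum
  weights = ~wave_weight,   # respondent weight for the chosen wave
  nest    = TRUE,
  data    = rand_hrs
)

svymean(~age, hrs_design, na.rm = TRUE)   # weighted mean age for that wave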
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
When studying the impacts of climate change, there is a tendency to select climate data from a small set of arbitrary time periods or climate windows (e.g., spring temperature). However, these arbitrary windows may not encompass the strongest periods of climatic sensitivity and may lead to erroneous biological interpretations. Therefore, there is a need to consider a wider range of climate windows to better predict the impacts of future climate change. We introduce the R package climwin that provides a number of methods to test the effect of different climate windows on a chosen response variable and compare these windows to identify potential climate signals. climwin extracts the relevant data for each possible climate window and uses this data to fit a statistical model, the structure of which is chosen by the user. Models are then compared using an information criteria approach. This allows users to determine how well each window explains variation in the response variable and compare model support between windows. climwin also contains methods to detect type I and II errors, which are often a problem with this type of exploratory analysis. This article presents the statistical framework and technical details behind the climwin package and demonstrates the applicability of the method with a number of worked examples.
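As a usage illustration, the sketch below follows the interface described in the climwin documentation; Mass and MassClimate are the example datasets distributed with the package (assumed available after loading it), and the window range, reference day, and aggregate statistic shown are arbitrary choices for the example.

# Sketch of a sliding-window search with climwin (values chosen for illustration).
library(climwin)
data(Mass)
data(MassClimate)

win <- slidingwin(
  xvar      = list(Temp = MassClimate$Temp),
  cdate     = MassClimate$Date,
  bdate     = Mass$Date,
  baseline  = lm(Mass ~ 1, data = Mass),   # baseline model the climate windows are added to
  cinterval = "day",
  range     = c(150, 0),                   # windows up to 150 days before the reference day
  type      = "absolute",
  refday    = c(20, 5),                    # 20 May as the reference day
  stat      = "mean",
  func      = "lin"
)

head(win[[1]]$Dataset)   # candidate windows ranked by information-criterion support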
Envestnet®| Yodlee®'s Retail Transaction Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.
Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.
We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.
Investors, corporate researchers, and corporates can use our data to answer some key business questions such as: - How much are consumers spending with specific merchants/brands and how is that changing over time? - Is the share of consumer spend at a specific merchant increasing or decreasing? - How are consumers reacting to new products or services launched by merchants? - For loyal customers, how is the share of spend changing over time? - What is the company’s market share in a region for similar customers? - Is the company’s loyal user base increasing or decreasing? - Is the lifetime customer value increasing or decreasing?
Additional Use Cases: - Use spending data to analyze sales/revenue broadly (sector-wide) or granular (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level. - Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.) - Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions. - Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.
Use Cases Categories (Our data provides an innumerable amount of use cases, and we look forward to working with new ones): 1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking
Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)
Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence
Market Data: Analytics, B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This data release contains data in support of "Regional Analysis of the Dependence of Peak-Flow Quantiles on Climate with Application to Adjustment to Climate Trends" (Over and others, 2025). It contains input and output data used to analyze the effect of climate changes on trends in floods using three regression approaches. The input consists of two files. The first, "station_list.csv," contains streamgage information for the 404 streamgages considered for use in Over and others (2025). Only 330 of the 404 streamgages were considered non-redundant and used in the final analysis; these streamgages have a value of "Non-redundant" in the "redundancy_status" column. This file includes calibrated Monthly Water Balance Model (MWBM) parameters and basin characteristics. The second, "regression_input.csv," contains regression input data, including observed peak streamflow and precipitation. MWBM-simulated streamflow data was created using two sets of MWBM parameters: at-site calibrated p ...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The CETSA and Thermal Proteome Profiling (TPP) analytical methods are invaluable for the study of protein–ligand interactions and protein stability in a cellular context. These tools have increasingly been leveraged in work ranging from understanding signaling paradigms to drug discovery. Consequently, there is an important need to optimize the data analysis pipeline that is used to calculate protein melt temperatures (Tm) and relative melt shifts from proteomics abundance data. Here, we report a user-friendly analysis of the melt shift calculation workflow where we describe the impact of each individual calculation step on the final output list of stabilized and destabilized proteins. This report also includes a description of how key steps in the analysis workflow quantitatively impact the list of stabilized/destabilized proteins from an experiment. We applied our findings to develop a more optimized analysis workflow that illustrates the dramatic sensitivity of chosen calculation steps on the final list of reported proteins of interest in a study and have made the R based program Inflect available for research community use through the CRAN repository [McCracken, N. Inflect: Melt Curve Fitting and Melt Shift Analysis. R package version 1.0.3, 2021]. The Inflect outputs include melt curves for each protein which passes filtering criteria in addition to a data matrix which is directly compatible with downstream packages such as UpsetR for replicate comparisons and identification of biologically relevant changes. Overall, this work provides an essential resource for scientists as they analyze data from TPP and CETSA experiments and implement their own analysis pipelines geared toward specific applications.
https://spdx.org/licenses/CC0-1.0.html
Multiplexed imaging technologies provide insights into complex tissue architectures. However, challenges arise due to software fragmentation with cumbersome data handoffs, inefficiencies in processing large images (8 to 40 gigabytes per image), and limited spatial analysis capabilities. To efficiently analyze multiplexed imaging data, we developed SPACEc, a scalable end-to-end Python solution, that handles image extraction, cell segmentation, and data preprocessing and incorporates machine-learning-enabled, multi-scaled, spatial analysis, operated through a user-friendly and interactive interface. The demonstration dataset was derived from a previous analysis and contains TMA cores from a human tonsil and tonsillitis sample that were acquired with the Akoya PhenocyclerFusion platform. The dataset can be used to test the workflow and establish it on a user’s system or to familiarize oneself with the pipeline. Methods Tissue samples: Tonsil cores were extracted from a larger multi-tumor tissue microarray (TMA), which included a total of 66 unique tissues (51 malignant and semi-malignant tissues, as well as 15 non-malignant tissues). Representative tissue regions were annotated on corresponding hematoxylin and eosin (H&E)-stained sections by a board-certified surgical pathologist (S.Z.). Annotations were used to generate the 66 cores each with cores of 1mm diameter. FFPE tissue blocks were retrieved from the tissue archives of the Institute of Pathology, University Medical Center Mainz, Germany, and the Department of Dermatology, University Medical Center Mainz, Germany. The multi-tumor-TMA block was sectioned at 3µm thickness onto SuperFrost Plus microscopy slides before being processed for CODEX multiplex imaging as previously described. CODEX multiplexed imaging and processing To run the CODEX machine, the slide was taken from the storage buffer and placed in PBS for 10 minutes to equilibrate. After drying the PBS with a tissue, a flow cell was sealed onto the tissue slide. The assembled slide and flow cell were then placed in a PhenoCycler Buffer made from 10X PhenoCycler Buffer & Additive for at least 10 minutes before starting the experiment. A 96-well reporter plate was prepared with each reporter corresponding to the correct barcoded antibody for each cycle, with up to 3 reporters per cycle per well. The fluorescence reporters were mixed with 1X PhenoCycler Buffer, Additive, nuclear-staining reagent, and assay reagent according to the manufacturer's instructions. With the reporter plate and assembled slide and flow cell placed into the CODEX machine, the automated multiplexed imaging experiment was initiated. Each imaging cycle included steps for reporter binding, imaging of three fluorescent channels, and reporter stripping to prepare for the next cycle and set of markers. This was repeated until all markers were imaged. After the experiment, a .qptiff image file containing individual antibody channels and the DAPI channel was obtained. Image stitching, drift compensation, deconvolution, and cycle concatenation are performed within the Akoya PhenoCycler software. The raw imaging data output (tiff, 377.442nm per pixel for 20x CODEX) is first examined with QuPath software (https://qupath.github.io/) for inspection of staining quality. Any markers that produce unexpected patterns or low signal-to-noise ratios should be excluded from the ensuing analysis. The qptiff files must be converted into tiff files for input into SPACEc. 
Data preprocessing includes image stitching, drift compensation, deconvolution, and cycle concatenation performed using the Akoya Phenocycler software. The raw imaging data (qptiff, 377.442 nm/pixel for 20x CODEX) files from the Akoya PhenoCycler technology were first examined with QuPath software (https://qupath.github.io/) to inspect staining qualities. Markers with untenable patterns or low signal-to-noise ratios were excluded from further analysis. A custom CODEX analysis pipeline was used to process all acquired CODEX data (scripts available upon request). The qptiff files were converted into tiff files for tissue detection (watershed algorithm) and cell segmentation.
https://www.datainsightsmarket.com/privacy-policy
Market Analysis for Data Middle Platform
The global data middle platform market size was valued at USD 24.9 billion in 2025 and is anticipated to reach USD 76.1 billion by 2033, exhibiting a CAGR of 15.3% during the forecast period (2025-2033). Key drivers fueling market growth include the increasing adoption of cloud-based solutions, the proliferation of data, and the need for efficient data management. The rising adoption of data analytics and machine learning is also contributing to the demand for data middle platforms. The market is segmented by application (enterprise, municipal, bank, other) and type (local, cloud-based). The cloud-based segment dominates the market due to its cost-effectiveness, scalability, and flexibility. Key players in the market include Guangzhou Guangdian Information Technology Co., Ltd., Shanghai Qianjiang Network Technology Co., Ltd., Tianmian Information Technology (Shenzhen) Co., Ltd., Guangzhou Yunmi Technology Co., Ltd., Spot Technology, Xiamen Meiya Pico Information Co., Ltd., Star Ring Technology, Beijing Jiuqi Software Co., Ltd., LnData, SIE, Yusys Technology, and Sunline. The market is expected to experience significant growth in the Asia Pacific region, particularly in China, India, and Japan, due to the increasing number of data-driven businesses and the government's focus on digital transformation. The data middle platform market is a rapidly growing market, with a global value of $10.5 billion in 2021. The market is projected to grow at a CAGR of 15.7% over the next five years, reaching $24.5 billion by 2026. The growth of the market is being driven by the increasing adoption of data-driven decision-making in enterprises. As businesses become more reliant on data to improve their operations, they are increasingly investing in data middle platforms to manage and analyze their data.
https://dataintelo.com/privacy-and-policy
The global data modeling software market size was valued at approximately USD 2.5 billion in 2023 and is projected to reach around USD 6.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.5% from 2024 to 2032. The market's robust growth can be attributed to the increasing adoption of data-driven decision-making processes across various industries, which necessitates advanced data modeling solutions to manage and analyze large volumes of data efficiently.
The proliferation of big data and the growing need for data governance are significant drivers for the data modeling software market. Organizations are increasingly recognizing the importance of structured and unstructured data in generating valuable insights. With data volumes exploding, data modeling software becomes essential for creating logical data models that represent business processes and information requirements accurately. This software is crucial for implementation in data warehouses, analytics, and business intelligence applications, further fueling market growth.
Technological advancements, particularly in artificial intelligence (AI) and machine learning (ML), are also propelling the data modeling software market forward. These technologies enable more sophisticated data models that can predict trends, optimize operations, and enhance decision-making processes. The integration of AI and ML with data modeling tools allows for automated data analysis, reducing the time and effort required for manual processes and improving the accuracy of the results. This technological synergy is a significant growth factor for the market.
The rise of cloud-based solutions is another critical factor contributing to the market's expansion. Cloud deployment offers numerous advantages, such as scalability, flexibility, and cost-effectiveness, making it an attractive option for businesses of all sizes. Cloud-based data modeling software allows for real-time collaboration and access to data from anywhere, enhancing productivity and efficiency. As more companies move their operations to the cloud, the demand for cloud-compatible data modeling solutions is expected to surge, driving market growth further.
In terms of regional outlook, North America currently holds the largest share of the data modeling software market. This dominance is due to the high concentration of technology-driven enterprises and a strong emphasis on data analytics and business intelligence in the region. However, the Asia Pacific region is anticipated to witness the highest growth rate during the forecast period. Rapid digital transformation, increased cloud adoption, and the rising importance of data analytics in emerging economies like China and India are key factors contributing to this growth. Europe, Latin America, and the Middle East & Africa also present significant opportunities, albeit at varying growth rates.
In the data modeling software market, the component segment is divided into software and services. The software component is the most significant contributor to the market, driven by the increasing need for advanced data modeling tools that can handle complex data structures and provide accurate insights. Data modeling software includes various tools and platforms that facilitate the creation, management, and optimization of data models. These tools are essential for database design, data architecture, and other data management tasks, making them indispensable for organizations aiming to leverage their data assets effectively.
Within the software segment, there is a growing trend towards integrating AI and ML capabilities to enhance the functionality of data modeling tools. This integration allows for more sophisticated data analysis, automated model generation, and improved accuracy in predictions and insights. As a result, organizations can achieve better data governance, streamline operations, and make more informed decisions. The demand for such advanced software solutions is expected to rise, contributing significantly to the market's growth.
The services component, although smaller in comparison to the software segment, plays a crucial role in the data modeling software market. Services include consulting, implementation, training, and support, which are essential for the successful deployment and utilization of data modeling tools. Many organizations lack the in-house expertise to effectively implement and manage data modeling software, leading to increased demand for professional services.
https://www.marketresearchforecast.com/privacy-policy
The High Performance Computing (HPC) and High Performance Data Analytics (HPDA) Market size was valued at USD 46.01 billion in 2023 and is projected to reach USD 84.65 billion by 2032, exhibiting a CAGR of 9.1% during the forecast period. High-Performance Computing (HPC) refers to the use of advanced computing systems and technologies to solve complex and large-scale computational problems at high speeds. HPC systems utilize powerful processors, large memory capacities, and high-speed interconnects to execute vast numbers of calculations rapidly. High-Performance Data Analytics (HPDA) extends this concept to handle and analyze big data with similar performance goals. HPDA encompasses techniques like data mining, machine learning, and statistical analysis to extract insights from massive datasets. Types of HPDA include batch processing, stream processing, and real-time analytics. Features include parallel processing, scalability, and high throughput. Applications span scientific research, financial modeling, and large-scale simulations, addressing challenges that require both intensive computing and sophisticated data analysis. Recent developments include: December 2023: Lenovo, a company offering computer hardware, software, and services, extended the HPC system "LISE" at the Zuse Institute Berlin (ZIB). This expansion would provide researchers at the institute with the high computing power required to execute data-intensive applications. The major focus of this expansion is to enhance the energy efficiency of "LISE". August 2023: atNorth, a data center services company, announced the acquisition of Gompute, the HPC cloud platform offering Cloud HPC services, as well as on-premises and hybrid cloud solutions. Under the terms of the agreement, atNorth would add Gompute's data center to its portfolio. July 2023: HCL Technologies Limited, a consulting and information technology services firm, extended its collaboration with Microsoft Corporation to provide HPC solutions, such as advanced analytics, ML, core infrastructure, and simulations, for clients across numerous sectors. June 2023: Leostream, a cloud-based desktop provider, launched new features designed to enhance HPC workloads on AWS EC2. The company develops zero-trust architecture around HPC workloads to deliver cost-effective and secure resources to users on virtual machines. November 2022: Intel Corporation, a global technology company, launched the latest advanced processors for HPC, artificial intelligence (AI), and supercomputing. These processors include data center version GPUs and 4th Gen Xeon Scalable CPUs. Key drivers for this market are: Technological Advancements Coupled with Robust Government Investments to Fuel Market Growth. Potential restraints include: High Cost and Skill Gap to Restrain Industry Expansion. Notable trends are: Comprehensive Benefits Provided by Hybrid Cloud HPC Solutions to Aid Industry Expansion.
In the process of migrating data to the current DDL platform, datasets with a large number of variables required splitting into multiple spreadsheets. They should be reassembled by the user to understand the data fully. This is the second spreadsheet of six in the Using Leaf-level Hyperspectral Reflectance Data to Analyze Genetic Gain in CIMMYT Maize Hybrids, Leaf Level Reflectance Data in a Low Nitrogen Environment in Agua Fria.
https://creativecommons.org/publicdomain/zero/1.0/
Title: 9,565 Top-Rated Movies Dataset
Description:
This dataset offers a comprehensive collection of 9,565 of the highest-rated movies according to audience ratings on the Movie Database (TMDb). The dataset includes detailed information about each movie, such as its title, overview, release date, popularity score, average vote, and vote count. It is designed to be a valuable resource for anyone interested in exploring trends in popular cinema, analyzing factors that contribute to a movie’s success, or building recommendation engines.
Key Features:
- Title: The official title of each movie.
- Overview: A brief synopsis or description of the movie's plot.
- Release Date: The release date of the movie, formatted as YYYY-MM-DD.
- Popularity: A score indicating the current popularity of the movie on TMDb, which can be used to gauge current interest.
- Vote Average: The average rating of the movie, based on user votes.
- Vote Count: The total number of votes the movie has received.
Data Source:
The data was sourced from the TMDb API, a well-regarded platform for movie information, using the /movie/top_rated endpoint. The dataset represents a snapshot of the highest-rated movies as of the time of data collection.
Data Collection Process:
- API Access: Data was retrieved programmatically using TMDb’s API.
- Pagination Handling: Multiple API requests were made to cover all pages of top-rated movies, ensuring the dataset’s comprehensiveness.
- Data Aggregation: Collected data was aggregated into a single, unified dataset using the pandas library.
- Cleaning: Basic data cleaning was performed to remove duplicates and handle missing or malformed data entries.
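The collection steps above can be illustrated with a short Python sketch. This is not the original collection code: it assumes a TMDb v3 API key is available in the TMDB_API_KEY environment variable and uses the public /movie/top_rated endpoint with its api_key and page query parameters.

import os
import time
import requests
import pandas as pd

BASE_URL = "https://api.themoviedb.org/3/movie/top_rated"
API_KEY = os.environ["TMDB_API_KEY"]  # assumption: a TMDb v3 API key is set in the environment

def fetch_top_rated(max_pages=None):
    # Fetch top-rated movies page by page and return them as a pandas DataFrame.
    rows, page, total_pages = [], 1, 1
    while page <= total_pages and (max_pages is None or page <= max_pages):
        resp = requests.get(BASE_URL, params={"api_key": API_KEY, "page": page}, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        total_pages = payload.get("total_pages", page)  # pagination handling
        rows.extend(payload.get("results", []))
        page += 1
        time.sleep(0.3)  # be gentle with the API rate limit
    df = pd.DataFrame(rows)[["title", "overview", "release_date",
                             "popularity", "vote_average", "vote_count"]]
    # Basic cleaning: drop duplicates and rows with missing titles or release dates.
    return df.drop_duplicates(subset="title").dropna(subset=["title", "release_date"])

if __name__ == "__main__":
    fetch_top_rated(max_pages=5).to_csv("top_rated_movies.csv", index=False)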
Potential Uses:
- Trend Analysis: Analyze trends in movie ratings over time or compare ratings across different genres.
- Recommendation Systems: Build and train models to recommend movies based on user preferences.
- Sentiment Analysis: Perform text analysis on movie overviews to understand common themes and sentiments.
- Statistical Analysis: Explore the relationship between popularity, vote count, and average ratings.
Data Format: The dataset is provided in a structured tabular format (e.g., CSV), making it easy to load into data analysis tools like Python, R, or Excel.
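As a quick start, the CSV can be loaded and explored with pandas; the file name top_rated_movies.csv below is an assumption and should be replaced with the actual file shipped with the dataset.

import pandas as pd

# Assumed file name; substitute the CSV file included in the dataset.
movies = pd.read_csv("top_rated_movies.csv", parse_dates=["release_date"])

# Example analyses: average rating per release year, and the relationship
# between popularity, vote count, and average rating.
by_year = movies.groupby(movies["release_date"].dt.year)["vote_average"].mean()
print(by_year.tail(10))
print(movies[["popularity", "vote_average", "vote_count"]].corr())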
Usage License: The dataset is shared under [appropriate license], ensuring that it can be used for educational, research, or commercial purposes, with proper attribution to the data source (TMDb).
Streaming Analytics Market Size 2024-2028
The streaming analytics market size is forecast to increase by USD 39.7 at a CAGR of 34.63% between 2023 and 2028.
The market is experiencing significant growth due to the increasing need to improve business efficiency in various industries. The integration of Artificial Intelligence (AI) and Machine Learning (ML) technologies is a key trend driving market growth. These technologies enable real-time data processing and analysis, leading to faster decision-making and improved operational performance. However, the integration of streaming analytics solutions with legacy systems poses a challenge. IoT platforms play a crucial role in the market, as IoT-driven devices generate vast amounts of data that require real-time analysis. Predictive analytics is another area of focus, as it allows businesses to anticipate future trends and customer behavior, leading to proactive decision-making. Overall, the market is expected to continue growing, driven by the need for real-time data processing and analysis in various sectors.
What will be the Size of the Streaming Analytics Market During the Forecast Period?
The market is experiencing significant growth due to the increasing demand for real-time insights from big data generated by emerging technologies such as IoT and API-driven applications. This market is driven by the strategic shift towards digitization and cloud solutions among large enterprises and small to medium-sized businesses (SMEs) across various industries, including retail. Legacy systems are being replaced with modern streaming analytics platforms to enhance data connectivity and improve production and demand response. The financial impact of real-time analytics is substantial, with applications in fraud detection, predictive maintenance, and operational efficiency. The integration of artificial intelligence (AI) and machine learning algorithms further enhances the market's potential, enabling businesses to gain valuable insights from their data streams. Overall, the market is poised for continued expansion as more organizations recognize the value of real-time data processing and analysis.
How is this Streaming Analytics Industry segmented and which is the largest segment?
The streaming analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022, for the following segments.
Deployment: Cloud, On-premises
Type: Software, Services
Geography: North America (Canada, US), APAC (China, Japan), Europe (UK), Middle East and Africa, South America
By Deployment Insights
The cloud segment is estimated to witness significant growth during the forecast period.
Cloud-deployed streaming analytics solutions enable businesses to analyze data in real time using remote computing resources. This deployment model streamlines business intelligence processes by collecting, integrating, and presenting derived insights instantaneously, enhancing decision-making efficiency. The cloud segment's growth is driven by benefits such as quick deployment, flexibility, scalability, and real-time data visibility. Service providers offer these capabilities with flexible payment structures, including pay-as-you-go. Advanced solutions integrate AI, API, and event-streaming analytics capabilities, ensuring compliance with regulations, optimizing business processes, and providing valuable data accessibility. Cloud adoption in various sectors, including finance, healthcare, retail, and telecom, is increasing due to the need for real-time predictive modeling and fraud detection. SMEs and startups also benefit from these solutions due to their ease of use and cost-effectiveness. In conclusion, cloud-based streaming analytics solutions offer significant advantages, making them an essential tool for organizations seeking to digitize and modernize their IT infrastructure.
The Cloud segment was valued at USD 4.40 in 2018 and showed a gradual increase during the forecast period.
Regional Analysis
APAC is estimated to contribute 34% to the growth of the global market during the forecast period.
In North America, the region's early adoption of advanced technology and high data generation make it a significant market for streaming analytics. The vast amounts of data produced in this tech-mature region necessitate intelligent analysis to uncover valuable relationships and insights. Advanced software solutions, including AI, virtualiza
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study to investigate the impact of different types of automated hints while learning a formal specification language, both in terms of immediate performance and learning retention, but also in the emotional response of the students. This research artifact provides all the material required to replicate this study (except for the proprietary questionnaires passed to assess the emotional response and user experience), as well as the collected data and data analysis scripts used for the discussion in the paper.
Dataset
The artifact contains the resources described below.
Experiment resources
The resources needed for replicating the experiment, namely in directory experiment:
alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was passed in Portuguese due to the population of the experiment.
alloy_sheet_en.pdf: a version the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment translated into English.
docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.
api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.
Experiment data
The task database used in our application of the experiment, namely in directory data/experiment:
Model.json, Instance.json, and Link.json: JSON files with to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.
identifiers.txt: the list of all (104) available participant identifiers that can participate in the experiment.
Collected data
Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared the shape of JSON and CSV files with a header row, namely in directory data/results:
data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).
data_socio.csv: data collected from socio-demographic questionnaire in the 1st session of the experiment, namely:
participant identification: participant's unique identifier (ID);
socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclosure, and other, respectively), and average academic grade (GRADE, from 0 to 20, NA denotes preference to not disclosure).
data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);
detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.
data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID);
user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).
participants.txt: the list of participant identifiers that have registered for the experiment.
Analysis scripts
The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:
analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.
requirements.r: An R script to install the required libraries for the analysis script.
normalize_task.r: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.
normalize_emo.r: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.
Dockerfile: a Docker configuration file to automate running the analysis script on the collected data.
Setup
To replicate the experiment and the analysis of the results, only Docker is required.
If you wish to manually replicate the experiment and collect your own data, you'll need to install:
A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.
If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:
Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.
R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.
Usage
Experiment replication
This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.
To launch the Alloy4Fun platform populated with the tasks for each session, run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch; wait for the "Started your app" message to show.
cd experiment
docker-compose up
This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.
In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would access http://localhost:3000/0CAN. The next task becomes available after a correct submission to the current task or when a time-out occurs (5 minutes). Each participant was assigned to a different treatment group, so depending on the permalink different kinds of hints are provided. Below are 4 permalinks, one for each hint group:
Group N (no hints): http://localhost:3000/0CAN
Group L (error locations): http://localhost:3000/CA0L
Group E (counter-example): http://localhost:3000/350E
Group D (error description): http://localhost:3000/27AD
In the 2nd session, as in the 1st session, each permalink gave access to 12 sequential tasks, and the next task becomes available after a correct submission or a time-out (5 minutes). The permalink is constructed by prepending the participant's identifier with P-, so participant 0CAN would access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints provided, so the permalinks from different groups are undifferentiated.
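The permalink and group conventions above can be scripted. The following is a hypothetical helper, not part of the artifact; it assumes identifiers.txt lists one identifier per line and that the platform is running at http://localhost:3000.

# Hypothetical helper (not part of the artifact): derive each participant's
# treatment group and session permalinks from the identifiers in identifiers.txt.
BASE_URL = "http://localhost:3000"  # assumption: local Alloy4Fun instance

with open("identifiers.txt") as f:
    identifiers = [line.strip() for line in f if line.strip()]

for ident in identifiers:
    group = ident[-1]                    # last character encodes the hint group: N, L, E or D
    session1 = f"{BASE_URL}/{ident}"     # 1st session: the identifier itself
    session2 = f"{BASE_URL}/P-{ident}"   # 2nd session: the identifier prefixed with P-
    print(f"{ident}\tgroup={group}\t{session1}\t{session2}")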
Before the 1st session the participants should answer the socio-demographic questionnaire, which should ask for the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.
Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons license (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier and for the attachment to each of the 14 depicted emotions, expressed on a 5-point Likert scale.
After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the user's unique identifier and for answers to the standard 4 questions on a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric (a score ranging from 0 to 100) from the answers, please see the original paper:
Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
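For convenience, the sketch below illustrates the commonly used UMUX scoring scheme, in which odd-numbered items are positively worded and even-numbered items negatively worded; it is an illustration based on the formula reported in the literature, and the original paper remains the authoritative reference.

def umux_score(responses):
    # Compute a UMUX score in [0, 100] from four 7-point Likert responses.
    # Assumes the standard item ordering: items 1 and 3 are positively worded
    # (scored as response - 1) and items 2 and 4 negatively worded
    # (scored as 7 - response), following Finstad (2010).
    if len(responses) != 4 or not all(1 <= r <= 7 for r in responses):
        raise ValueError("expected four responses on a 1-7 scale")
    odd = (responses[0] - 1) + (responses[2] - 1)   # positive-tone items
    even = (7 - responses[1]) + (7 - responses[3])  # negative-tone items
    return (odd + even) * 100 / 24

# Example: responses [6, 2, 7, 1] yield a score of about 91.7.
print(umux_score([6, 2, 7, 1]))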
Analysis of other applications of the experiment
This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.
The analysis script expects data in 4 CSV files,