Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Different graph types may differ in their suitability to support group comparisons, due to the underlying graph schemas. This study examined whether graph schemas are based on perceptual features (i.e., each graph type, e.g., bar or line graph, has its own graph schema) or common invariant structures (i.e., graph types share common schemas). Furthermore, it was of interest which graph type (bar, line, or pie) is optimal for comparing discrete groups. A switching paradigm was used in three experiments. Two graph types were examined at a time (Experiment 1: bar vs. line, Experiment 2: bar vs. pie, Experiment 3: line vs. pie). On each trial, participants received a data graph presenting the data from three groups and were to determine the numerical difference of group A and group B displayed in the graph. We scrutinized whether switching the type of graph from one trial to the next prolonged RTs. The slowing of RTs in switch trials in comparison to trials with only one graph type can indicate to what extent the graph schemas differ. As switch costs were observed in all pairings of graph types, none of the different pairs of graph types tested seems to fully share a common schema. Interestingly, there was tentative evidence for differences in switch costs among different pairings of graph types. Smaller switch costs in Experiment 1 suggested that the graph schemas of bar and line graphs overlap more strongly than those of bar graphs and pie graphs or line graphs and pie graphs. This implies that results were not in line with completely distinct schemas for different graph types either. Taken together, the pattern of results is consistent with a hierarchical view according to which a graph schema consists of parts shared for different graphs and parts that are specific for each graph type. Apart from investigating graph schemas, the study provided evidence for performance differences among graph types. We found that bar graphs yielded the fastest group comparisons compared to line graphs and pie graphs, suggesting that they are the most suitable when used to compare discrete groups.
Within many real-world networks, the links between pairs of nodes change over time. Thus, there has been a recent boom in studying temporal graphs. Recognizing patterns in temporal graphs requires a proximity measure to compare different temporal graphs. To this end, we propose to study dynamic time warping on temporal graphs. We define the dynamic temporal graph warping (dtgw) distance to determine the dissimilarity of two temporal graphs. Our novel measure is flexible and can be applied in various application domains. We show that computing the dtgw-distance is a challenging (in general) NP-hard optimization problem and identify some polynomial-time solvable special cases. Moreover, we develop a quadratic programming formulation and an efficient heuristic. In experiments on real-world data, we show that the heuristic performs very well and that our dtgw-distance performs favorably in de-anonymizing networks compared to other approaches.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
After participating in an afterschool program where they used the Common Online Data Analysis Platform (CODAP) to study time-series data about infectious diseases, four middle school students were interviewed to determine how they understood features of and trends within these graphs. Our focus was on how students compared graphs. Students were readily able to compare cumulative/total infection rates between two countries with differently sized populations. It was more challenging for them to link a graph of yearly cases to the corresponding graph of cumulative cases. Students offered reasonable interpretations for spikes or steady periods in the graphs. Time-series graphs are accessible for 11- to 14-year-old students, who were able to make comparisons within and between graphs. Students used proportional reasoning for one comparison task; the other task was challenging, but they were beginning to understand how yearly and cumulative graphs were related. Time-series graphs are ubiquitous and socially relevant: students should study time-series data more regularly in school, and more research is needed on the progression of sense-making with these graphs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LauNuts is an RDF Knowledge Graph consisting of:
Local Administrative Units (LAU) and
Nomenclature of Territorial Units for Statistics (NUTS)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Benchmark results for the paper "Mix-and-Match: A Model-driven Runtime Optimisation Strategy for BFS on GPUs".
Performance data for breadth-first search on an NVIDIA Titan X, including a trained Binary Decision Tree model for predicting the best implementation for a given input graph.
Matplotlib is a powerful visualization library in Python for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in 2002.
A bar plot or bar graph is a chart that represents categorical data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically.
A bar chart is a great way to compare categorical data across one or two dimensions. More often than not, it’s more interesting to compare values across two dimensions and for that, a grouped bar chart is needed.
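As a minimal, self-contained sketch of a grouped bar chart in Matplotlib (the group names and values below are invented purely for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: two measures for three groups (values are illustrative only).
groups = ["Group A", "Group B", "Group C"]
measure_1 = [20, 34, 30]
measure_2 = [25, 32, 28]

x = np.arange(len(groups))  # one slot per group on the x-axis
width = 0.35                # width of each bar

fig, ax = plt.subplots()
ax.bar(x - width / 2, measure_1, width, label="Measure 1")
ax.bar(x + width / 2, measure_2, width, label="Measure 2")

ax.set_xticks(x)
ax.set_xticklabels(groups)
ax.set_ylabel("Value")
ax.set_title("Grouped bar chart comparing two measures across groups")
ax.legend()

plt.tight_layout()
plt.show()
```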
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).
These results were used in the following conference papers:
Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.
Citation. If you use our dataset or tool, please cite article [1] above.
@InProceedings{Mendonca2015,
  author    = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
  title     = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
  booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
  year      = {2015},
  pages     = {122-129},
  address   = {Karlskrona, SE},
  publisher = {IEEE Publishing},
  doi       = {10.1109/ENIC.2015.25},
}
-------------------------
Details. This archive contains the following folders:
-------------------------
License. These data are shared under a Creative Commons 0 license.
Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Measuring the quality of Question Answering (QA) systems is a crucial task for validating the results of novel approaches. However, there are already indicators of a reproducibility crisis: many published systems have used outdated datasets or subsets of QA benchmarks, making it hard to compare results. We identified the following core problems: there is no standard data format; instead, the different, partly inconsistent datasets use proprietary data representations; additionally, the characteristics of the datasets are typically not documented by the dataset maintainers or the system publishers. To overcome these problems, we established an ontology, the Question Answering Dataset Ontology (QADO), for representing QA datasets in RDF. The following datasets were mapped into the ontology: the QALD series, the LC-QuAD series, the RuBQ series, ComplexWebQuestions, and Mintaka. Hence, the integrated data in QADO covers widely used datasets and multilinguality. Additionally, we performed intensive analyses of the datasets to identify their characteristics, making it easier for researchers to identify specific research questions and to select well-defined subsets. The provided resource will enable the research community to improve the quality of their research and support the reproducibility of experiments.
Here, the mapping results of the QADO process, the SPARQL queries for data analytics, and the archived analytics results file are provided.
Up-to-date statistics can be created automatically by the script provided at the corresponding QADO GitHub RDFizer repository.
GNU General Public License v3.0: https://www.gnu.org/licenses/gpl-3.0.html
The Bridges of Pittsburgh is a highly interdisciplinary and collaborative public-facing project that pays homage both to an innovative, field-defining mathematical problem and to one of the defining features of our city. We proposed to discover how many of Pittsburgh’s 446 bridges could be traversed without crossing the same bridge twice, in the process addressing issues in processing crowdsourced GIS data, performing graph traversal with complex constraints, and using network analysis to compare communities formed by this road network to the historically-defined neighborhoods of Pittsburgh.This ZIP file contains an RStudio project, with package dependencies bundled via packrat (https://rstudio.github.io/packrat/).- The osmar/ directory contains OSM data, our processing code, and outputs used to generate the map at https://bridgesofpittsburgh.net - 2019_final_community_analysis/ contains code and derived datasets for the community analysis portion of the projectwar- The legacy/ directory contains experimental datasets and code from the earliest phase of this project, which were later superseded by the main pipeline in the osmar/ directory.Each directory contains further README.md files documenting their structure.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Companion data for the creation of a banksia plot.
Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses both within and across datasets.
Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, followed by scaling confidence intervals to span a range of one. The point estimates and confidence intervals from matching comparator analyses are then adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the differences in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) datasets with widely varying characteristics, while the second assesses data extraction accuracy, comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs in the accompanying manuscripts.
Results: In the banksia plot of the statistical method comparison, it was clear that there was no difference, on average, in point estimates, and it was straightforward to ascertain which methods resulted in smaller, similar, or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data to those from the original data, it was clear that both the point estimates and confidence intervals were very similar among data extractors and the original data.
Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified.
This collection of files allows the user to create the images used in the companion paper and to amend the code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
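The centring and scaling described above is simple arithmetic. The following Python sketch is only an illustration of the procedure as described (it is not the authors' Stata/R code): it centres a reference point estimate to zero, scales its CI to a width of one, and applies the same shift and scale to the matching comparator estimate and CI.

```python
def centre_and_scale(ref_est, ref_ci, comp_est, comp_ci):
    """Centre/scale the reference estimate and CI, then transform the comparator identically.

    ref_ci and comp_ci are (lower, upper) tuples. Illustrative sketch of the procedure
    described in the Methods above, not the authors' implementation.
    """
    shift = ref_est                  # centre the reference point estimate at zero
    width = ref_ci[1] - ref_ci[0]    # the reference CI is scaled to span a range of one
    scale = 1.0 / width if width != 0 else 1.0

    def transform(x):
        return (x - shift) * scale

    ref = (transform(ref_est), (transform(ref_ci[0]), transform(ref_ci[1])))
    comp = (transform(comp_est), (transform(comp_ci[0]), transform(comp_ci[1])))
    return ref, comp


# Example: reference analysis with estimate 2.0 and CI (1.0, 3.0); comparator analysis
# of the same dataset with estimate 2.4 and CI (1.2, 3.8).
reference, comparator = centre_and_scale(2.0, (1.0, 3.0), 2.4, (1.2, 3.8))
print(reference)   # (0.0, (-0.5, 0.5)) -> centred at zero, CI width of one
print(comparator)  # roughly (0.2, (-0.4, 0.9)) -> relative position and width preserved
```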
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To accurately predict molecular properties, it is important to learn expressive molecular representations. Graph neural networks (GNNs) have made significant advances in this area, but they often face limitations such as neighbor explosion, under-reaching, oversmoothing, and oversquashing. Additionally, GNNs tend to have high computational costs due to their large number of parameters. These limitations emerge or increase when dealing with larger graphs or deeper GNN models. One potential solution is to simplify the molecular graph into a smaller, richer, and more informative one on which GNNs are easier to train. Our proposed molecular graph coarsening framework, FunQG, uses functional groups as building blocks for determining a molecule's properties, based on the graph-theoretic concept of a quotient graph. We show through experiments that the resulting informative graphs are much smaller than the original molecular graphs and are thus more suitable for training GNNs. We apply FunQG to popular molecular property prediction benchmarks and compare the performance of popular baseline GNNs on the resulting datasets to that of state-of-the-art baselines on the original datasets. Our experiments demonstrate that FunQG yields notable results on various datasets while dramatically reducing the number of parameters and computational costs. By utilizing functional groups, we achieve an interpretable framework that indicates their significant role in determining the properties of molecular quotient graphs. Consequently, FunQG is a straightforward, computationally efficient, and generalizable solution to the molecular representation learning problem.
CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Stock market data can be interesting to analyze and as a further incentive, strong predictive models can have large financial payoff. The amount of financial data on the web is seemingly endless. A large and well structured dataset on a wide array of companies can be hard to come by. Here I provide a dataset with historical stock prices (last 5 years) for all companies currently found on the S&P 500 index.
The script I used to acquire all of these .csv files can be found in this GitHub repository. In the future, if you wish for a more up-to-date dataset, it can be used to acquire new versions of the .csv files.
Feb 2018 note: I have just updated the dataset to include data up to Feb 2018. I have also accounted for changes in the stocks on the S&P 500 index (RIP whole foods etc. etc.).
The data is presented in a couple of formats to suit different individuals' needs or computational limitations. I have included files containing 5 years of stock data (in all_stocks_5yr.csv and the corresponding folder).
The folder individual_stocks_5yr contains files of data for individual stocks, labelled by their stock ticker name. The all_stocks_5yr.csv contains the same data, presented in a merged .csv file. Depending on the intended use (graphing, modelling etc.) the user may prefer one of these given formats.
All the files have the following columns: Date - in format: yy-mm-dd
Open - price of the stock at market open (this is NYSE data so all in USD)
High - Highest price reached in the day
Low - Lowest price reached in the day
Close - Price of the stock at market close
Volume - Number of shares traded
Name - the stock's ticker name
Due to volatility in Google Finance, for the newest version I have switched over to acquiring the data from The Investor's Exchange API; the simple script I use to do this is found here. Special thanks to Kaggle, GitHub, pandas_datareader, and The Market.
This dataset lends itself to some very interesting visualizations. One can look at simple things like how prices change over time, graph and compare multiple stocks at once, or generate and graph new metrics from the data provided. From these data, informative stock statistics such as volatility and moving averages can be easily calculated. The million-dollar question is: can you develop a model that can beat the market and allow you to make statistically informed trades?
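As a hedged sketch of the kind of analysis suggested above, the snippet below uses pandas to compute a 20-day moving average and an annualized rolling volatility for one ticker from all_stocks_5yr.csv; the column names follow the description above and may need adjusting to match the actual file headers.

```python
import pandas as pd

# Load the merged file and select one ticker (column names per the description above;
# adjust the casing if the actual CSV differs).
df = pd.read_csv("all_stocks_5yr.csv", parse_dates=["Date"])
aapl = df[df["Name"] == "AAPL"].sort_values("Date").set_index("Date")

# 20-trading-day moving average of the closing price.
aapl["ma_20"] = aapl["Close"].rolling(window=20).mean()

# Annualized rolling volatility from daily returns (assuming 252 trading days per year).
daily_returns = aapl["Close"].pct_change()
aapl["volatility_20"] = daily_returns.rolling(window=20).std() * (252 ** 0.5)

print(aapl[["Close", "ma_20", "volatility_20"]].tail())
```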
The InterpretSELDM program is a graphical post processor designed to facilitate analysis and presentation of stormwater modeling results from the Stochastic Empirical Loading and Dilution Model (SELDM), which is a stormwater model developed by the U.S. Geological Survey in cooperation with the Federal Highway Administration. SELDM simulates flows, concentrations, and loads in stormflows from upstream basins, the highway, best management practice outfalls, and in the receiving water downstream of a highway. SELDM is designed to transform complex scientific data into meaningful information about (1) the risk of adverse effects from stormwater runoff on receiving waters, (2) the potential need for mitigation measures, and (3) the potential effectiveness of management measures for reducing those risks.
SELDM produces results in (relatively) easy-to-use tab-delimited output files that are designed for use with spreadsheets and graphing packages. However, time is needed to learn, understand, and use the SELDM output formats. Also, the SELDM output requires post-processing to extract the specific information that commonly is of interest to the user (for example, the percentage of storms above a user-specified value). Because SELDM output files are comprehensive, the locations of specific output values may not be obvious to the novice user or the occasional model user who does not consult the detailed model documentation.
The InterpretSELDM program was developed as a postprocessor to facilitate analysis and presentation of SELDM results. The program provides graphical results and tab-delimited text summaries from simulation results. InterpretSELDM provides data summaries in seconds; in comparison, manually extracting the same information from SELDM outputs could take minutes to hours. It has an easy-to-use graphical user interface designed to quickly extract dilution factors, constituent concentrations, annual loads, and annual yields from all analyses within a SELDM project. The program provides the methods necessary to create scatterplots and boxplots for the extracted results.
Graphs are more effective than tabular data for evaluating and communicating risk-based information to technical and nontechnical audiences. Commonly used spreadsheets provide methods for generating graphs, but do not provide probability plots or boxplots, which are useful for examining extreme stormflow, concentration, and load values. Probability-plot axes are necessary for evaluating stormflow information because the extreme values commonly are the values of concern. Boxplots provide a simple visual summary of results that can be used to compare different simulation results. The graphs created by using the InterpretSELDM program can be copied and pasted into word processors, spreadsheets, drawing software, and other programs. The graphs also can be saved in commonly used image-file formats.
Diffusion MRI tractography is the only noninvasive method to measure the structural connectome in humans. However, recent validation studies have revealed limitations of modern tractography approaches, which lead to significant mistracking caused in part by local uncertainties in fiber orientations that accumulate to produce larger errors for longer streamlines. Characterizing the role of this length bias in tractography is complicated by the true underlying contribution of spatial embedding to brain topology. In this work, we compare graphs constructed with ex vivo tractography data in mice and neural tracer data from the Allen Mouse Brain Connectivity Atlas to random geometric surrogate graphs which preserve the low-order distance effects from each modality in order to quantify the role of geometry in various network properties. We find that geometry plays a substantially larger role in determining the topology of graphs produced by tractography than graphs produced by tracers. Tractography underestimates weights at long distances compared to neural tracers, which leads tractography to place network hubs close to the geometric center of the brain, as do corresponding tractography-derived random geometric surrogates, while tracer graphs place hubs further into peripheral areas of the cortex. We also explore the role of spatial embedding in modular structure, network efficiency and other topological measures in both modalities. Throughout, we compare the use of two different tractography streamline node assignment strategies and find that the overall differences between tractography approaches are small relative to the differences between tractography- and tracer-derived graphs. These analyses help quantify geometric biases inherent to tractography and promote the use of geometric benchmarking in future tractography validation efforts.
License: https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18419/DARUS-4231
This dataset contains the supplementary materials to our publication "Collaborative Problem Solving in Mixed Reality: A Study on Visual Graph Analysis", where we report on a study we conducted. Please refer to publication for more details, also the abstract can be found at the end of this description. The dataset contains: The collection of graphs with layout used in the study The final, randomized experiment files used in the study The source code of the study prototype The collected, anonymized data in tabular form The code for the statistical analysis The Supplemental Materials PDF The documents used in the study procedure (English, Italian, German) Paper abstract: Problem solving is a composite cognitive process, invoking a number of cognitive mechanisms, such as perception and memory. Individuals may form collectives to solve a given problem together, in collaboration, especially when complexity is thought to be high. To determine if and when collaborative problem solving is desired, we must quantify collaboration first. For this, we investigate the practical virtue of collaborative problem solving. Using visual graph analysis, we perform a study with 72 participants in two countries and three languages. We compare ad hoc pairs to individuals and nominal pairs, solving two different tasks on graphs in visuospatial mixed reality. The average collaborating pair does not outdo its nominal counterpart, but it does have a significant trade-off against the individual: an ad hoc pair uses 1.46 more time to achieve 4.6% higher accuracy. We also use the concept of task instance complexity to quantify differences in complexity. As task instance complexity increases, these differences largely scale, though with two notable exceptions. With this study we show the importance of using nominal groups as benchmark in collaborative virtual environments research. We conclude that a mixed reality environment does not automatically imply superior collaboration.
Comparison of seconds to output 500 tokens by model, including reasoning models' "thinking" time; lower is better.
By Gabe Salzer [source]
This dataset contains essential performance statistics for NBA rookies from 1980-2016. Here you can find minutes-per-game stats, points scored, field goals made and attempted, three-pointers made and attempted, free throws made and attempted (with the respective percentages for each), offensive rebounds, defensive rebounds, assists, steals, blocks, turnovers, efficiency rating, and Hall of Fame induction year. It is organized in descending order by minutes played per game as well as by draft year. This Kaggle dataset is an excellent resource for basketball analysts to gain a better understanding of how rookies have evolved over the years, from their stats to how they were inducted into the Hall of Fame. With its great detail on individual players' performance data, this dataset allows you to compare their performances against different eras in NBA history along with overall trends in rookie statistics. Compare rookies drafted far apart or those that played together, whatever your goal may be!
This dataset is perfect for providing insight into the performance of NBA rookies over an extended period of time. The data covers rookie stats from 1980 to 2016 and includes statistics such as points scored, field goals made, free throw percentage, offensive rebounds, defensive rebounds and assists. It also provides the name of each rookie along with the year they were drafted and their Hall of Fame class.
This data set is useful for researching how rookies’ stats have changed over time in order to compare different eras or identify trends in player performance. It can also be used to evaluate players by comparing their stats against those of other players or previous years’ stats.
In order to use this dataset effectively, a few tips are helpful:
Consider using Field Goal Percentage (FG%), Three-Point Percentage (3P%), and Free Throw Percentage (FT%) to measure a player's efficiency beyond just points scored or field goals made/attempted (FGM/FGA); a small example of computing these percentages is sketched after this list.
Look out for anomalies such as low efficiency ratings despite high minutes played: this could indicate either that a player has not had enough playing time for their statistics to reach what would be their per-game average with more minutes, or that they simply did not play well over that short period with limited opportunities.
Try different visualizations of the data, such as histograms, line graphs, and scatter plots; each may offer different insights into aspects of the dataset, such as comparisons between individual years versus aggregate trends over multiple years.
Lastly, it is important to keep in mind whether you are dealing with cumulative totals over multiple seasons, individual season averages, or per-game numbers when attempting analysis on these sets!
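As a small illustration of the first tip, the sketch below derives shooting percentages from made/attempted columns with pandas. The file name comes from the file listing below; the made/attempted column names (FGM, FGA, 3P Made, 3PA, FTM, FTA) are assumptions and may differ from the actual CSV headers.

```python
import pandas as pd

# Load the rookie stats (file name from the listing below; column names are assumed).
rookies = pd.read_csv("NBA Rookies by Year_Hall of Fame Class.csv")

def pct(made, attempted):
    # Avoid division by zero: treat zero attempts as missing.
    return (made / attempted.where(attempted != 0) * 100).round(1)

rookies["FG%_calc"] = pct(rookies["FGM"], rookies["FGA"])
rookies["3P%_calc"] = pct(rookies["3P Made"], rookies["3PA"])
rookies["FT%_calc"] = pct(rookies["FTM"], rookies["FTA"])

print(rookies[["Name", "FG%_calc", "3P%_calc", "FT%_calc"]].head())
```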
- Evaluating the performance of historical NBA rookies over time and how this can help inform future draft picks in the NBA.
- Analysing the relative importance of certain performance stats, such as three-point percentage, to overall success and Hall of Fame induction from 1980-2016.
- Comparing rookie seasons across different years to identify common trends in terms of statistical contributions and development over time
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors.
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: distribute your contributions under the same license as the original.
  - Keep intact all notices that refer to this license, including copyright notices.
File: NBA Rookies by Year_Hall of Fame Class.csv

| Column name | Description |
|:------------|:------------|
| Name        | The name of... |
By Kiersten Rule [source]
This dataset provides insight into the performance of movie franchises, with detailed information on domestic earnings, ratings, and other details for each movie. Featuring data from over a decade of films released in North America from 2005-2018, we've collected a wealth of data to help analyze the trends that have emerged over time. From film budgets to box-office grosses, vote averages to release dates, you can explore how various studios and movies have impacted the industry by mining this database. Analyze the success of your favorite franchises or compare different plots and themes across genres! So dive in and uncover what makes a movie franchise great!
Compare movie franchises within the same studio – Look at trends such as average runtime or budget over time or compare one franchise to another (e.g., Marvel vs DC).
Analyze box office results by rating – It can be useful to compare which types of movies draw better audiences by looking at their respective box office totals per rating (e.g., R-rated vs PG-13). This can help you decide which genres do better within certain ratings systems that may be beneficial in targeting an audience with a similar demographic.
Use data visualization techniques – Manipulate and visualize the data set with charts and graphs to gain valuable insights into how certain movie characteristics influence overall success (e.g., use bar graphs and scatter plots to look at relationships between release year, budget, runtime, etc.); a short sketch appears after this list.
Utilize release date analysis - This dataset gives you comprehensive information about when different movies were released, so you can use this information to analyze whether there are any benefits targeting particular months/seasons or avoiding them altogether (e.g., does Christmas offer greater success than summer for family films?).
With these tips in mind, this dataset should provide helpful insights into an understanding of what factors contribute most significantly towards the success of both individual films and major movie franchises!
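As a brief sketch of the visualization tip above, the following snippet plots budget against lifetime gross from MovieFranchises.csv using pandas and Matplotlib; the column names are taken from the file description further below, though the exact headers may differ.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names follow the dataset description (e.g. "Budget", "Lifetime Gross", "Year");
# adjust them if the actual CSV headers differ.
movies = pd.read_csv("MovieFranchises.csv")

fig, ax = plt.subplots()
scatter = ax.scatter(movies["Budget"], movies["Lifetime Gross"],
                     c=movies["Year"], cmap="viridis", alpha=0.7)

ax.set_xlabel("Budget (USD)")
ax.set_ylabel("Lifetime domestic gross (USD)")
ax.set_title("Budget vs. lifetime gross, colored by release year")
fig.colorbar(scatter, ax=ax, label="Release year")

plt.tight_layout()
plt.show()
```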
- Analyzing the correlation between movie budget and lifetime gross earnings to determine optimum budgets for certain types of movies.
- Tracking the average ratings and reviews over time to see if certain studios are consistently making quality films or if there is a decline in their ratings and reviews.
- Comparing movie release dates against viewer ratings, reviews and lifetime gross revenue over time to determine which months of the year are most lucrative for releasing movies
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: MovieFranchises.csv

| Column name | Description |
|:---------------|:-----------------------------------------------------------------------|
| Title | The title of the movie. (String) |
| Lifetime Gross | The total amount of money the movie has earned domestically. (Integer) |
| Year | The year the movie was released. (Integer) |
| Studio | The studio that produced the movie. (String) |
| Rating | The rating of the movie, e.g. PG-13, R. (String) |
| Runtime | The length of the movie in minutes. (Integer) |
| Budget | The budget of the movie. (Integer) |
| ReleaseDate | The date that the movie was released. (Date) |
| VoteAvg | Average rating from users. (Float) |
| VoteCount | Total number of votes from users. (Integer) |
If you use this dataset in your research, please credit the original author, Kiersten Rule.
Redmob's Identity Graph Data helps you bring fragmented user data into one unified view. Built in-house and refreshed weekly, the mobile identity graph connects online and offline identifiers.
Designed for adtech platforms, brands, CRM, and CDP owners, Redmob enables cross-device audience tracking, deterministic identity resolution, and more precise attribution modeling across digital touchpoints.
Use cases
The Redmob Identity Graph is a mobile-centric database of linked identifiers that enables:
Key benefits: