License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern research projects incorporate data from several sources, and new insights are increasingly driven by the ability to interpret data in the context of other data. Glue is an interactive environment built on top of the standard Python science stack to visualize relationships within and between datasets. With Glue, users can load and visualize multiple related datasets simultaneously. Users specify the logical connections that exist between data, and Glue transparently uses this information as needed to enable visualization across files. This functionality makes it trivial, for example, to interactively overplot catalogs on top of images. The central philosophy behind Glue is that the structure of research data is highly customized and problem-specific. Glue aims to accommodate this and simplify the "data munging" process, so that researchers can more naturally explore what their data have to say. The result is a cleaner scientific workflow, faster interaction with data, and an easier avenue to insight.
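A minimal sketch of this linking workflow using Glue's scripting layer (glue-core) is shown below; the catalogs, attribute names, and values are invented for illustration and are not taken from the original description.

import numpy as np
from glue.core import Data, DataCollection
from glue.core.link_helpers import LinkSame

# Two toy catalogs that happen to share sky coordinates (illustrative data)
cat_a = Data(label="catalog_a", ra=np.random.uniform(0, 10, 100),
             dec=np.random.uniform(-5, 5, 100), flux=np.random.random(100))
cat_b = Data(label="catalog_b", ra=np.random.uniform(0, 10, 50),
             dec=np.random.uniform(-5, 5, 50), mag=np.random.random(50))
dc = DataCollection([cat_a, cat_b])
# Declare that 'ra'/'dec' in both catalogs refer to the same quantity, so
# selections and overplots can propagate across the two datasets
dc.add_link(LinkSame(cat_a.id["ra"], cat_b.id["ra"]))
dc.add_link(LinkSame(cat_a.id["dec"], cat_b.id["dec"]))

Once linked this way, subsets defined on one dataset propagate to the other in Glue's viewers.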
The "Iris Flower Visualization using Python" project is a data science project that focuses on exploring and visualizing the famous Iris flower dataset. The Iris dataset is a well-known dataset in the field of machine learning and data science, containing measurements of four features (sepal length, sepal width, petal length, and petal width) for three different species of Iris flowers (Setosa, Versicolor, and Virginica).
In this project, Python is used as the primary programming language along with popular libraries such as pandas, matplotlib, seaborn, and plotly. The project aims to provide a comprehensive visual analysis of the Iris dataset, allowing users to gain insights into the relationships between the different features and the distinct characteristics of each Iris species.
The project begins by loading the Iris dataset into a pandas DataFrame, followed by data preprocessing and cleaning if necessary. Various visualization techniques are then applied to showcase the dataset's characteristics and patterns. The project includes the following visualizations (a brief code sketch follows the list):
1. Scatter Plot: Visualizes the relationship between two features, such as sepal length and sepal width, using points on a 2D plane. Different species are represented by different colors or markers, allowing for easy differentiation.
2. Pair Plot: Displays pairwise relationships between all features in the dataset. This matrix of scatter plots provides a quick overview of the relationships and distributions of the features.
3. Andrews Curves: Represents each sample as a curve, with the shape of the curve representing the corresponding Iris species. This visualization technique allows for the identification of distinct patterns and separability between species.
4. Parallel Coordinates: Plots each feature on a separate vertical axis and connects the values for each data sample using lines. This visualization technique helps in understanding the relative importance and range of each feature for different species.
5. 3D Scatter Plot: Creates a 3D plot with three features represented on the x, y, and z axes. This visualization allows for a more comprehensive understanding of the relationships between multiple features simultaneously.
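A compact sketch of these five plots, assuming the standard seaborn copy of the Iris dataset and its column names (sepal_length, sepal_width, petal_length, petal_width, species); the original notebook may organise things differently.

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from pandas.plotting import andrews_curves, parallel_coordinates

iris = sns.load_dataset("iris")  # four measurements plus the species label
# 1. Scatter plot of two features, coloured by species
sns.scatterplot(data=iris, x="sepal_length", y="sepal_width", hue="species")
plt.show()
# 2. Pair plot of all feature pairs
sns.pairplot(iris, hue="species")
plt.show()
# 3. Andrews curves, one curve per sample
andrews_curves(iris, "species")
plt.show()
# 4. Parallel coordinates, one vertical axis per feature
parallel_coordinates(iris, "species")
plt.show()
# 5. Interactive 3D scatter plot (rotatable and zoomable)
px.scatter_3d(iris, x="sepal_length", y="sepal_width", z="petal_length", color="species").show()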
Throughout the project, appropriate labels, titles, and color schemes are used to enhance the visualizations' interpretability. The interactive nature of some visualizations, such as the 3D Scatter Plot, allows users to rotate and zoom in on the plot for a more detailed examination.
The "Iris Flower Visualization using Python" project serves as an excellent example of how data visualization techniques can be applied to gain insights and understand the characteristics of a dataset. It provides a foundation for further analysis and exploration of the Iris dataset or similar datasets in the field of data science and machine learning.
Description: Dive into the world of exceptional cinema with our meticulously curated dataset, "IMDb's Gems Unveiled." This dataset is a result of an extensive data collection effort based on two critical criteria: IMDb ratings exceeding 7 and a substantial number of votes, surpassing 10,000. The outcome? A treasure trove of 4070 movies meticulously selected from IMDb's vast repository.
What sets this dataset apart is its richness and diversity. With more than 20 data points meticulously gathered for each movie, this collection offers a comprehensive insight into each cinematic masterpiece. Our data collection process leveraged the power of Selenium and Pandas modules, ensuring accuracy and reliability.
Cleaning this vast dataset was a meticulous task, combining both Excel and Python for optimum precision. Analysis is powered by Pandas, Matplotlib, and NLTK, enabling us to uncover hidden patterns, trends, and themes within the realm of cinema.
Note: The data was collected as of April 2023. Future versions of this analysis will include a movie recommendation system. Please do connect for any queries. All Love, No Hate.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The explosion in biological data generation challenges the available technologies and methodologies for data interrogation. Moreover, highly rich and complex datasets together with diverse linked data are difficult to explore when provided in flat files. Here we provide a way to systematically filter and analyse a dataset with more than 18 thousand data points using Zegami (link), a solution for interactive data visualisation and exploration. The primary data we use are derived from "A systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system", which is submitted elsewhere. This manual provides the raw image data together with annotations and associated data, and explains how to use Zegami to explore all these data types together, with specific examples. We also provide the open source python code (github link) used to annotate the figures.
License: Open Data Commons Database Contents License (DbCL) v1.0, http://opendatacommons.org/licenses/dbcl/1.0/
This dataset provides a comprehensive, time-series record of the global COVID-19 pandemic, including daily counts of confirmed cases, deaths, and recoveries across multiple countries and regions. It is designed to support data scientists, researchers, and public health professionals in conducting exploratory data analysis, forecasting, and impact assessment studies related to the spread and consequences of the virus.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersecting highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time-consuming and not accessible to all. Here, we provide a “user manual” to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV: https://mdv.molbiol.ox.ac.uk), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published study "Systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system" (https://doi.org/10.1083/jcb.202205129). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We explain, with specific examples, how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.
The dataset was gathered on September 17th, 2020 from GitHub. It contains more than 5.2K Python repositories and 4.2M type annotations. The dataset is also de-duplicated using the CD4Py tool. Check out the README.MD file for the description of the dataset. Notable changes to each version of the dataset are documented in CHANGELOG.md. The dataset's scripts and utilities are available on its GitHub repository.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present dataset includes the SonarQube issues uncovered as part of our exploratory research targeting code complexity issues in junior developer code written in the Python or Java programming languages. The dataset also includes the actual rule configurations and thresholds used for the Python and Java languages during source code analysis.
This resource contains environmental data (stream temperature) for different monitoring sites of the Logan River Observatory, stored in a SQLite database. The monitoring sites with SiteIDs 1, 2, 3, 9, and 10 of the Logan River Observatory are considered for the evaluation and visualization of monthly average stream temperature (Variable ID 1). The Python code included in this resource can access the SQLite database file, and the retrieved data can be analyzed to examine the average monthly stream temperature at the different monitoring sites of the Logan River Observatory.
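A hedged sketch of that workflow is shown below; the table and column names (DataValues, SiteID, VariableID, LocalDateTime, DataValue) follow a typical ODM-style layout and the database file name is assumed, so they may differ from the actual resource.

import sqlite3
import pandas as pd

conn = sqlite3.connect("LoganRiverObservatory.sqlite")  # file name is an assumption
query = """
    SELECT SiteID, LocalDateTime, DataValue
    FROM DataValues
    WHERE VariableID = 1              -- stream temperature
      AND SiteID IN (1, 2, 3, 9, 10)
"""
df = pd.read_sql_query(query, conn, parse_dates=["LocalDateTime"])
# Average stream temperature per site and month
monthly = df.groupby(["SiteID", pd.Grouper(key="LocalDateTime", freq="MS")])["DataValue"].mean()
print(monthly.head())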
License: MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
This project focuses on analyzing the S&P 500 companies using data analysis tools like Python (Pandas), SQL, and Power BI. The goal is to extract insights related to sectors, industries, locations, and more, and visualize them using dashboards.
Included Files:
sp500_cleaned.csv – Cleaned dataset used for analysis
sp500_analysis.ipynb – Jupyter Notebook (Python + SQL code)
dashboard_screenshot.png – Screenshot of Power BI dashboard
README.md – Summary of the project and key takeaways
This project demonstrates practical data cleaning, querying, and visualization skills.
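As a rough illustration of the Python-plus-SQL part of the workflow (the column names, such as Sector, are assumptions about sp500_cleaned.csv, and the actual notebook may structure its queries differently):

import sqlite3
import pandas as pd

df = pd.read_csv("sp500_cleaned.csv")
# Load the cleaned data into an in-memory SQLite database so the same
# questions can also be answered with SQL
conn = sqlite3.connect(":memory:")
df.to_sql("sp500", conn, index=False, if_exists="replace")
sector_counts = pd.read_sql_query(
    "SELECT Sector, COUNT(*) AS companies FROM sp500 GROUP BY Sector ORDER BY companies DESC",
    conn,
)
print(sector_counts)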
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our dataset comprises 1000 tweets, which were taken from Twitter using the Python programming language. The dataset was stored in a CSV file and generated using various modules. The random module was used to generate random IDs and text, while the faker module was used to generate random user names and dates. Additionally, the textblob module was used to assign a random sentiment to each tweet.
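A minimal sketch of such a generator is given below; it uses the modules named above, but the field names are illustrative and the sentiment here is derived from TextBlob polarity rather than reproducing the authors' exact script.

import csv
import random
from faker import Faker
from textblob import TextBlob

fake = Faker()
rows = []
for _ in range(1000):
    text = fake.sentence(nb_words=12)            # random tweet text
    polarity = TextBlob(text).sentiment.polarity
    sentiment = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    rows.append({
        "id": random.randint(10**9, 10**10),     # random ID
        "user": fake.user_name(),                # random user name
        "date": fake.date_time_this_year().isoformat(),
        "text": text,
        "sentiment": sentiment,
    })
with open("tweets.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)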
This systematic approach ensures that the dataset is well-balanced and represents different types of tweets, user behavior, and sentiment. It is essential to have a balanced dataset to ensure that the analysis and visualization of the dataset are accurate and reliable. By generating tweets with a range of sentiments, we have created a diverse dataset that can be used to analyze and visualize sentiment trends and patterns.
In addition to generating the tweets, we have also prepared a visual representation of the data sets. This visualization provides an overview of the key features of the dataset, such as the frequency distribution of the different sentiment categories, the distribution of tweets over time, and the user names associated with the tweets. This visualization will aid in the initial exploration of the dataset and enable us to identify any patterns or trends that may be present.
This is a full distribution of a webapp for visualizing the World Color Survey using Mapper. The zip archive contains all files used, including the PyMapper library [MB]. It deploys into the /var/www directory of a file system, and includes a README file with technical details for getting file permissions right. The app depends on a Python 2 installation (2.7 or later) with the additional libraries colormath, matplotlib, numpy, and scipy. [MB] Daniel Müllner and Aravindakshan Babu, Python Mapper: An open-source toolchain for data exploration, analysis and visualization, 2013, URL http://danifold.net/mapper
From a baby’s babbling to a songbird practicing a new tune, exploration is critical to motor learning. A hallmark of exploration is the emergence of random walk behaviour along solution manifolds, where successive motor actions are not independent but rather become serially dependent. Such exploratory random walk behaviour is ubiquitous across species, neural firing, gait patterns, and reaching behaviour. Past work has suggested that exploratory random walk behaviour arises from an accumulation of movement variability and a lack of error-based corrections. Here we test a fundamentally different idea: that reinforcement-based processes regulate random walk behaviour to promote continual motor exploration to maximize success. Across three human-reaching experiments, we manipulated the size of both the visually displayed target and an unseen reward zone, as well as the probability of reinforcement feedback. Our empirical and modelling results parsimoniously support the notion that explorato...
Data was collected using a Kinarm and processed using Kinarm's Matlab scripts. The output of the Matlab scripts was then processed using Python (3.8.13) and stored in custom Python objects.
Reinforcement-Based Processes Actively Regulate Motor Exploration Along Redundant Solution Manifolds
https://doi.org/10.5061/dryad.ngf1vhj10
All files are compressed using the Python package dill. Each file contains a custom Python object that has data attributes and analysis methods. For a complete list of methods and attributes, see Exploration_Subject.py in the repository https://github.com/CashabackLab/Exploration-Along-Solution-Manifolds-Data
Files can be read into a Python script via the class method "from_pickle" inside the Exploration_Subject class.
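For example, loading one subject file might look like the sketch below, assuming Exploration_Subject.py from the linked repository is importable; the pickle file name is hypothetical.

from Exploration_Subject import Exploration_Subject

# from_pickle is the class method mentioned above; the file name is hypothetical
subject = Exploration_Subject.from_pickle("subject_01.pkl")
print(type(subject))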
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains the logs used to produce the results described in the publication "Cooperative Robotic Exploration of a Planetary Skylight Surface and Lava Cave", Raúl Domínguez et al., 2025.
- CoRob_MP1_results.xlsx: Includes the log produced at the commanding station during the Mission Phase 1. It has been used to produce the results evaluation of the MP1.
- cmap.ply: Resulting map of the MP1.
- ground_truth_transformed_and_downsampled.ply: Ground truth map used for the evaluation of the cooperative map accuracy.
The dataset contains the samples used to generate the map provided as ground truth for the cave in the publication "Cooperative Robotic Exploration of a Planetary Skylight Surface and Lava Cave", Raúl Domínguez et al., 2025.
The dataset has three parts. Between each of the parts, the data capture had to be interrupted, and after each interruption the position of the rover was not exactly the same as before. For that reason, it was quite challenging to generate a full reconstruction using the three parts one after the other. Because the different parts could not be combined into a single SLAM reconstruction, the last log was not filtered or even pre-processed.
Each log contains:
- depthmaps, the raw LiDAR data from the Velodyne 32. Format: tiff.
- filtered_cloud, the pre-processed LiDAR data from the Velodyne 32. Format: ply.
- joint_states, the motor position values. Unfortunately the back axis passive joint is not included. Format: json.
- orientation_samples, the orientation as provided by the IMU sensor. Format: json.
- asguard_v4.urdf: In addition to the datasets, a geometrical robot model is provided which might be needed for environment reconstruction and pose estimation algorithms. Format: URDF.
├── 20211117-1112
│ ├── depth
│ │ └── depth_1637143958347198
│ ├── filtered_cloud
│ │ └── cloud_1637143958347198
│ ├── joint_states
│ │ └── joints_state_1637143957824829
│ └── orientation_samples
│ └── orientation_sample_1637143958005814
├── 20211117-1140
│ ├── depth
│ │ └── depth_1637145649108790
│ ├── filtered_cloud
│ │ └── cloud_1637145649108790
│ ├── joint_states
│ │ └── joints_state_1637145648630977
│ └── orientation_samples
│ └── orientation_sample_1637145648831795
└── 20211117-1205
├── depth
│ └── depth_1637147164030135
├── filtered_cloud
│ └── cloud_1637147164330388
├── joint_states
│ └── joints_state_1637147163501574
└── orientation_samples
└── orientation_sample_1637147163655187
- first_log_2cm_res_pointcloud-20231222.ply, contains the integrated pointcloud produced from the first of the logs.
The msgpack datasets can be imported using Python with the pocolog2msgpack library.
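If the exported archives contain plain msgpack files, they can also be opened with the generic msgpack package as in the sketch below; the file name is illustrative, and pocolog2msgpack itself may provide a different loading interface.

import msgpack

# File name is illustrative; extract it from the .tar.gz archives listed above first
with open("coyote3_odometry.msg", "rb") as f:
    data = msgpack.unpack(f, raw=False)
print(type(data))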
The geometrical rover model of Coyote 3 is included in URDF format. It can be used in environment reconstruction algorithms which require the positions of the different sensors.
Includes exports of the log files used to compute the KPIs of the MP3.
These logs were used to obtain the KPI values for the MP4. It is composed of the following archives:
- log_coyote_02-03-2023_13-22_01-exp3.zip
- log_coyote_02-03-2023_13-22_01-exp4.zip
- log_coyote_02-09-2023_19-14_18_demo_skylight.zip
- log_coyote_02-09-2023_19-14_20_demo_teleop.zip
- coyote3_odometry_20230209-154158.0003_msgpacks.tar.gz
- coyote3_odometry_20230203-125251.0819_msgpacks.tar.gz
Two integrated pointclouds and one trajectory produced from logs captured by Coyote 3 inside the cave:
- Skylight_subsampled_mesh.ply
- teleop_tunnel_pointcloud.ply
- traj.ply
The repository https://github.com/Rauldg/corobx_dataset_scripts contains some example scripts which load some of the datasets.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data about water are found in many types of formats distributed by many different sources and depicting different spatial representations such as points, polygons and grids. How do we find and explore the data we need for our specific research or application? This seminar will present common challenges and strategies for finding and accessing relevant datasets, focusing on time series data from sites commonly represented as fixed geographical points. This type of data may come from automated monitoring stations such as river gauges and weather stations, from repeated in-person field observations and samples, or from model output and processed data products. We will present and explore useful data catalogs, including the CUAHSI HIS catalog accessible via HydroClient, CUAHSI HydroShare, the EarthCube Data Discovery Studio, Google Dataset search, and agency-specific catalogs. We will also discuss programmatic data access approaches and tools in Python, particularly the ulmo data access package, touching on the role of community standards for data formats and data access protocols. Once we have accessed datasets we are interested in, the next steps are typically exploratory, focusing on visualization and statistical summaries. This seminar will illustrate useful approaches and Python libraries used for processing and exploring time series data, with an emphasis on the distinctive needs posed by temporal data. Core Python packages used include Pandas, GeoPandas, Matplotlib and the geospatial visualization tools introduced at the last seminar. Approaches presented can be applied to other data types that can be summarized as single time series, such as averages over a watershed or data extracts from a single cell in a gridded dataset – the topic for the next seminar.
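As a small, hedged illustration of the kind of time-series exploration the seminar covers (the CSV file and column names are invented for the example and do not refer to a specific catalog):

import pandas as pd
import matplotlib.pyplot as plt

ts = pd.read_csv("site_timeseries.csv", parse_dates=["datetime"], index_col="datetime")
monthly = ts["value"].resample("MS").mean()   # monthly means from the raw observations
print(monthly.describe())                     # quick statistical summary
monthly.plot(title="Monthly mean values")
plt.show()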
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of 92 valid eye tracking sessions of 25 participants working in VS Code and answering 15 different code understanding questions (e.g., what is the output, side effects, algorithmic complexity, concurrency, etc.) on source code written in 3 programming languages: Python, C++, and C#.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The objective of this dataset is to provide a comprehensive collection of data that explores the recognition of tactile textures in dynamic exploration scenarios. The dataset was generated using a tactile-enabled finger with a multi-modal tactile sensing module. By incorporating data from pressure, gravity, angular rate, and magnetic field sensors, the dataset aims to facilitate research on machine learning methods for texture classification.
The data is stored in pickle files, which can be read using the pandas library in Python. The data files are organized in a specific folder structure and contain multiple readings for each texture and exploratory velocity. The dataset contains raw data of the recorded tactile measurements for 12 different textures and 3 different exploratory velocities, stored in pickle files.
- Pickles_30 - Folder containing pickle files with tactile data at an exploratory velocity of 30 mm/s.
- Pickles_40 - Folder containing pickle files with tactile data at an exploratory velocity of 40 mm/s.
- Pickles_45 - Folder containing pickle files with tactile data at an exploratory velocity of 45 mm/s.
- Texture_01 to Texture_12 - Folders containing pickle files for each texture, labelled as texture_01, texture_02, and so on.
- Full_baro - Folder containing pickle files with barometer data for each texture.
- Full_imu - Folder containing pickle files with IMU (Inertial Measurement Unit) data for each texture.
The "reading-pickle-file.ipynb" file is a script for reading and plotting the dataset.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project analyses the Animal Crossing: New Horizons Umbrellas dataset to explore patterns in umbrella properties such as colours, DIY status, prices, sizes, and sources. The analysis is performed using Python and SQL to provide clear insights and visualisations.
Dataset Credit: The original dataset was created and published by Jessica Li (https://www.kaggle.com/datasets/jessicali9530/animal-crossing-new-horizons-nookplaza-dataset) on Kaggle. It has been downloaded and used here for analysis purposes.
Purpose: This analysis helps understand trends in umbrella items within the game and demonstrates practical data analysis techniques combining Python and SQL.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Lunar Craters images and labels in COCO json format that will be used for the EXPLORE Expert Data Challenge 2022. More information at: https://exploredatachallenges.space/. The source dataset is derived from Watkins, Ryan (2019). Images were processed from NASA PDS raw data, and labels were extracted using Python scripts.
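Because the labels follow the standard COCO json layout, a quick inspection can be done with the standard library, as in the hedged sketch below (the file name is illustrative):

import json

with open("lunar_craters_coco.json") as f:   # file name is illustrative
    coco = json.load(f)
# Standard COCO keys: images, annotations, categories
print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")
print(coco["categories"])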
License: CC0 1.0 Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
What is Pandas?
Pandas is a Python library used for working with data sets.
It has functions for analyzing, cleaning, exploring, and manipulating data.
The name "Pandas" has a reference to both "Panel Data", and "Python Data Analysis" and was created by Wes McKinney in 2008.
Why Use Pandas?
Pandas allows us to analyze big data and make conclusions based on statistical theories.
Pandas can clean messy data sets, and make them readable and relevant.
Relevant data is very important in data science.
What Can Pandas Do?
Pandas gives you answers about the data, such as:
Is there a correlation between two or more columns?
What is the average value?
What is the maximum value?
What is the minimum value?
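A minimal sketch of those questions, using a throwaway DataFrame (the column names are invented for the example):

import pandas as pd

df = pd.DataFrame({"hours": [1, 2, 3, 4], "score": [52, 61, 70, 83]})
print(df.corr())            # correlation between the two columns
print(df["score"].mean())   # average value
print(df["score"].max())    # max value
print(df["score"].min())    # min value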