Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive open source project metrics including contributor activity, popularity trends, development velocity, and security assessments for Holoviz Panel: Python Data Exploration & Web App Framework.
Database Contents License (DbCL) v1.0 http://opendatacommons.org/licenses/dbcl/1.0/
This dataset provides a comprehensive, time-series record of the global COVID-19 pandemic, including daily counts of confirmed cases, deaths, and recoveries across multiple countries and regions. It is designed to support data scientists, researchers, and public health professionals in conducting exploratory data analysis, forecasting, and impact assessment studies related to the spread and consequences of the virus.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern research projects incorporate data from several sources, and new insights are increasingly driven by the ability to interpret data in the context of other data. Glue is an interactive environment built on top of the standard Python science stack to visualize relationships within and between datasets. With Glue, users can load and visualize multiple related datasets simultaneously. Users specify the logical connections that exist between data, and Glue transparently uses this information as needed to enable visualization across files. This functionality makes it trivial, for example, to interactively overplot catalogs on top of images. The central philosophy behind Glue is that the structure of research data is highly customized and problem-specific. Glue aims to accommodate this and simplify the "data munging" process, so that researchers can more naturally explore what their data have to say. The result is a cleaner scientific workflow, faster interaction with data, and an easier avenue to insight.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The explosion in biological data generation challenges the available technologies and methodologies for data interrogation. Moreover, highly rich and complex datasets, together with diverse linked data, are difficult to explore when provided as flat files. Here we provide a way to systematically filter and analyse a dataset of more than 18,000 data points using Zegami (link), a solution for interactive data visualisation and exploration. The primary data are derived from "A systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system", which is submitted elsewhere. This manual provides the raw image data together with annotations and associated data, and explains, with specific examples, how to use Zegami to explore all these data types together. We also provide the open-source Python code (github link) used to annotate the figures.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersecting highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a "user manual" to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV: https://mdv.molbiol.ox.ac.uk), a solution for interactive multimodal data visualisation and exploration. The primary data are derived from our published study "A systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system" (https://doi.org/10.1083/jcb.202205129). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source Python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.
Description: Dive into the world of exceptional cinema with our meticulously curated dataset, "IMDb's Gems Unveiled." This dataset is a result of an extensive data collection effort based on two critical criteria: IMDb ratings exceeding 7 and a substantial number of votes, surpassing 10,000. The outcome? A treasure trove of 4070 movies meticulously selected from IMDb's vast repository.
What sets this dataset apart is its richness and diversity. With more than 20 data points meticulously gathered for each movie, this collection offers a comprehensive insight into each cinematic masterpiece. Our data collection process leveraged the power of Selenium and Pandas modules, ensuring accuracy and reliability.
Cleaning this vast dataset was a meticulous task, combining both Excel and Python for optimum precision. Analysis is powered by Pandas, Matplotlib, and NLTK, enabling us to uncover hidden patterns, trends, and themes within the realm of cinema.
Note: The data was collected as of April 2023. Future versions of this analysis will include a movie recommendation system. Please do connect for any queries. All Love, No Hate.
The "Iris Flower Visualization using Python" project is a data science project that focuses on exploring and visualizing the famous Iris flower dataset. The Iris dataset is a well-known dataset in the field of machine learning and data science, containing measurements of four features (sepal length, sepal width, petal length, and petal width) for three different species of Iris flowers (Setosa, Versicolor, and Virginica).
In this project, Python is used as the primary programming language along with popular libraries such as pandas, matplotlib, seaborn, and plotly. The project aims to provide a comprehensive visual analysis of the Iris dataset, allowing users to gain insights into the relationships between the different features and the distinct characteristics of each Iris species.
The project begins by loading the Iris dataset into a pandas DataFrame, followed by data preprocessing and cleaning if necessary. Various visualization techniques are then applied to showcase the dataset's characteristics and patterns. The project includes the following visualizations:
1. Scatter Plot: Visualizes the relationship between two features, such as sepal length and sepal width, using points on a 2D plane. Different species are represented by different colors or markers, allowing for easy differentiation.
2. Pair Plot: Displays pairwise relationships between all features in the dataset. This matrix of scatter plots provides a quick overview of the relationships and distributions of the features.
3. Andrews Curves: Represents each sample as a curve, with the shape of the curve representing the corresponding Iris species. This visualization technique allows for the identification of distinct patterns and separability between species.
4. Parallel Coordinates: Plots each feature on a separate vertical axis and connects the values for each data sample using lines. This visualization technique helps in understanding the relative importance and range of each feature for different species.
5. 3D Scatter Plot: Creates a 3D plot with three features represented on the x, y, and z axes. This visualization allows for a more comprehensive understanding of the relationships between multiple features simultaneously.
Throughout the project, appropriate labels, titles, and color schemes are used to enhance the visualizations' interpretability. The interactive nature of some visualizations, such as the 3D Scatter Plot, allows users to rotate and zoom in on the plot for a more detailed examination.
The "Iris Flower Visualization using Python" project serves as an excellent example of how data visualization techniques can be applied to gain insights and understand the characteristics of a dataset. It provides a foundation for further analysis and exploration of the Iris dataset or similar datasets in the field of data science and machine learning.
From a baby’s babbling to a songbird practicing a new tune, exploration is critical to motor learning. A hallmark of exploration is the emergence of random walk behaviour along solution manifolds, where successive motor actions are not independent but rather become serially dependent. Such exploratory random walk behaviour is ubiquitous across species, neural firing, gait patterns, and reaching behaviour. Past work has suggested that exploratory random walk behaviour arises from an accumulation of movement variability and a lack of error-based corrections. Here we test a fundamentally different idea—that reinforcement-based processes regulate random walk behaviour to promote continual motor exploration to maximize success. Across three human-reaching experiments, we manipulated the size of both the visually displayed target and an unseen reward zone, as well as the probability of reinforcement feedback. Our empirical and modelling results parsimoniously support the notion that explorato...
Data was collected using a Kinarm and processed using Kinarm's Matlab scripts. The output of the Matlab scripts was then processed using Python (3.8.13) and stored in custom Python objects.
Reinforcement-Based Processes Actively Regulate Motor Exploration Along Redundant Solution Manifolds
https://doi.org/10.5061/dryad.ngf1vhj10
All files are compressed using the Python package dill. Each file contains a custom Python object that has data attributes and analysis methods. For a complete list of methods and attributes, see Exploration_Subject.py in the repository https://github.com/CashabackLab/Exploration-Along-Solution-Manifolds-Data
Files can be read into a Python script via the class method "from_pickle" inside the Exploration_Subject class.
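A minimal usage sketch, assuming Exploration_Subject.py from the linked repository is on the Python path; the pickle file name below is illustrative, not an actual file from the dataset.

# Hypothetical usage sketch: assumes Exploration_Subject.py from the linked
# repository is importable; the file name below is illustrative only.
from Exploration_Subject import Exploration_Subject

subject = Exploration_Subject.from_pickle("subject_01.pkl")
# The returned object carries the data attributes and analysis methods
# documented in Exploration_Subject.py.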
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This project focuses on analyzing the S&P 500 companies using data analysis tools like Python (Pandas), SQL, and Power BI. The goal is to extract insights related to sectors, industries, locations, and more, and visualize them using dashboards.
Included Files:
sp500_cleaned.csv – Cleaned dataset used for analysis
sp500_analysis.ipynb – Jupyter Notebook (Python + SQL code)
dashboard_screenshot.png – Screenshot of Power BI dashboard
README.md – Summary of the project and key takeaways
This project demonstrates practical data cleaning, querying, and visualization skills.
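As a brief, hedged illustration of the Pandas-plus-SQL approach described above (the column names "Symbol" and "Sector" are assumptions about sp500_cleaned.csv, not confirmed by the project files):

# Illustrative sketch only: column names ("Symbol", "Sector") are assumptions.
import sqlite3
import pandas as pd

sp500 = pd.read_csv("sp500_cleaned.csv")

# Pandas: number of companies per sector.
print(sp500.groupby("Sector")["Symbol"].count().sort_values(ascending=False))

# SQL: the same aggregation via an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
sp500.to_sql("sp500", conn, index=False)
print(pd.read_sql("SELECT Sector, COUNT(*) AS n FROM sp500 GROUP BY Sector ORDER BY n DESC", conn))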
This is a full distribution of a webapp for visualizing the World Color Survey using Mapper. The zip archive contains all files used, including the PyMapper library [MB]. It will distribute into the /var/www directory of a file system, and includes a README file with technical details for getting file permissions right. The app depends on a Python 2 installation (2.7 or later), with the colormath, matplotlib, numpy and scipy additional libraries. [MB] Daniel Müllner and Aravindakshan Babu, Python Mapper: An open-source toolchain for data exploration, analysis and visualization, 2013, URL http://danifold.net/mapper
This submission includes the final project report of the Snake River Plain Play Fairway Analysis project as well as a separate appendix for the final report. The final report outlines the application of Play Fairway Analysis (PFA) to geothermal exploration, specifically within the Snake River Plain volcanic province. The goals of the report are to use PFA to lower the risk and cost of geothermal exploration and to stimulate development of geothermal power resources in Idaho. Further use of this report could include the application of PFA for geothermal exploration throughout the geothermal industry. The report utilizes ArcGIS and Python for data analysis, which were used to develop a systematic workflow that automates the analysis. The appendix for the report includes ArcGIS maps and data compilation information regarding the report.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of 92 valid eye-tracking sessions from 25 participants working in VS Code and answering 15 different code-understanding questions (e.g., what is the output, side effects, algorithmic complexity, concurrency, etc.) on source code written in 3 programming languages: Python, C++, C#.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data about water are found in many types of formats distributed by many different sources and depicting different spatial representations such as points, polygons and grids. How do we find and explore the data we need for our specific research or application? This seminar will present common challenges and strategies for finding and accessing relevant datasets, focusing on time series data from sites commonly represented as fixed geographical points. This type of data may come from automated monitoring stations such as river gauges and weather stations, from repeated in-person field observations and samples, or from model output and processed data products. We will present and explore useful data catalogs, including the CUAHSI HIS catalog accessible via HydroClient, CUAHSI HydroShare, the EarthCube Data Discovery Studio, Google Dataset search, and agency-specific catalogs. We will also discuss programmatic data access approaches and tools in Python, particularly the ulmo data access package, touching on the role of community standards for data formats and data access protocols. Once we have accessed datasets we are interested in, the next steps are typically exploratory, focusing on visualization and statistical summaries. This seminar will illustrate useful approaches and Python libraries used for processing and exploring time series data, with an emphasis on the distinctive needs posed by temporal data. Core Python packages used include Pandas, GeoPandas, Matplotlib and the geospatial visualization tools introduced at the last seminar. Approaches presented can be applied to other data types that can be summarized as single time series, such as averages over a watershed or data extracts from a single cell in a gridded dataset – the topic for the next seminar.
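A minimal sketch of the kind of Pandas time-series resampling and plotting discussed above, assuming a single site's time series has already been retrieved and exported to CSV; the file and column names are illustrative, and the catalog/ulmo retrieval step is omitted.

# Minimal sketch: assumes a CSV export of one site's time series; the file
# name and column names are illustrative.
import pandas as pd
import matplotlib.pyplot as plt

ts = pd.read_csv("site_timeseries.csv", parse_dates=["datetime"], index_col="datetime")

# Resample irregular observations to monthly means and plot the result.
monthly_mean = ts["discharge"].resample("MS").mean()
monthly_mean.plot(title="Monthly mean discharge")
plt.show()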
Cyberseminar recording is available on Youtube at https://youtu.be/uQXuS1AB2M0
Leaf area index (LAI) plays an important role in land-surface models to describe the energy, carbon, and water fluxes between the soil and canopy vegetation. Indirect ground LAI measurements, such as those made with the LAI2200C Plant Canopy Analyzer (PCA), can not only increase measurement efficiency but also protect the vegetation, compared with direct, destructive ground LAI measurement. Additionally, indirect measurements provide opportunities for remote-sensing-based LAI monitoring. This project focuses on the extraction of several features observed using the LAI2200C PCA, because the extracted features can help to explore the relationship between the ground measurements and remote sensing data. Although the FV2200 software provides convenient data calculation, data visualization, etc., it cannot extract features such as time, coordinates, and LAI from the data log for deeper exploration, especially when a large amount of collected data needs to be processed. To increase efficiency, this project developed a simple Python script for feature extraction, and demo data are provided.
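A rough sketch of the kind of feature extraction such a script performs; the record layout below is a simplified stand-in for illustration only, not the actual LAI2200C data-log format.

# Illustrative sketch only: the TIME LAT LON LAI record layout is a
# hypothetical stand-in, not the actual LAI2200C PCA data-log format.
import csv

def extract_features(log_path, out_csv):
    """Pull time, coordinates, and LAI values out of a plain-text data log."""
    records = []
    with open(log_path) as log:
        for line in log:
            parts = line.split()
            if len(parts) == 4 and parts[0].count(":") == 2:  # e.g. 10:32:15
                records.append({"time": parts[0], "lat": float(parts[1]),
                                "lon": float(parts[2]), "lai": float(parts[3])})
    with open(out_csv, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["time", "lat", "lon", "lai"])
        writer.writeheader()
        writer.writerows(records)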
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project analyses the Animal Crossing: New Horizons Umbrellas dataset to explore patterns in umbrella properties such as colours, DIY status, prices, sizes, and sources. The analysis is performed using Python and SQL to provide clear insights and visualisations.
Dataset Credit: The original dataset was created and published by Jessica Li https://www.kaggle.com/datasets/jessicali9530/animal-crossing-new-horizons-nookplaza-dataset on Kaggle. It has been downloaded and used here for analysis purposes.
Purpose: This analysis helps understand trends in umbrella items within the game and demonstrates practical data analysis techniques combining Python and SQL.
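A brief sketch of the combined Python-and-SQL analysis described above; the CSV file name and column names ("Color 1", "DIY") are assumptions about the downloaded dataset.

# Illustrative sketch only: the CSV file name and column names ("Color 1",
# "DIY") are assumptions about the downloaded dataset.
import sqlite3
import pandas as pd

umbrellas = pd.read_csv("umbrellas.csv")

conn = sqlite3.connect(":memory:")
umbrellas.to_sql("umbrellas", conn, index=False)

# Count umbrellas per primary colour, split by DIY status.
query = """
    SELECT "Color 1" AS colour, "DIY" AS diy, COUNT(*) AS n
    FROM umbrellas
    GROUP BY "Color 1", "DIY"
    ORDER BY n DESC
"""
print(pd.read_sql(query, conn))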
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TUCKER, Gregory E., CIRES & Department of Geological Sciences, University of Colorado, 2200 Colorado Ave, Boulder, CO 80309-0399; Community Surface Dynamics Modeling System (CSDMS), University of Colorado, Campus Box 399, Boulder, CO 80309, HUTTON, Eric, Community Surface Dynamics Modeling System (CSDMS), University of Colorado, Cam, Boulder, CO 80309 and PIPER, Mark, Community Surface Dynamics Modeling System (CSDMS), University of Colorado, Campus Box 399, Boulder, CO 80309; Instaar, University of Colorado, campus Box 450, 1560 30th St, Boulder, CO 80303
Our planet’s surface is a restless place. Understanding the processes of weathering, erosion, and deposition that shape it is critical for applications ranging from short-term hazard analysis to long-term sedimentary stratigraphy and landscape/seascape evolution. Improved understanding requires computational models, which link process mechanics and chemistry to the observable geologic and geomorphic record. Historically, earth-surface process models have often been complex and difficult to work with. To help improve this situation and make the discovery process more efficient, the CSDMS Python Modeling Tool (PyMT) provides an environment in which community-built numerical models and tools can be initialized and run directly from a Python command line or Jupyter notebook. By equipping each model with a standardized set of command functions, known collectively as the Basic Model Interface (BMI), the task of learning and applying models becomes much easier. Using BMI functions, models can also be coupled together to explore dynamic feedbacks among different earth systems. To illustrate how PyMT works and the advantages it provides, we present an example that couples a terrestrial landscape evolution model (CHILD) with a marine sediment transport and stratigraphy model (SedFlux3D). Experiments with the resulting coupled model provide insights into how terrestrial “signals,” such as variations in mean precipitation, are recorded in deltaic stratigraphy. The example also illustrates the utility of PyMT’s tools, such as the ability to map variables between a regular rectilinear grid and an irregular triangulated grid. By simplifying the process of learning, operating, and coupling models, PyMT frees researchers to focus on exploring ideas, testing hypotheses, and comparing models with data.
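As a schematic sketch of the coupling idea described above (the model objects, time loop, and exchange-variable names are placeholders, not the actual PyMT or CHILD/SedFlux3D interfaces), two BMI-wrapped models might be advanced together like this:

# Schematic sketch of BMI-style coupling. The variable names and the model
# objects are placeholders, not the actual PyMT API.
def run_coupled(terrestrial, marine, run_time, dt):
    """Advance two BMI-wrapped models in lockstep, passing sediment between them."""
    time = 0.0
    while time < run_time:
        terrestrial.update_until(time + dt)
        # Hypothetical exchange variable names, for illustration only.
        flux = terrestrial.get_value("sediment__flux")
        marine.set_value("sediment__influx", flux)
        marine.update_until(time + dt)
        time += dt
    terrestrial.finalize()
    marine.finalize()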
Community Data License Agreement Permissive 1.0 https://cdla.io/permissive-1-0/
The dataset has been created specifically for practicing Python, NumPy, Pandas, and Matplotlib. It is designed to provide a hands-on learning experience in data manipulation, analysis, and visualization using these libraries.
Specifics of the Dataset:
The dataset consists of 5000 rows and 20 columns, representing various features with different data types and distributions. The features include numerical variables with continuous and discrete distributions, categorical variables with multiple categories, binary variables, and ordinal variables. Each feature has been generated using different probability distributions and parameters to introduce variations and simulate real-world data scenarios. The dataset is synthetic and does not represent any real-world data. It has been created solely for educational purposes.
One of the defining characteristics of this dataset is the intentional incorporation of various real-world data challenges:
- Certain columns are randomly selected to be populated with NaN values, simulating the common challenge of missing data. The proportion of missing values in each column varies randomly between 1% and 70%.
- Statistical noise has been introduced: for numerical values in some features, the noise follows a distribution with mean 0 and standard deviation 0.1.
- Categorical noise is introduced in some features, with categories randomly altered in about 1% of the rows.
- Outliers have also been embedded in the dataset, detectable with the interquartile range (IQR) rule.
Context of the Dataset:
The dataset aims to provide a comprehensive playground for practicing Python, NumPy, Pandas, and Matplotlib. It allows learners to explore data manipulation techniques, perform statistical analysis, and create visualizations using the provided features. By working with this dataset, learners can gain hands-on experience in data cleaning, preprocessing, feature engineering, and visualization.
Sources of the Dataset:
The dataset has been generated programmatically using Python's random number generation functions and probability distributions. No external sources or real-world data have been used in creating this dataset.
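A small sketch of the style of programmatic generation described above; the column names, distributions, and noise parameters here are illustrative, not the actual dataset schema.

# Illustrative sketch only: column names and distributions are stand-ins,
# not the actual schema of the practice dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5000

df = pd.DataFrame({
    "continuous": rng.normal(50, 10, n),         # continuous numerical feature
    "discrete": rng.poisson(3, n),               # discrete numerical feature
    "category": rng.choice(["A", "B", "C"], n),  # categorical feature
    "binary": rng.integers(0, 2, n),             # binary feature
})

# Blank out a random 1%-70% of one column to simulate missing data.
frac_missing = rng.uniform(0.01, 0.70)
missing_idx = rng.choice(n, size=int(frac_missing * n), replace=False)
df.loc[missing_idx, "continuous"] = np.nan

# Add statistical noise with mean 0 and standard deviation 0.1.
df["continuous"] = df["continuous"] + rng.normal(0, 0.1, n)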
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
We present a synergistic experimental–theoretical methodology for the investigation of lanthanide-based single-molecule magnets (SMMs), demonstrated using the example of novel heterometallic molecules incorporating Nd3+/Ce3+ ions combined with three different, rarely explored, pentacyanidocobaltate(III) metalloligands, [CoIII(CN)5(azido/nitrito-N/iodido)]3–. The theoretical part of our approach broadens the exploration of ab initio calculations for lanthanide(III) complexes toward the convenient simulations of such physical characteristics as directional dependences of Helmholtz energy, magnetization, susceptibility, and their thermal and field evolution, as well as light absorption and emission bands. This work was conducted using newly designed SlothPy software (https://slothpy.org). It is introduced as an open-source Python library for simulating various physical properties from first-principles based on results of electronic structure calculations obtained within popular quantum chemistry packages. The computational results were confronted with spectroscopic and ac/dc-magnetic data, the latter analyzed using previously designed relACs software. The combination of experimental and computational methods gave insight into phonon-assisted magnetic relaxation mechanisms, disentangling them from the temperature-independent quantum tunneling of magnetization and emphasizing the role of local-mode processes. This study provides an understanding of the changes in lanthanide(III) magnetic anisotropy introduced with pentacyanidocobaltates(III) modifications, theoretically exploring also potential applications of reported compounds as anisotropy switches or optical thermometers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hydrological and meteorological information can help characterize the conditions and risk factors affecting the environment and its inhabitants. Because observation sampling is limited, gridded datasets provide modeled information, built from collected observations and known process relations, for areas where direct data collection is infeasible. Even when such datasets are available, users face barriers to use: how to access, acquire, and then analyze data for small watershed areas when the datasets were produced for large, continental-scale processes. In this tutorial, we introduce the Observatory for Gridded Hydrometeorology (OGH) to resolve such hurdles in a use case that incorporates NetCDF gridded datasets and processes developed to interpret the findings and apply secondary modeling frameworks (landlab).
LEARNING OBJECTIVES
- Familiarize with data management, metadata management, and analyses with gridded data
- Inspecting and problem solving with Python libraries
- Explore data architecture and processes
- Learn about the OGH Python library
- Discuss conceptual data engineering and science operations
Use-case operations:
1. Prepare computing environment
2. Get list of grid cells
3. NetCDF retrieval and clipping to a spatial extent
4. Extract NetCDF metadata and convert NetCDFs to 1D ASCII time-series files
5. Visualize the average monthly total precipitations
6. Apply summary values as modeling inputs
7. Visualize modeling outputs
8. Save results in a new HydroShare resource
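A minimal sketch of steps 3-5 above, using xarray rather than the OGH library itself; the file name, variable name, and coordinate bounds are assumptions.

# Minimal sketch of NetCDF clipping and monthly precipitation averaging,
# using xarray instead of OGH; names and bounds are illustrative only.
import xarray as xr
import matplotlib.pyplot as plt

ds = xr.open_dataset("gridded_precipitation.nc")

# Clip to a small watershed's bounding box (illustrative coordinates).
clipped = ds.sel(lat=slice(46.0, 48.0), lon=slice(-123.0, -120.0))

# Total precipitation per month, averaged over years and over the clipped cells.
monthly_totals = clipped["precip"].resample(time="MS").sum()
mean_by_month = monthly_totals.groupby("time.month").mean(dim="time")
mean_by_month.mean(dim=["lat", "lon"]).plot()
plt.show()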
For inquiries, issues, or to contribute to development, please refer to https://github.com/freshwater-initiative/Observatory