The klib library lets us quickly visualize missing data, perform data cleaning, plot data distributions, plot correlations, and visualize categorical column values. klib is a Python library for importing, cleaning, analyzing, and preprocessing data. Explanations of key functionalities can be found on Medium/TowardsDataScience in the examples section or on YouTube (Data Professor).
Original GitHub repo
!pip install klib
import klib
import pandas as pd
df = pd.DataFrame(data)  # "data" stands for your own dataset (e.g. a dict of columns or a loaded file)
# klib.describe functions for visualizing datasets
- klib.cat_plot(df) # returns a visualization of the number and frequency of categorical features
- klib.corr_mat(df) # returns a color-encoded correlation matrix
- klib.corr_plot(df) # returns a color-encoded heatmap, ideal for correlations
- klib.dist_plot(df) # returns a distribution plot for every numeric feature
- klib.missingval_plot(df) # returns a figure containing information about missing values
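Before reaching for the plots, the raw numbers behind klib.missingval_plot can be sanity-checked with plain pandas. A minimal sketch (the toy frame and its column names are invented for illustration):

```python
import pandas as pd
import numpy as np

# Invented toy frame with missing values
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Per-column missing counts and ratios: the numbers klib.missingval_plot visualizes
missing = df.isna().sum()   # count of missing cells per column
ratio = df.isna().mean()    # share of missing cells per column
```

Here each column has one missing cell, i.e. a 0.25 missing ratio; klib draws the same information as a figure across all columns at once.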
Take a look at this starter notebook.
Further examples, as well as applications of the functions can be found here.
Pull requests and ideas, especially for further functions, are welcome. For major changes or feedback, please open an issue first to discuss what you would like to change. Take a look at this GitHub repo.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Bayesian network modeling (BN modeling, or BNM) is an interpretable machine learning method for constructing probabilistic graphical models from the data. In recent years, it has been extensively applied to diverse types of biomedical data sets. Concurrently, our ability to perform long-time scale molecular dynamics (MD) simulations on proteins and other materials has increased exponentially. However, the analysis of MD simulation trajectories has not been data-driven but rather dependent on the user’s prior knowledge of the systems, thus limiting the scope and utility of the MD simulations. Recently, we pioneered using BNM for analyzing the MD trajectories of protein complexes. The resulting BN models yield novel fully data-driven insights into the functional importance of the amino acid residues that modulate proteins’ function. In this report, we describe the BaNDyT software package that implements the BNM specifically attuned to the MD simulation trajectories data. We believe that BaNDyT is the first software package to include specialized and advanced features for analyzing MD simulation trajectories using a probabilistic graphical network model. We describe here the software’s uses, the methods associated with it, and a comprehensive Python interface to the underlying generalist BNM code. This provides a powerful and versatile mechanism for users to control the workflow. As an application example, we have utilized this methodology and associated software to study how membrane proteins, specifically the G protein-coupled receptors, selectively couple to G proteins. The software can be used for analyzing MD trajectories of any protein as well as polymeric materials.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset originates from a multi-year enterprise survey conducted across industries and countries. It focuses on the organizational effects of adopting Generative AI tools such as ChatGPT, Claude, Gemini, Mixtral, LLaMA, and Groq. The dataset captures detailed metrics on job role creation, workforce transformation, productivity changes, and employee sentiment.
columns = [
"Company Name", # Anonymized name
"Industry", # Sector (e.g., Finance, Healthcare)
"Country", # Country of operation
"GenAI Tool", # GenAI platform used
"Adoption Year", # Year of initial deployment (2022–2024)
"Number of Employees Impacted", # Affected staff count
"New Roles Created", # Number of AI-driven job roles introduced
"Training Hours Provided", # Upskilling time investment
"Productivity Change (%)", # % shift in reported productivity
"Employee Sentiment" # Textual feedback from employees
]
import pandas as pd
df = pd.read_csv("Large_Enterprise_GenAI_Adoption_Impact.csv")
df.shape
df.head(10)
df.describe()
df["GenAI Tool"].value_counts()
df["Industry"].unique()
df[(df["Adoption Year"] == 2023) & (df["Country"] == "India")]
df.groupby("Industry")["Productivity Change (%)"].mean().sort_values(ascending=False).head()
from collections import Counter
import re
text = " ".join(df["Employee Sentiment"].dropna().tolist())
words = re.findall(r'\b\w+\b', text.lower())
common_words = Counter(words).most_common(20)
print(common_words)
df["Sentiment Length"] = df["Employee Sentiment"].fillna("").apply(lambda x: len(x.split()))  # fillna avoids errors on missing feedback
df["Sentiment Length"].hist(bins=50)
df.groupby("GenAI Tool")["New Roles Created"].mean().sort_values(ascending=False)
df.groupby("Industry")["Training Hours Provided"].mean().sort_values(ascending=False)
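The groupby pattern used above can be illustrated on a tiny synthetic stand-in for the CSV (values are invented; column names follow the schema listed earlier):

```python
import pandas as pd

# Invented stand-in rows for Large_Enterprise_GenAI_Adoption_Impact.csv
df = pd.DataFrame({
    "Industry": ["Finance", "Finance", "Healthcare"],
    "GenAI Tool": ["ChatGPT", "Claude", "ChatGPT"],
    "Productivity Change (%)": [12.0, 8.0, 5.0],
    "New Roles Created": [4, 2, 3],
})

# Mean productivity shift per industry, highest first
by_industry = (df.groupby("Industry")["Productivity Change (%)"]
                 .mean()
                 .sort_values(ascending=False))
```

With these invented rows, Finance averages 10.0% and Healthcare 5.0%; on the real file the same chain answers "which industries report the largest productivity shift".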
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package
This repository contains data and source files needed to replicate our work described in the paper "Unboxing Default Argument Breaking Changes in Scikit Learn".
Requirements
We recommend the following requirements to replicate our study:
Package Structure
We relied on Docker containers to provide a working environment that is easier to replicate. Specifically, we configure the following containers:
- data-analysis, an R-based container we used to run our data analysis.
- data-collection, a Python container we used to collect Scikit's default arguments and detect them in client applications.
- database, a Postgres container we used to store clients' data, obtained from Grotov et al.
- storage, a directory used to store the data processed in data-analysis and data-collection. This directory is shared by both containers.
- docker-compose.yml, the Docker file that configures all containers used in the package.

In the remainder of this document, we describe how to set up each container properly.
Using VSCode to Setup the Package
We selected VSCode as our IDE because its extensions let us implement our scripts directly inside the containers. In this package, we provide configuration parameters for both the data-analysis and data-collection containers, so you can open and run each container directly without any extra configuration.
You first need to set up the containers:
$ cd /replication/package/folder
$ docker-compose build
$ docker-compose up
# Wait for Docker to create and start all containers
Then, you can open them in Visual Studio Code:
If you want/need a more customized organization, the remainder of this file describes it in detail.
Longest Road: Manual Package Setup
Database Setup
The database container will automatically restore the dump in dump_matroskin.tar on its first launch. To set up and run the container, you should:
Build an image:
$ cd ./database
$ docker build --tag 'dabc-database' .
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
dabc-database latest b6f8af99c90d 50 minutes ago 18.5GB
Create and enter inside the container:
$ docker run -it --name dabc-database-1 dabc-database
$ docker exec -it dabc-database-1 /bin/bash
root# psql -U postgres -h localhost -d jupyter-notebooks
jupyter-notebooks=# \dt
List of relations
Schema | Name | Type | Owner
--------+-------------------+-------+-------
public | Cell | table | root
public | Code_cell | table | root
public | Md_cell | table | root
public | Notebook | table | root
public | Notebook_features | table | root
public | Notebook_metadata | table | root
public | repository | table | root
If you got the table list above, your database is properly set up.
It is important to mention that this database extends the one provided by Grotov et al. Basically, we added three columns to the table Notebook_features (API_functions_calls, defined_functions_calls, and other_functions_calls) containing the function calls performed by each client in the database.
Data Collection Setup
This container is responsible for collecting the data to answer our research questions. It has the following structure:
- dabcs.py, extracts DABCs from the Scikit-Learn source code and exports them to a CSV file.
- dabcs-clients.py, extracts function calls from clients and exports them to a CSV file. We rely on a modified version of Matroskin to collect the function calls. You can find the tool's source code in the `matroskin` directory.
- Makefile, commands to set up and run both dabcs.py and dabcs-clients.py.
- matroskin, the directory containing the modified version of the matroskin tool. We extended the library to collect the function calls performed in the client notebooks of Grotov's dataset.
- storage, a Docker volume where data-collection should save the exported data. This data will be used later in Data Analysis.
- requirements.txt, Python dependencies adopted in this module.

Note that the container will automatically configure this module for you (e.g., install dependencies, configure matroskin, and download the Scikit-Learn source code). For this, you must run the following commands:
$ cd ./data-collection
$ docker build --tag "data-collection" .
$ docker run -it -d --name data-collection-1 -v $(pwd)/:/data-collection -v $(pwd)/../storage/:/data-collection/storage/ data-collection
$ docker exec -it data-collection-1 /bin/bash
$ ls
Dockerfile Makefile config.yml dabcs-clients.py dabcs.py matroskin storage requirements.txt utils.py
If you see the project files, the container is configured correctly.
Data Analysis Setup
We use this container to conduct the analysis over the data produced by the Data Collection container. It has the following structure:
- dependencies.R, an R script containing the dependencies used in our data analysis.
- data-analysis.Rmd, the R notebook we used to perform our data analysis.
- datasets, a Docker volume pointing to the storage directory.

Execute the following commands to run this container:
$ cd ./data-analysis
$ docker build --tag "data-analysis" .
$ docker run -it -d --name data-analysis-1 -v $(pwd)/:/data-analysis -v $(pwd)/../storage/:/data-collection/datasets/ data-analysis
$ docker exec -it data-analysis-1 /bin/bash
$ ls
data-analysis.Rmd datasets dependencies.R Dockerfile figures Makefile
If you see the project files, the container is configured correctly.
A note on storage shared folder
As mentioned, the storage folder is mounted as a volume and shared between the data-collection and data-analysis containers. We compressed the contents of this folder due to space constraints, so before starting work on Data Collection or Data Analysis, make sure you have extracted the compressed files. You can do this by running the Makefile inside the storage folder.
$ make unzip # extract files
$ ls
clients-dabcs.csv clients-validation.csv dabcs.csv Makefile scikit-learn-versions.csv versions.csv
$ make zip # compress files
$ ls
csv-files.tar.gz Makefile
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was generated as part of a study aimed at profiling global scientific academies, which play a significant role in promoting scholarly communication and scientific progress. Below is a detailed description of the dataset:

Data Generation Procedures and Tools: The dataset was compiled using a combination of web scraping, manual verification, and data integration from multiple sources, including Wikipedia categories, membership lists of unions of scientific organizations, and web searches using specific query phrases (e.g., "country name + (academy OR society) AND site:.country code"). The records were enriched by cross-referencing data from the Wikidata API, the VIAF API, and the Research Organization Registry (ROR). Additional manual curation ensured accuracy and consistency.

Temporal and Geographical Scopes: The dataset covers scientific academies over a wide temporal scope, ranging from the 15th century to the present. The geographical scope includes academies from all continents, with emphasis on both developed and developing countries. The dataset aims to capture the full spectrum of scientific academies across different periods of historical development.

Tabular Data Description: The dataset comprises a total of 301 academy records and 14,008 website navigation sections. Each row in the dataset represents a single scientific academy, while the columns describe attributes such as the academy's name, founding date, location (city and country), website URL, email, and address.

Missing Data: Although the dataset offers comprehensive coverage, some entries may have missing or incomplete fields. For instance, website navigation sections were not available for all records.

Data Errors and Error Ranges: The data has been verified through manual curation, reducing the likelihood of errors. However, the use of crowd-sourced data from platforms like Wikipedia introduces potential risks of outdated or incomplete information. Any errors are likely minor and confined to fields such as navigation menu classifications, which may not fully reflect the breadth of an academy's activities.

Data Files, Formats, and Sizes: The dataset is provided in CSV and JSON formats, ensuring compatibility with a wide range of software applications, including Microsoft Excel, Google Sheets, and programming languages such as Python (via libraries like pandas).

This dataset provides a valuable resource for further research into the organizational behaviors, geographic distribution, and historical significance of scientific academies across the globe. It can be used for large-scale analyses, including comparative studies across different regions or time periods. Any feedback on the data is welcome! Please contact the maintainer of the dataset!

If you use the data, please cite the following paper: Xiaoli Chen and Xuezhao Wang. 2024. Profiling Global Scientific Academies. In The 2024 ACM/IEEE Joint Conference on Digital Libraries (JCDL '24), December 16-20, 2024, Hong Kong, China. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3677389.3702582
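A minimal pandas sketch of a first look at the tabular data, using the columns described above; the frame below is an invented stand-in for the real CSV (whose file name is not given here):

```python
import pandas as pd

# Invented stand-in rows; in practice: df = pd.read_csv(<path to the dataset CSV>)
df = pd.DataFrame({
    "name": ["Academy A", "Academy B", "Academy C"],
    "country": ["France", "France", "Japan"],
    "founding_date": [1666, 1795, 1879],
})

# Academies per country: a typical first look at geographic coverage
per_country = df["country"].value_counts()
```

On the real data the same one-liner summarizes the geographic distribution across all 301 records.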
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The experiment is based on a common high-speed milling data set and verifies the robustness of the model across tool types. The data set contains six sub data sets, corresponding to the wear processes of six different tools. Three of the sub data sets contain tool wear labels; the other three do not. The tools are all three-edged 6 mm ball cemented carbide tools, but their geometry and coating differ. The workpiece is Inconel 718, which is widely used for jet engine blade milling. The spindle speed is 10,360 rpm, and the cutting depth is 0.25 mm. The tool cuts from the upper edge of the workpiece surface to the lower edge in a zigzag manner. Over the whole milling process, the cutting length of each tool is about 0.1125 m × 315 passes ≈ 35.44 m. The cutting signals in Experiment 1 include the cutting force signal collected by a three-channel Kistler dynamometer and the vibration signal collected by a three-channel Kistler accelerometer, both at a sampling rate of 50 kHz. After each tool pass, a LEICA MZ12 microscope was used to measure the flank wear of the three teeth offline. In this experiment, a cutting signal is collected at regular intervals to predict the wear of the tool's three teeth.

The samples are divided into a training set, evaluation set, test set, and reconstruction set. The training and evaluation sets come from two tools and contain 30,000 and 4,096 samples, respectively; the test set comes from another tool and contains 9,472 samples; the reconstruction set comes from the unlabeled data generated by the other three tools and contains 40,832 samples. Each sample contains three channels of cutting force signal and three channels of vibration signal, with 2,304 sampling points per channel.
The following preprocessing steps are performed:

1) Signal clipping. Since the feed rate and sampling rate are constant throughout the experiment, the data set of each experiment can be approximated as a signal matrix evenly distributed over the workpiece surface, ignoring the slight differences in the number of sampling points per tool path. The ordinate of the matrix corresponds to the index of the tool pass, and the abscissa corresponds to the index of the sampling point. Because cutting signals are generated differently in the uncut, cut-in, cut-out, and stable states, the sampling points close to the edges of the workpiece are removed: we simply cut 2% off each end of the cutting signal obtained from each tool pass.

2) Data amplification. Because tool wear can only be observed under a microscope after each tool pass, each wear label corresponds to a cutting signal of about 120,000 sampling points, and acquiring the tool wear measurements takes considerable time. The number of labels obtained this way is not enough to fit the model, nor can it guarantee the robustness of the algorithm, so the samples must be split artificially and the tool wear labels expanded. Considering that tool wear is a slow, continuous process and that the experimental measurements have some deviation, linear interpolation is adopted here. We also tested quadratic interpolation and polynomial fitting, but observed no better results. Note that the essence of prediction is to find a function that maps the sample space to the target space: for any point in the sample space, the model can find the corresponding value in the target space.
What sample amplification does is sample the target space more densely, so as to describe this mapping more comprehensively, rather than redefining the mapping itself. The task of this study is to monitor the flank wear of the three teeth from the six-channel sensor signals. On the test set, the mean square error (MSE) and mean absolute percentage error (MAPE) between the predicted values and the microscope observations are 0.0013 and 4%, respectively, and the average and maximum final prediction errors (FPE) are 5 μm and 23 μm. The training time was 2,130 s, and a single prediction took 1.79 ms. The accuracy, training time, and detection efficiency of the tool wear monitoring meet current industrial needs. As MPAN realizes the mapping from cutting signal to tool wear, the attention units, acting as gates that control the information flow, retain the importance of the input features. The predicted tool wear curve is basically consistent with the curve observed under the microscope.
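The label-expansion step described above (linearly interpolating sparse per-pass wear measurements onto artificially split sub-samples) can be sketched with NumPy; all numbers below are invented for illustration:

```python
import numpy as np

# One microscope reading per tool pass (sparse labels; values invented)
pass_index = np.array([0, 1, 2, 3])
wear = np.array([10.0, 14.0, 17.0, 25.0])  # flank wear in micrometres

# After splitting each pass's long signal into shorter samples, assign each
# sub-sample an interpolated wear label at its position along the passes
sub_positions = np.linspace(0, 3, 13)  # e.g. 4 sub-samples per pass
labels = np.interp(sub_positions, pass_index, wear)
```

The interpolated labels pass through the measured points exactly (10.0 at pass 0, 25.0 at pass 3) and vary linearly in between, matching the assumption that wear evolves slowly and continuously between measurements.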
This dataset contains: 1. An Excel spreadsheet of field data from Tipperary pool, including CO2 bubble locations, raw and derived flux data, and field descriptions (March 2017 field campaign). 2. Python scripts for the two-point correlation function, a spatial statistical method used to describe the spatial distribution of points, applied to the Tipperary pool CO2 bubbling points to determine the geological control on their distribution. As reported in: Roberts, J.J., Leplastrier, A., Feitz, A., Bell, A., Karolyte, R., Shipton, Z.K. Structural controls on the location and distribution of CO2 leakage at a natural CO2 spring in Daylesford, Australia. IJGHGC.
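A simplified flavour of the two-point statistic, not the repository's actual scripts: histogram the pairwise distances among the observed points and compare against a random reference set of the same size. The coordinates below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def pair_distances(points):
    """All pairwise Euclidean distances between 2-D points."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)
    return dist[i, j]

# Synthetic stand-ins for bubble coordinates within a 50 m x 50 m area
bubbles = rng.uniform(0.0, 50.0, size=(100, 2))
reference = rng.uniform(0.0, 50.0, size=(100, 2))  # random comparison set

bins = np.linspace(0.0, 75.0, 16)
obs, _ = np.histogram(pair_distances(bubbles), bins=bins)
ref, _ = np.histogram(pair_distances(reference), bins=bins)
# Bins where obs noticeably exceeds ref hint at clustering at that separation,
# e.g. bubbles aligned along a fault rather than scattered at random
```

With 100 points there are 100 × 99 / 2 = 4,950 pairs, so both histograms sum to 4,950; in practice many random reference sets are averaged to stabilize the comparison.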
Market basket analysis with the Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data; it records all transactions that occurred over a period of time. The retailer will use the results to grow its business and to suggest itemsets to customers, allowing us to increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association Rule mining is most useful when you want to discover associations between different objects in a set, i.e., to find frequent patterns in a transaction database. It can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.
- Rule: bought computer mouse => bought mouse mat
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse) = 0.08/0.10 = 0.8
- lift = confidence / P(mat) = 0.8/0.09 ≈ 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
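These worked numbers can be checked with a small helper; a Python sketch using the standard definitions (support = P(A and B), confidence = P(B | A), lift = confidence / P(B)):

```python
def rule_metrics(n, n_a, n_b, n_ab):
    """Support, confidence and lift for the rule A => B."""
    support = n_ab / n            # P(A and B)
    confidence = n_ab / n_a       # P(B | A)
    lift = confidence / (n_b / n) # confidence relative to the base rate of B
    return support, confidence, lift

# 100 customers: 10 bought a mouse (A), 9 a mouse mat (B), 8 bought both
s, c, l = rule_metrics(100, 10, 9, 8)
print(round(s, 2), round(c, 2), round(l, 2))  # 0.08 0.8 8.89
```

A lift far above 1 says the two items co-occur much more often than independence would predict, which is what makes this rule interesting to the retailer.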
Number of Attributes: 7
First, we need to load the required libraries; I briefly describe each library.
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
After that, we clean the data frame by removing missing values.
To apply Association Rule mining, we need to convert the data frame into transaction data, so that all items bought together in one invoice will be in ...
State-of-the-art open network visualization tools like Gephi, KeyLines, and Cytoscape are not suitable for studying street networks with thousands of roads since they do not simultaneously support polylines for edges, navigable maps, GPU-accelerated rendering, interactivity, and the means for visualizing multivariate data. To fill this gap, the present paper presents Dash Sylvereye: a new Python library to produce interactive visualizations of primal street networks on top of tiled web maps. Thanks to its integration with the Dash framework, Dash Sylvereye can be used to develop web dashboards around temporal and multivariate street data by coordinating the various elements of a Dash Sylvereye visualization with other plotting and UI components provided by the Dash framework. Additionally, Dash Sylvereye provides convenient functions to easily import OpenStreetMap street topologies obtained with the OSMnx library. Moreover, Dash Sylvereye uses WebGL for GPU-accelerated rendering when redrawing the road network. We conduct experiments to assess the performance of Dash Sylvereye on a commodity computer when exploiting software acceleration in terms of frames per second, CPU time, and frame duration. We show that Dash Sylvereye can offer fast panning speeds, close to 60 FPS, and CPU times below 20 ms, for street networks with thousands of edges, and above 24 FPS, and CPU times below 40 ms, for networks with tens of thousands of edges. Additionally, we conduct a performance comparison against two state-of-the-art street visualization tools. We found Dash Sylvereye to be competitive when compared to the state-of-the-art visualization libraries Kepler.gl and city-roads. Finally, we describe a web dashboard application that exploits Dash Sylvereye for the analysis of a SUMO vehicle traffic simulation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: Open data: The early but not the late neural correlate of auditory awareness reflects lateralized experiences.
Author Information A. Principal Investigator Contact Information Name: Stefan Wiens Institution: Department of Psychology, Stockholm University, Sweden Internet: https://www.su.se/profiles/swiens-1.184142 Email: sws@psychology.su.se
B. Associate or Co-investigator Contact Information Name: Rasmus Eklund Institution: Department of Psychology, Stockholm University, Sweden Internet: https://www.su.se/profiles/raek2031-1.223133 Email: rasmus.eklund@psychology.su.se
C. Associate or Co-investigator Contact Information Name: Billy Gerdfeldter Institution: Department of Psychology, Stockholm University, Sweden Internet: https://www.su.se/profiles/bige1544-1.403208 Email: billy.gerdfeldter@psychology.su.se
Date of data collection: Subjects (N = 28) were tested between 2020-03-04 and 2020-09-18.
Geographic location of data collection: Department of Psychology, Stockholm, Sweden
Information about funding sources that supported the collection of the data: Marianne and Marcus Wallenberg (Grant 2019-0102)
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: CC BY 4.0
Links to publications that cite or use the data: Eklund R., Gerdfeldter B., & Wiens S. (2021). The early but not the late neural correlate of auditory awareness reflects lateralized experiences. Neuropsychologia. https://doi.org/
The study was preregistered: https://doi.org/10.17605/OSF.IO/PSRJF
Links to other publicly accessible locations of the data: N/A
Links/relationships to ancillary data sets: N/A
Was data derived from another source? No
Recommended citation for this dataset: Eklund R., Gerdfeldter B., & Wiens S. (2020). Open data: The early but not the late neural correlate of auditory awareness reflects lateralized experiences. Stockholm: Stockholm University. https://doi.org/10.17045/sthlmuni.13067018
DATA & FILE OVERVIEW
File List: The files contain the downsampled data in bids format, scripts, and results of main and supplementary analyses of the electroencephalography (EEG) study. Links to the hardware and software are provided under methodological information.
AAN_LRclick_experiment_scripts.zip: contains the Python files to run the experiment
AAN_LRclick_bids_EEG.zip: contains EEG data files for each subject in .eeg format.
AAN_LRclick_behavior_log.zip: contains log files of the EEG session (generated by Python)
AAN_LRclick_EEG_scripts.zip: Python-MNE scripts to process and to analyze the EEG data
AAN_LRclick_results.zip: contains summary data files, figures, and tables that are created by Python-MNE.
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data: The auditory stimuli were 4-ms clicks. The experiment was programmed in Python: https://www.python.org/ and used extra functions from here: https://github.com/stamnosslin/mn The EEG data were recorded with an Active Two BioSemi system (BioSemi, Amsterdam, Netherlands; www.biosemi.com) and converted to .eeg format. For more information, see linked publication.
Methods for processing the data: We computed event-related potentials. See linked publication
Instrument- or software-specific information needed to interpret the data: MNE-Python (Gramfort A., et al., 2013): https://mne.tools/stable/index.html#
Standards and calibration information, if appropriate: For information, see linked publication.
Environmental/experimental conditions: For information, see linked publication.
Describe any quality-assurance procedures performed on the data: For information, see linked publication.
People involved with sample collection, processing, analysis and/or submission:
DATA-SPECIFIC INFORMATION: All relevant information can be found in the MNE-Python scripts (in EEG_scripts folder) that process the EEG data. For example, we added notes to explain what different variables mean.
The folder structure needs to be as follows:
AAN_LRclick (main folder)
  data
    bids (AAN_LRclick_bids_EEG)
    log (AAN_LRclick_behavior_log)
  MNE (AAN_LRclick_EEG_scripts)
  results (AAN_LRclick_results)
To run the MNE-Python scripts: Anaconda was used with MNE-Python 0.22 (see installation instructions at https://mne.tools/stable/index.html#). For preprocess.py and analysis.py, the complete scripts should be run (from the Anaconda prompt).
The main results files are saved separately:
GENERAL INFORMATION
Title of Dataset: Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones
Author Information A. Principal Investigator Contact Information Name: Stefan Wiens Institution: Department of Psychology, Stockholm University, Sweden Internet: https://www.su.se/profiles/swiens-1.184142 Email: sws@psychology.su.se
B. Associate or Co-investigator Contact Information Name: Malina Szychowska Institution: Department of Psychology, Stockholm University, Sweden Internet: https://www.researchgate.net/profile/Malina_Szychowska Email: malina.szychowska@psychology.su.se
Date of data collection: Subjects (N = 33) were tested between 2019-11-15 and 2020-03-12.
Geographic location of data collection: Department of Psychology, Stockholm, Sweden
Information about funding sources that supported the collection of the data: Swedish Research Council (Vetenskapsrådet) 2015-01181
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: CC BY 4.0
Links to publications that cite or use the data: Szychowska M., & Wiens S. (2020). Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones. Submitted manuscript.
The study was preregistered: https://doi.org/10.17605/OSF.IO/6FHR8
Links to other publicly accessible locations of the data: N/A
Links/relationships to ancillary data sets: N/A
Was data derived from another source? No
Recommended citation for this dataset: Wiens, S., & Szychowska M. (2020). Open data: Visual load effects on the auditory steady-state responses to 20-, 40-, and 80-Hz amplitude-modulated tones. Stockholm: Stockholm University. https://doi.org/10.17045/sthlmuni.12582002
DATA & FILE OVERVIEW
File List: The files contain the raw data, scripts, and results of main and supplementary analyses of an electroencephalography (EEG) study. Links to the hardware and software are provided under methodological information.
ASSR2_experiment_scripts.zip: contains the Python files to run the experiment.
ASSR2_rawdata.zip: contains raw datafiles for each subject
ASSR2_EEG_scripts.zip: Python-MNE scripts to process the EEG data
ASSR2_EEG_preprocessed_data.zip: EEG data in fif format after preprocessing with Python-MNE scripts
ASSR2_R_scripts.zip: R scripts to analyze the data together with the main datafiles. The main files in the folder are:
ASSR2_results.zip: contains all figures and tables that are created by Python-MNE and R.
METHODOLOGICAL INFORMATION
The EEG data were recorded with an Active Two BioSemi system (BioSemi, Amsterdam, Netherlands; www.biosemi.com) and saved in .bdf format. For more information, see linked publication.
Methods for processing the data: We conducted frequency analyses and computed event-related potentials. See linked publication
Instrument- or software-specific information needed to interpret the data:
MNE-Python (Gramfort A., et al., 2013): https://mne.tools/stable/index.html#
RStudio used with R (R Core Team, 2020): https://rstudio.com/products/rstudio/
Wiens, S. (2017). Aladins Bayes Factor in R (Version 3). https://www.doi.org/10.17045/sthlmuni.4981154.v3
Standards and calibration information, if appropriate: For information, see linked publication.
Environmental/experimental conditions: For information, see linked publication.
Describe any quality-assurance procedures performed on the data: For information, see linked publication.
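The stimuli in this dataset are 20-, 40-, and 80-Hz amplitude-modulated tones, and the ASSR analysis looks for EEG power at the modulation rate. Below is a minimal NumPy sketch of such a stimulus; the carrier frequency, sampling rate, and duration are illustrative assumptions, not parameters taken from the study:

```python
import numpy as np

# Illustrative parameters (NOT taken from the study)
fs = 44100          # sampling rate in Hz
duration = 1.0      # seconds
f_carrier = 1000.0  # carrier tone in Hz
f_mod = 40.0        # modulation rate in Hz (the study used 20, 40, and 80)

t = np.arange(int(fs * duration)) / fs
envelope = 0.5 * (1.0 + np.sin(2.0 * np.pi * f_mod * t))  # 0..1 AM envelope
signal = envelope * np.sin(2.0 * np.pi * f_carrier * t)

# The steady-state response follows the modulation rate: the envelope's
# spectrum peaks at f_mod.
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
peak_freq = freqs[np.argmax(spectrum)]
```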
See AAN3_readme_figshare.txt: 1. Title of Dataset: Open data: Is auditory awareness negativity confounded by performance?
Author Information A. Principal Investigator Contact Information Name: Stefan Wiens Institution: Department of Psychology, Stockholm University, Sweden Internet: https://www.su.se/profiles/swiens-1.184142 Email: sws@psychology.su.se
B. Associate or Co-investigator Contact Information Name: Rasmus Eklund Institution: Department of Psychology, Stockholm University, Sweden Internet: https://www.su.se/profiles/raek2031-1.223133 Email: rasmus.eklund@psychology.su.se
C. Associate or Co-investigator Contact Information Name: Billy Gerdfeldter Institution: Department of Psychology, Stockholm University, Sweden Internet: https://www.su.se/profiles/bige1544-1.403208 Email: billy.gerdfeldter@psychology.su.se
Date of data collection: Subjects (N = 28) were tested between 2018-12-03 and 2019-01-18.
Geographic location of data collection: Department of Psychology, Stockholm, Sweden
Information about funding sources that supported the collection of the data: Swedish Research Council / Vetenskapsrådet (Grant 2015-01181) Marianne and Marcus Wallenberg (Grant 2019-0102)
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: CC BY 4.0
Links to publications that cite or use the data: Eklund R., Gerdfeldter B., & Wiens S. (2020). Is auditory awareness negativity confounded by performance? Consciousness and Cognition. https://doi.org/10.1016/j.concog.2020.102954
The study was preregistered: https://doi.org/10.17605/OSF.IO/W4U7V
Links to other publicly accessible locations of the data: N/A
Links/relationships to ancillary data sets: N/A
Was data derived from another source? No
Recommended citation for this dataset: Eklund R., Gerdfeldter B., & Wiens S. (2020). Open data: Is auditory awareness negativity confounded by performance? Stockholm: Stockholm University. https://doi.org/10.17045/sthlmuni.9724280
DATA & FILE OVERVIEW
File List: The files contain the raw data, scripts, and results of main and supplementary analyses of the electroencephalography (EEG) study. Links to the hardware and software are provided under methodological information.
AAN3_experiment_scripts.zip: contains the Python files to run the experiment
AAN3_rawdata_EEG.zip: contains raw EEG data files for each subject in .bdf format (generated by Biosemi)
AAN3_rawdata_log.zip: contains log files of the EEG session (generated by Python)
AAN3_EEG_scripts.zip: Python-MNE scripts to process and to analyze the EEG data
AAN3_EEG_source_localization_scripts.zip: Python-MNE files needed for source localization. The template MRI is provided in this zip; the files are obtained from the MNE tutorial (https://mne.tools/stable/auto_tutorials/source-modeling/plot_eeg_no_mri.html?highlight=template). Note that the stc folder is empty: the source time course files, which are needed for source localization, are not provided because of their large size, but they can quickly be regenerated with the analysis script.
AAN3_analysis_scripts.zip: R scripts to analyze the data. The main file is performance_correction.html. It contains the results of the main analyses.
AAN3_results.zip: contains summary data files, figures, and tables that are created by Python-MNE and R.
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data: The auditory stimuli were two 100-ms tones (f = 900 Hz and 1400 Hz, 5-ms fade-in and fade-out). The experiment was programmed in Python (https://www.python.org/) and used extra functions from https://github.com/stamnosslin/mn. The EEG data were recorded with an Active Two BioSemi system (BioSemi, Amsterdam, Netherlands; www.biosemi.com) and saved in .bdf format. For more information, see the linked publication.
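As a concrete illustration of the stimulus description above (100-ms tones with 5-ms fade-in and fade-out), here is a NumPy sketch; the sampling rate and the raised-cosine ramp shape are assumptions, since the text specifies only duration, frequency, and fade length:

```python
import numpy as np

fs = 48000                # sampling rate in Hz (assumed)
dur, fade = 0.100, 0.005  # 100-ms tone, 5-ms fades (from the description)
n, n_fade = int(fs * dur), int(fs * fade)

t = np.arange(n) / fs
tone = np.sin(2.0 * np.pi * 900.0 * t)  # the 900-Hz tone; 1400 Hz analogous

# Raised-cosine (Hann-shaped) onset and offset ramps (ramp shape assumed)
ramp = 0.5 * (1.0 - np.cos(np.pi * np.arange(n_fade) / n_fade))
tone[:n_fade] *= ramp
tone[-n_fade:] *= ramp[::-1]
```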
Methods for processing the data: We computed event-related potentials and performed source localization. For details, see the linked publication.
Instrument- or software-specific information needed to interpret the data:
MNE-Python (Gramfort, A., et al., 2013): https://mne.tools/stable/index.html
RStudio used with R (R Core Team, 2016): https://rstudio.com/products/rstudio/
Wiens, S. (2017). Aladins Bayes Factor in R (Version 3). https://www.doi.org/10.17045/sthlmuni.4981154.v3
Standards and calibration information, if appropriate: For information, see linked publication.
Environmental/experimental conditions: For information, see linked publication.
Describe any quality-assurance procedures performed on the data: For information, see linked publication.
People involved with sample collection, processing, analysis and/or submission:
klib is a Python library for importing, cleaning, analyzing, and preprocessing data. It makes it quick to visualize missing data, perform data cleaning, and plot distributions, correlations, and categorical column values. Explanations of key functionalities can be found on Medium / TowardsDataScience in the examples section or on YouTube (Data Professor).
Original GitHub repo
![klib Header](https://raw.githubusercontent.com/akanz1/klib/main/examples/images/header.png)
!pip install klib
import klib
import pandas as pd
df = pd.DataFrame(data)  # data: your dataset, e.g. a dict of columns or a list of records
# klib.describe functions for visualizing datasets
- klib.cat_plot(df) # returns a visualization of the number and frequency of categorical features
- klib.corr_mat(df) # returns a color-encoded correlation matrix
- klib.corr_plot(df) # returns a color-encoded heatmap, ideal for correlations
- klib.dist_plot(df) # returns a distribution plot for every numeric feature
- klib.missingval_plot(df) # returns a figure containing information about missing values
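Under the hood, klib.corr_mat is built on pandas' pairwise correlation; here is a pandas-only sketch of the same computation (the column names and data are made up for illustration):

```python
import pandas as pd

# Toy DataFrame standing in for a real dataset
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 67, 80, 92],
    "noise":  [3, 1, 4, 1, 5],
})

# Pairwise Pearson correlations -- the matrix klib.corr_mat color-encodes
corr = df.corr(method="pearson")
print(corr.round(2))
```

Strongly related columns (here, height and weight) show coefficients near 1, while unrelated columns hover near 0, which is exactly the structure the color-encoded heatmap makes visible at a glance.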
Take a look at this starter notebook.
Further examples, as well as applications of the functions, can be found here.
Pull requests and ideas, especially for further functions, are welcome. For major changes or feedback, please open an issue first to discuss what you would like to change. Take a look at this GitHub repo.