https://dataintelo.com/privacy-and-policy
The global market size for Thunderbolt docking hubs was estimated at USD 1.2 billion in 2023 and is expected to reach USD 2.8 billion by 2032, growing at a compound annual growth rate (CAGR) of 9.8% during the forecast period. The market is experiencing significant growth due to the increasing demand for high-speed data transfer and connectivity solutions, coupled with the proliferation of ultra-thin laptops and high-end workstations that require efficient port expansion capabilities.
One major growth factor driving the Thunderbolt docking hub sales market is the increasing adoption of remote work and telecommuting practices. With more professionals working from home, there is a heightened need for reliable and efficient docking solutions that can support multiple peripherals and high-speed data transfer. Thunderbolt technology, known for its superior performance, is becoming the preferred choice for users seeking to enhance their home office setups. Additionally, the rise of digital content creation and consumption, particularly in fields like video editing and graphic design, is spurring demand for high-performance docking hubs that can handle large data files and multiple devices simultaneously.
Another key driver of market growth is the continuous advancements in Thunderbolt technology. The introduction of Thunderbolt 4, which offers enhanced security features, universal compatibility, and improved performance over its predecessors, is expected to fuel market expansion. Thunderbolt 4's ability to support multiple 4K displays, charge laptops, and transfer data at speeds up to 40Gbps makes it an attractive option for both personal and commercial users. This technological evolution ensures that Thunderbolt docking hubs remain relevant in an ever-changing tech landscape, driving sustained market growth.
The increasing trend of ultra-thin laptops and portable devices that often come with limited ports is also contributing to the market's growth. As manufacturers continue to produce sleeker and more compact devices, the need for versatile docking hubs that can provide additional ports and connectivity options becomes more pronounced. Thunderbolt docking hubs, with their ability to offer a plethora of ports including USB, HDMI, Ethernet, and more, are seen as essential accessories for enhancing the functionality of modern laptops and ultrabooks.
From a regional perspective, North America is expected to dominate the Thunderbolt docking hub sales market, driven by the high adoption rate of advanced technologies and the presence of major tech companies. The Asia Pacific region is anticipated to witness the fastest growth, fueled by the increasing penetration of laptops and PCs, rising disposable incomes, and growing awareness about the benefits of Thunderbolt technology. Europe is also a significant market, with steady growth expected due to the strong presence of the IT and telecommunication sectors.
In the context of evolving connectivity needs, USB-C Hubs have emerged as a versatile solution for users seeking to expand the functionality of their devices. These hubs offer a range of ports, including USB, HDMI, and Ethernet, allowing users to connect multiple peripherals with ease. As more devices, such as laptops and tablets, are designed with USB-C ports, the demand for USB-C Hubs is expected to rise. Their compatibility with a wide array of devices makes them an attractive option for both personal and professional use, providing a seamless experience for users who require additional connectivity options. The integration of USB-C Hubs into modern tech ecosystems highlights their importance in enhancing productivity and convenience.
When analyzing the Thunderbolt docking hub market by product type, it is essential to differentiate between portable Thunderbolt docking hubs and desktop Thunderbolt docking hubs. Portable Thunderbolt docking hubs are designed for mobility, appealing to professionals who require connectivity solutions on the go. These hubs are typically compact, lightweight, and often powered through the connected device, making them ideal for travel and remote work scenarios. The demand for portable docking hubs is high among digital nomads, freelancers, and employees who frequently transition between home and office environments. This segment is expected to see substantial growth as the trend towards flexible working arrangements continues to rise.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes their results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub. Based on the results, we proposed and evaluated Julynter, a linting tool for Jupyter Notebooks.
Papers:
This repository contains three files:
Reproducing the Notebook Study
The db2020-09-22.dump.gz file contains a PostgreSQL dump of the database, with all the data we extracted from notebooks. For loading it, run:
gunzip -c db2020-09-22.dump.gz | psql jupyter
Note that this file contains only the database with the extracted data. The actual repositories are available in a Google Drive folder, which also contains the Docker images we used in the reproducibility study. The repositories are stored as content/{hash_dir1}/{hash_dir2}.tar.bz2, where hash_dir1 and hash_dir2 are columns of the repositories table in the database.
For scripts, notebooks, and detailed instructions on how to analyze or reproduce the data collection, please check the instructions on the Jupyter Archaeology repository (tag 1.0.0)
The sample.tar.gz file contains the repositories obtained during the manual sampling.
Reproducing the Julynter Experiment
The julynter_reproducility.tar.gz file contains all the data collected in the Julynter experiment and the analysis notebooks. Reproducing the analysis is straightforward:
The collected data is stored in the julynter/data folder.
Changelog
2019/01/14 - Version 1 - Initial version
2019/01/22 - Version 2 - Update N8.Execution.ipynb to calculate the rate of failure for each reason
2019/03/13 - Version 3 - Update package for camera ready. Add columns to db to detect duplicates, change notebooks to consider them, and add N1.Skip.Notebook.ipynb and N11.Repository.With.Notebook.Restriction.ipynb.
2021/03/15 - Version 4 - Add Julynter experiment; Update database dump to include new data collected for the second paper; remove scripts and analysis notebooks from this package (moved to GitHub), add a link to Google Drive with collected repository files
This dataset artifact contains the intermediate datasets from pipeline executions necessary to reproduce the results of the paper. We share this artifact in hopes of providing a starting point for other researchers to extend the analysis on notebooks, discover more about their accessibility, and offer solutions to make data science more accessible. The scripts needed to generate these datasets and analyse them are shared in the Github Repository for this work.
The dataset contains large files of approximately 60 GB, so please exercise caution when extracting the data from the compressed files.
The dataset contains files that could take a significant amount of script run time to generate or reproduce.
Dataset Contents We briefly summarize the included files in our dataset. Please refer to the documentation for specific information about the structure of the data in these files, the scripts to generate them, and runtimes for various parts of our data processing pipeline.
epoch_9_loss_0.04706_testAcc_0.96867_X_resnext101_docSeg.pth: We share this model file, originally provided by Jobin et al., to enable the classification of figures found in our dataset. Please place it into the model/ directory.
model-results.csv: This file contains the results of the classification performed on the figures found in the notebooks in our dataset. Performing this classification may take up to a day.
a11y-scan-dataset.zip: This archive contains two files and results in datasets of approximately 60 GB when extracted. Please ensure that you have sufficient disk space to uncompress this zip archive. The archive contains:
a11y/a11y-detailed-result.csv: This dataset contains the accessibility scan results from the scans run on the 100k notebooks across themes. The detailed result file can be very large (> 60 GB) and can be time-consuming to construct.
a11y/a11y-aggregate-scan.csv: This file is an aggregate of the detailed result that contains the number of each type of error found in each notebook. This file is also shared outside the compressed directory (see the loading sketch after this file list).
errors-different-counts-a11y-analyze-errors-summary.csv: This file contains the counts of errors that occur in notebooks across different themes.
nb_processed_cell_html.csv: This file contains metadata corresponding to each cell extracted from the HTML exports of our notebooks.
nb_first_interactive_cell.csv: This file contains the metadata needed to compute the first interactive element, as defined in our paper, in each notebook.
nb_processed.csv: This file contains the data obtained after processing the notebooks, extracting the number of images, imports, languages, and cell-level information.
processed_function_calls.csv: This file contains information about the notebooks and the various imports and function calls used within them.
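As a quick orientation, the aggregate accessibility scan can be explored with pandas. This is a minimal sketch only: the exact column layout of a11y-aggregate-scan.csv is not documented here, so the per-notebook summation below assumes the numeric columns hold the error-type counts.

import pandas as pd

# Load the aggregate scan (assumed: one row per notebook, with counts per error type).
agg = pd.read_csv("a11y/a11y-aggregate-scan.csv")
print(agg.columns.tolist())

# Rough per-notebook error total, assuming the numeric columns are counts.
error_cols = agg.select_dtypes("number").columns
agg["total_errors"] = agg[error_cols].sum(axis=1)
print(agg["total_errors"].describe())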
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For the Kaggle 2020 Survey contest, I produced a notebook.
It is also available in a GitHub repository under the GPL.
As the notebook is structured as a report (executive summary, references, notes, and of course a storyline through data), I added some functionality to either visualize within the notebook or export the output to files.
This could be useful as a template if, as I often did in the past, you have to release the data, the reproducible analysis, and a report formatted according to corporate standards, while ensuring consistency among them, easier comparison across versions, and a historical archive of the evolution.
2021-01-29: released the free edition of the book (174 pages), which can be read online on issuu.com.
This is the free, online-reading-only version of the book with the same title that will be available at https://leanpub.com/ai-organizational-scalability by the end of February 2021.
It is an experiment in transitioning to open data and freeing the approach to report writing that I applied for decades with customers in cultural, organizational, and technological change activities.
If you are just interested in the general concepts and approach, jump to Chapter 9 (a 10-page narrative across the book, with hyperlinks to details).
The free version is here: https://issuu.com/robertolofaro/docs/ai-organizational-scalability-and-kaggle-survey_v1
The published edition uses hyperlinks to allow at least three different reading approaches, besides the usual sequential and serendipity-based ones: report structure, explanatory, and a narrative about the future of Artificial Intelligence within a corporate environment.
This dataset contains: * a single large HTML file that is the export from Jupyter Notebook * a ZIP file containing the files generated by the notebook when run with the option to generate output as files.
Each generated file is cross-referenced to the section that produced it, and all the charts etc. are: * either SVGs * or, for Plotly charts, HTML files that can be hosted anywhere, as they connect to the Plotly servers and reproduce the chart "as is", as it was when the HTML was produced, without any need to access the original data or dataset.
Along with those "visual" files, there is also a text file with a log of the execution, as built by all the "print" statements within the notebook.
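For reference, this is roughly how such exports can be produced with Plotly; a minimal sketch with placeholder data, assuming plotly (and kaleido for SVG export) are installed, not the notebook's actual code:

import plotly.express as px

fig = px.bar(x=["A", "B", "C"], y=[1, 3, 2])  # placeholder chart
# Standalone HTML that pulls plotly.js from the CDN, so the chart renders anywhere
# without access to the original data:
fig.write_html("chart.html", include_plotlyjs="cdn")
# Static SVG export (requires the kaleido package):
fig.write_image("chart.svg")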
Obviously, as I started studying Python in March 2020, I owe a huge debt to all the solutions posted online, e.g. on how to streamline a radar chart or a heatmap, plus countless other minutiae that I absorbed over the months and that are variously used in this notebook.
I have worked on software projects since the 1980s, building data-based models and presentations since then, and generally interfacing with business users and (senior) managers since I was in my early 20s.
Hence, I am used to documenting, and I wanted to use this opportunity to try working on a deadline to produce something like what I produced in the past in various forms, but using just a single Jupyter Notebook, Python, and a single data file (the one provided by Kaggle), in the shortest time possible, to see if it was feasible.
There is plenty of room for improvement, but I look forward to learning more thanks to all the notebooks shared here, and to contributing when (as now) I think that my past non-Python experience could be useful to bridge between data and business.
Hence, all my datasets and notebooks are generally CC BY, with SA added only when I want to avoid the risk of data or content being distorted.
https://dataintelo.com/privacy-and-policy
The global laptop solid state drives (SSD) market size was valued at USD 25 billion in 2023 and is projected to reach USD 75 billion by 2032, exhibiting a compound annual growth rate (CAGR) of 12.5% during the forecast period. The significant growth factor driving this market is the increasing demand for faster, reliable, and energy-efficient storage solutions in laptops, propelled by advancements in technology and the need for enhanced performance in personal computing and enterprise applications.
One of the key growth factors for the laptop SSD market is the ongoing transition from traditional Hard Disk Drives (HDDs) to SSDs. SSDs offer numerous advantages over HDDs, such as faster data access speeds, lower power consumption, and greater durability, making them the preferred choice for modern laptops. As consumers and enterprises alike seek to improve computing performance and efficiency, the adoption of SSDs is expected to rise considerably. Moreover, the decreasing cost of SSDs over the years has made them more accessible, further boosting their market penetration.
Innovation in SSD technology is another major driver of market growth. Manufacturers are continuously developing new SSD interfaces and form factors, such as NVMe and PCIe, which offer superior performance compared to traditional SATA SSDs. These advancements cater to the growing demand for high-speed data storage solutions in various applications, including gaming, professional content creation, and data-intensive enterprise operations. Additionally, the introduction of 3D NAND technology has significantly increased the storage capacity and lifespan of SSDs, making them more attractive to consumers and businesses.
The increasing trend of digitalization and the rise in data-centric applications also contribute to the growth of the laptop SSD market. With the proliferation of high-definition video content, cloud computing, big data analytics, and artificial intelligence, there is a growing need for efficient and high-capacity storage solutions. SSDs, with their superior performance characteristics, are well-suited to meet these demands. Furthermore, the growing adoption of SSDs in ultrabooks and portable laptops, driven by consumer preferences for lightweight and high-performance devices, is expected to fuel market growth.
As the demand for portable and efficient storage solutions grows, the Enterprise Portable SSD has emerged as a vital component in the storage industry. These SSDs offer the flexibility and speed required by businesses that need to manage large volumes of data on the go. With their compact design and robust performance, Enterprise Portable SSDs are becoming increasingly popular among professionals who require reliable storage solutions that can be easily transported between different locations. This trend is particularly evident in sectors such as media production, where large files need to be moved quickly and securely between various devices and locations. The rise of remote work and the need for mobile data access further underscore the importance of Enterprise Portable SSDs in modern business operations.
Regionally, North America and Asia Pacific are the leading markets for laptop SSDs. North America, being home to major technology companies and having a high adoption rate of advanced technologies, accounts for a significant share of the market. Meanwhile, the Asia Pacific region is witnessing rapid growth due to the increasing demand for laptops in emerging economies, driven by factors such as rising disposable incomes, expanding IT infrastructure, and the growing trend of remote working and e-learning.
The laptop SSD market is segmented by type into SATA SSD, NVMe SSD, PCIe SSD, and Others. SATA SSDs have been the traditional choice for storage due to their compatibility and affordability. They offer a significant improvement in speed and reliability over HDDs, making them a popular option for budget-conscious consumers and enterprises looking to upgrade their storage solutions. However, the performance limitations of SATA SSDs compared to newer interfaces have led to a shift towards more advanced SSD types.
NVMe SSDs represent a significant leap in performance over SATA SSDs. By leveraging the non-volatile memory express (NVMe) protocol, these SSDs offer much higher data transfer speeds and lower latency, making them ideal for high-performance applications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
redspot replay
from redspot import database
from redspot.notebook import Notebook

nbk = Notebook()
for signal in database.get("path-to-db"):
    time, panel, kind, args = signal
    nbk.apply(kind, args)  # apply change
print(nbk)  # print notebook
redspot record
docker run --rm -it -p8888:8888
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes their results hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.
This repository contains two files:
The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.
The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data. It is organized as follows:
In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.
Reproducing the Analysis
This section shows how to load the data into the database and run the analysis notebooks. For the analysis, we used the following environment:
Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.1
Python 3.6.8
PdfCrop 2012/11/02 v1.38
First, download dump.tar.bz2 and extract it:
tar -xjf dump.tar.bz2
It extracts the file db2019-01-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:
psql jupyter < db2019-01-13.dump
It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
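To check that the restore and the connection string work, the database can be queried directly from Python; a minimal sketch assuming sqlalchemy and pandas are installed (the query only lists the restored tables, since the table names are not detailed here):

import os
import pandas as pd
from sqlalchemy import create_engine

# Reuse the same connection string the analysis notebooks expect.
engine = create_engine(os.environ["JUP_DB_CONNECTION"])
tables = pd.read_sql(
    "SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'",
    engine,
)
print(tables)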
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create a conda environment with Python 3.6:
conda create -n py36 python=3.6
Go to the analyses folder and install all the dependencies from requirements.txt:
cd jupyter_reproducibility/analyses
pip install -r requirements.txt
For reproducing the analyses, run jupyter on this folder:
jupyter notebook
Execute the notebooks in this order:
Reproducing or Expanding the Collection
The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.
Requirements
This time, we have extra requirements:
All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account
Environment
First, set the following environment variables:
export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json"; # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout the extraction
# Frequency of log reports
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
Then, configure the ~/oauth2_creds.json file according to the yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
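As an illustration of what this configuration enables, notifications can be sent with yagmail roughly as follows; a minimal sketch assuming the OAuth2 credentials file is set up and reusing the JUP_EMAIL_* variables defined above (the subject and message are placeholders):

import os
import yagmail

# OAuth2-based Gmail client, configured via ~/oauth2_creds.json.
yag = yagmail.SMTP(os.environ["JUP_EMAIL_LOGIN"], oauth2_file=os.path.expanduser("~/oauth2_creds.json"))
yag.send(
    to=os.environ["JUP_EMAIL_TO"],
    subject="jupyter study: crawler status",  # placeholder subject
    contents="Collection batch finished.",    # placeholder message
)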
Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the repository directories; the second one should unmount it. You can leave the scripts blank, but this is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.
Scripts
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Install 5 conda environments and 5 Anaconda environments, one for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):
Conda 2.7
conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 2.7
conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.4
It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.
conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2
Anaconda 3.4
conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.5
conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.5
It requires the manual installation of other anaconda packages.
conda create -n py35 python=3.5 anaconda -y
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
conda activate py35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.6
conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.6
conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.7
conda create -n raw37 python=3.7 -y
conda activate raw37
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio-based xarray extension; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save the GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4, and HS 5) for sharing the state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
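The GeoTIFF-to-NetCDF step can be sketched as follows; a minimal example assuming rioxarray and xarray are installed, with placeholder file names and metadata, not the exact workflow code:

import rioxarray as rxr

# Read a state-scale LES GeoTIFF as an xarray DataArray (placeholder file name).
les = rxr.open_rasterio("state_les.tif")
les = les.rename("les")                               # variable name used inside the NetCDF file
les.attrs["description"] = "State-scale LES dataset"  # example of metadata addition via xarray
# Save the array, including its spatial reference, as NetCDF.
les.to_netcdf("state_les.nc")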
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DistilKaggle is a curated dataset extracted from Kaggle Jupyter notebooks spanning September 2015 to October 2023. It is a distilled version derived from a download of over 300 GB of Kaggle kernels, focusing on the essential data for research purposes. The dataset exclusively comprises publicly available Python Jupyter notebooks from Kaggle. The information needed to retrieve and download the notebooks was obtained from the MetaKaggle dataset provided by Kaggle.
The DistilKaggle dataset consists of three main CSV files:
code.csv: Contains over 12 million rows of code cells extracted from the Kaggle kernels. Each row is identified by the kernel's ID and cell index for reproducibility.
markdown.csv: Includes over 5 million rows of markdown cells extracted from Kaggle kernels. Similar to code.csv, each row is identified by the kernel's ID and cell index.
notebook_metrics.csv: This file provides notebook features described in the accompanying paper released with this dataset. It includes metrics for over 517,000 Python notebooks.
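A minimal sketch of how the CSV files can be combined with pandas; the join key name ("kernel_id") is an assumption based on the description above, not a documented column name:

import pandas as pd

code = pd.read_csv("code.csv")                 # >12M code cells, keyed by kernel ID and cell index
metrics = pd.read_csv("notebook_metrics.csv")  # per-notebook metrics for >517k notebooks
# Attach notebook-level metrics to each code cell; "kernel_id" is an assumed key name.
merged = code.merge(metrics, on="kernel_id", how="inner")
print(merged.head())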
The kernels directory is organized based on Kaggle's Performance Tiers (PTs), a ranking system that Kaggle uses to classify users. The structure includes PT-specific directories, each containing the user IDs that belong to that PT, download logs, and the essential data needed for downloading the notebooks.
The utility directory contains two important files:
aggregate_data.py: A Python script for aggregating data from different PTs into the mentioned CSV files.
application.ipynb: A Jupyter notebook serving as a simple example application using the metrics dataframe. It demonstrates predicting the PT of the author based on notebook metrics.
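For orientation, the kind of application demonstrated by application.ipynb could look roughly like the sketch below; the target column name ("performance_tier") and the model choice are assumptions for illustration, not the notebook's actual code:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

metrics = pd.read_csv("notebook_metrics.csv")
y = metrics["performance_tier"]                                     # assumed target column
X = metrics.drop(columns=["performance_tier"]).select_dtypes("number")
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))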
DistilKaggle.tar.gz: It is just the compressed version of the whole dataset. If you downloaded all of the other files independently already, there is no need to download this file.
Researchers can leverage this distilled dataset for various analyses without dealing with the bulk of the original 300GB dataset. For access to the raw, unprocessed Kaggle kernels, researchers can request the dataset directly.
The original dataset of Kaggle kernels is substantial, exceeding 300GB, making it impractical for direct upload to Zenodo. Researchers interested in the full dataset can contact the dataset maintainers for access.
If you use this dataset in your research, please cite the accompanying paper or provide appropriate acknowledgment as outlined in the documentation.
If you have any questions regarding the dataset, don't hesitate to contact me at mohammad.abolnejadian@gmail.com
Thank you for using DistilKaggle!
About this data An ultrasound dataset to use in the discovery of ultrasound features associated with pain and radiographic change in KOA is highly innovative and will be a major step forward for the field. These ultrasound images originate from the diverse and inclusive population-based Johnston County Health Study (JoCoHS). This dataset is designed to adhere to FAIR principles and was funded in part by an Administrative Supplement to Improve the AI/ML-Readiness of NIH-Supported Data (3R01AR077060-03S1).
Working with this dataset
WorkingWithTheDataset.ipynb Jupyter notebook
If you are familiar with working with Jupyter notebooks, we recommend using the WorkingWithTheDataset.ipynb Jupyter notebook to retrieve, validate, and learn more about the dataset. You should download the latest WorkingWithTheDataset.ipynb file and upload it to an online Jupyter environment such as https://colab.research.google.com, or use the notebook in your Jupyter environment of choice. You will also need to download the CONFIGURATION_SETTINGS.template.md file from this dataset, since its contents are used to configure the Jupyter notebook. Note: at the time of this writing, we do not recommend using Binder (mybinder.org) if you are interested in only reviewing the WorkingWithTheDataset.ipynb notebook. When Binder loads the dataset, it downloads all files from this dataset, resulting in a long build time. However, if you plan to work with all files in the dataset, then Binder might work for you. We do not offer support for this service or other Jupyter Lab environments.
Metadata
The DatasetMetadata.json file contains general information about the files and variables within this dataset. We use it as our validation metadata to verify the data we are importing into this Dataverse dataset. This file is also the most comprehensive with regard to the dataset metadata.
Data collection in progress
This dataset is not complete and will be updated regularly as additional data is collected.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
This dataset contains extensive, long-term monitoring data on the Belvedere Glacier, a debris-covered glacier located on the east face of Monte Rosa in the Anzasca Valley of the Italian Alps. The data is derived from photogrammetric 3D reconstruction of the full Belvedere Glacier and includes:
Since 2015, in-situ surveys of the glacier have been conducted annually, using fixed-wing UAVs until 2020 and quadcopters from 2021 to 2022, to remotely sense the glacier and build high-resolution photogrammetric models. A set of ground control points (GCPs) was materialized all over the glacier area, both inside the glacier and along the moraines, and surveyed (nearly) yearly with topographic-grade GNSS receivers (Ioli et al., 2022).
For the period from 1977 to 2001, historical analog images, digitized with photogrammetric scanners and acquired from aerial platforms, were used in combination with GCPs obtained from the recent photogrammetric models (De Gaetani et al., 2021).
Before downloading the data, you can explore the photogrammetric point clouds of the Belvedere Glacier within a Potree-based web app at https://thebelvedereglacier.it/ (use a web browser on a desktop/laptop for the best experience). From there you can also visualize and download the coordinates of the GCPs measured by GNSS every year since 2015.
Belvedere Glacier
The Belvedere Glacier is an important temperate alpine glacier located on the east face of Monte Rosa in the Anzasca Valley of Italy. The Belvedere Glacier is of particular importance among alpine glaciers because it is a debris-covered glacier and it reaches its lowest elevation at about 1800 m a.s.l. Over the last century, the Belvedere Glacier has experienced extraordinary dynamics, such as a surge-like movement or the formation of a supraglacial lake, which seriously threatened the nearby community of Macugnaga.
Data organization
The data are organized by year in compressed zip folders named belvedere_YYYY.zip, which can be downloaded independently. Each folder contains all data available for that year (i.e. photogrammetric point clouds, orthophotos, and DEMs) and the corresponding metadata. Metadata is provided as a .json file which contains all the main information for data usage. Point clouds are saved in compressed las format (.laz) and they can be inspected e.g., with CloudCompare. Orthophotos and DEMs are georeferenced images (.tif) that can be inspected with any GIS software (e.g., QGIS).
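For users who prefer to inspect the data programmatically rather than with CloudCompare or QGIS, a minimal Python sketch is given below; the file names are placeholders loosely following the naming schema described below, and the laspy (with a LAZ backend such as lazrs) and rasterio packages are assumed to be installed:

import json
import laspy
import rasterio

# Per-year metadata file (placeholder name).
with open("belvedere_2022/metadata.json") as f:
    meta = json.load(f)
print(meta.keys())

# Georeferenced DEM (placeholder name); prints CRS, resolution, and extent.
with rasterio.open("belv_2022_uav_dem.tif") as dem:
    print(dem.crs, dem.res, dem.bounds)

# Compressed point-cloud tile (placeholder name).
las = laspy.read("belv_2022_uav_pcd-001.laz")
print(las.header.point_count)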
Large point clouds are subdivided into regular tiles, which are numbered in a progressive row-wise order from the bottom-left corner of the point cloud bounding box.
All the files are named according to the following naming schema:
"belv_YYYY_surveyplatform_datatype[_resolution][vertical_datum][-tile_number].extension"
where:
Data Usage
This dataset can be used to estimate glacier velocities and volume variations, to study geomorphological processes such as moraine collapse, or to derive other information on glacier dynamics. If you have any requests regarding the data provided, the data acquisition, or the raw data themselves, you are encouraged to contact us.
Contributions
The monitoring activity carried out on the Belvedere Glacier was designed and conducted jointly by the Department of Civil and Environmental Engineering (DICA) of Politecnico di Milano and the Department of Environment, Land and Infrastructure Engineering (DIATI) of Politecnico di Torino. The DREAM projects (DRone tEchnology for wAter resources and hydrologic hazard Monitoring), involving teachers and students from the Alta Scuola Politecnica (ASP) of Politecnico di Torino and Milano, contributed to the campaigns from 2015 to 2017.
Acknowledgements
If you use the data, please cite our publications:
Ioli, F., Dematteis, N., Giordan, D., Nex, F., Pinto, L., Deep Learning Low-cost Photogrammetry for 4D Short-term Glacier Dynamics Monitoring. PFG (2024). https://doi.org/10.1007/s41064-023-00272-w
Ioli, F.; Bianchi, A.; Cina, A.; De Michele, C.; Maschio, P.; Passoni, D.; Pinto, L. Mid-Term Monitoring of Glacier’s Variations with UAVs: The Example of the Belvedere Glacier. Remote Sensing, 14, 28 (2022). https://doi.org/10.3390/rs14010028
De Gaetani, C.I.; Ioli, F.; Pinto, L. Aerial and UAV Images for Photogrammetric Analysis of Belvedere Glacier Evolution in the Period 1977–2019. Remote Sensing, 13, 3787 (2021). https://doi.org/10.3390/rs13183787