5 datasets found

m
COVID-19 Scholarly Production Dataset
data.mendeley.com
Updated Jul 7, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gisliany Alves (2020). COVID-19 Scholarly Production Dataset [Dataset]. http://doi.org/10.17632/kx7wwc8dzp.5
Explore at:
Unique identifier
https://doi.org/10.17632/kx7wwc8dzp.5
Dataset updated
Jul 7, 2020
Authors
Gisliany Alves
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
COVID-2019 has been recognized as a global threat, and several studies are being conducted in order to contribute to the fight and prevention of this pandemic. This work presents a scholarly production dataset focused on COVID-19, providing an overview of scientific research activities, making it possible to identify countries, scientists and research groups most active in this task force to combat the coronavirus disease. The dataset is composed of 40,212 records of articles' metadata collected from Scopus, PubMed, arXiv and bioRxiv databases from January 2019 to July 2020. Those data were extracted by using the techniques of Python Web Scraping and preprocessed with Pandas Data Wrangling.

PythonLibraries|WheelFiles

kaggle.com

Updated Mar 25, 2024

Facebook

Twitter

Click to copy link

Link copied

Cite

Ravi Ramakrishnan (2024). PythonLibraries|WheelFiles [Dataset]. https://www.kaggle.com/datasets/ravi20076/pythonlibrarieswheelfiles/code

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Mar 25, 2024

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Ravi Ramakrishnan

License

https://cdla.io/sharing-1-0/https://cdla.io/sharing-1-0/

Description

Hello all,
This dataset is my humble attempt to allow myself and others to upgrade essential python packages to their latest versions. This dataset contains the .whl files of the below packages to be used across general kernels and especially in internet-off code challenges-

Package	Version	Functionality
AutoGluon	1.0.0	AutoML models
Catboost	1.2.2 1.2.3	ML models
Iterative-Stratification	0.1.7	Iterative stratification for multi-label classifiers
Joblib	1.3.2	File dumping and retrieval
LAMA	0.3.8b1	AutoML models
LightGBM	4.3.0 4.2.0 4.1.0	ML models
MAPIE	0.8.2	Quantile regression
Numpy	1.26.3	Data wrangling
Pandas	2.1.4	Data wrangling
Polars	0.20.3 0.20.4	Data wrangling
PyTorch	2.0.1	Neural networks
PyTorch-TabNet	4.1.0	Neural networks
PyTorch-Forecast	0.7.0	Neural networks
Pygwalker	0.3.20	Data wrangling and visualization
Scikit-learn	1.3.2 1.4.0	ML Models/ Pipelines/ Data wrangling
Scipy	1.11.4	Data wrangling/ Statistics
TabPFN	10.1.9	ML models
Torch-Frame	1.7.5	Neural Networks
TorchVision	0.15.2	Neural Networks
XGBoost	2.0.2 2.0.1 2.0.3	ML models

I plan to update this dataset with more libraries and later versions as they get upgraded in due course. I hope these wheel files are useful to one and all.

Recent updates based on user feedback-

lightgbm 4.1.0 and 4.3.0
Older XGBoost versions (2.0.1 and 2.0.2)
Torch-Frame, TabNet, PyTorch-Forecasting, TorchVision
MAPIE
LAMA 0.3.8b1
Iterative-Stratification
Catboost 1.2.3

Best regards and happy learning and coding!

H
Data from: SBIR - STTR Data and Code for Collecting Wrangling and Using It
dataverse.harvard.edu
Updated Nov 5, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Grant Allard (2018). SBIR - STTR Data and Code for Collecting Wrangling and Using It [Dataset]. http://doi.org/10.7910/DVN/CKTAZX
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.7910/DVN/CKTAZX
Dataset updated
Nov 5, 2018
Dataset provided by
Harvard Dataverse
Authors
Grant Allard
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Data set consisting of data joined for analyzing the SBIR/STTR program. Data consists of individual awards and agency-level observations. The R and python code required for pulling, cleaning, and creating useful data sets has been included. Allard_Get and Clean Data.R This file provides the code for getting, cleaning, and joining the numerous data sets that this project combined. This code is written in the R language and can be used in any R environment running R 3.5.1 or higher. If the other files in this Dataverse are downloaded to the working directory, then this Rcode will be able to replicate the original study without needing the user to update any file paths. Allard SBIR STTR WebScraper.py This is the code I deployed to multiple Amazon EC2 instances to scrape data o each individual award in my data set, including the contact info and DUNS data. Allard_Analysis_APPAM SBIR project Forthcoming Allard_Spatial Analysis Forthcoming Awards_SBIR_df.Rdata This unique data set consists of 89,330 observations spanning the years 1983 - 2018 and accounting for all eleven SBIR/STTR agencies. This data set consists of data collected from the Small Business Administration's Awards API and also unique data collected through web scraping by the author. Budget_SBIR_df.Rdata 246 observations for 20 agencies across 25 years of their budget-performance in the SBIR/STTR program. Data was collected from the Small Business Administration using the Annual Reports Dashboard, the Awards API, and an author-designed web crawler of the websites of awards. Solicit_SBIR-df.Rdata This data consists of observations of solicitations published by agencies for the SBIR program. This data was collected from the SBA Solicitations API. Primary Sources Small Business Administration. “Annual Reports Dashboard,” 2018. https://www.sbir.gov/awards/annual-reports. Small Business Administration. “SBIR Awards Data,” 2018. https://www.sbir.gov/api. Small Business Administration. “SBIR Solicit Data,” 2018. https://www.sbir.gov/api.
h
ds-coder-instruct-v1
huggingface.co
Updated Apr 10, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Edvard Avagyan (2024). ds-coder-instruct-v1 [Dataset]. https://huggingface.co/datasets/ed001/ds-coder-instruct-v1
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 10, 2024
Authors
Edvard Avagyan
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Dataset Card for DS Coder Instruct Dataset

DS Coder is a dataset for instruction fine tuning of language models. It is a specialized dataset focusing only on data science (eg. plotting, data wrangling, machine learnig models, deep learning, and numerical computations). The dataset contains code examples both in R and Python. The goal of this dataset is to enable creation of small-scale, specialized language model assistants for data science projects.

Dataset Details… See the full description on the dataset page: https://huggingface.co/datasets/ed001/ds-coder-instruct-v1.
m
Bee Swarm Analysis
data.mendeley.com
Updated Jul 4, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kosta Manser (2022). Bee Swarm Analysis [Dataset]. http://doi.org/10.17632/5bmscj7jf7.1
Explore at:
Unique identifier
https://doi.org/10.17632/5bmscj7jf7.1
Dataset updated
Jul 4, 2022
Authors
Kosta Manser
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data collected by E. Hunting et al. comprising video footage and electric field recordings from a video camera and field mill respectively. Data wrangling was done by K. Manser, the author of the python script.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Gisliany Alves (2020). COVID-19 Scholarly Production Dataset [Dataset]. http://doi.org/10.17632/kx7wwc8dzp.5

COVID-19 Scholarly Production Dataset

Explore at:

24 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://doi.org/10.17632/kx7wwc8dzp.5

Dataset updated

Jul 7, 2020

Authors

Gisliany Alves

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

COVID-2019 has been recognized as a global threat, and several studies are being conducted in order to contribute to the fight and prevention of this pandemic. This work presents a scholarly production dataset focused on COVID-19, providing an overview of scientific research activities, making it possible to identify countries, scientists and research groups most active in this task force to combat the coronavirus disease. The dataset is composed of 40,212 records of articles' metadata collected from Scopus, PubMed, arXiv and bioRxiv databases from January 2019 to July 2020. Those data were extracted by using the techniques of Python Web Scraping and preprocessed with Pandas Data Wrangling.

Clear search

Close search

Google apps

Main menu

COVID-19 Scholarly Production Dataset

PythonLibraries|WheelFiles

Recent updates based on user feedback-

Data from: SBIR - STTR Data and Code for Collecting Wrangling and Using It

ds-coder-instruct-v1

Bee Swarm Analysis

COVID-19 Scholarly Production DatasetSee More Versions

COVID-19 Scholarly Production Dataset