Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset consists of a single CSV file containing around 32,000 Twitter tweets, from which 100 CSV files of 320 tweets each have been created. Those 100 CSV files are used to validate and test (performance/load testing) the data pipeline components.
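As an illustration only, here is a minimal Python/pandas sketch of how a single CSV could be split into 100 files of 320 tweets each (the file names are assumptions, not part of the dataset):
import pandas as pd
tweets = pd.read_csv("tweets.csv")  # hypothetical name for the single ~32,000-tweet file
chunk_size = 320
for i in range(0, len(tweets), chunk_size):
    tweets.iloc[i:i + chunk_size].to_csv(f"tweets_part_{i // chunk_size:03d}.csv", index=False)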
This dataset was created by DINESH JATAV
https://webtechsurvey.com/terms
A complete list of live websites using the Import Users From Csv With Meta technology, compiled through global website indexing conducted by WebTechSurvey.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The CsvReader is a component designed to read and process CSV (Comma-Separated Values) files, which are widely used for storing tabular data. This component can be used to load CSV files, perform operations like filtering and aggregation, and then output the results. It is a valuable tool for data preprocessing in various workflows, including data analysis and machine learning pipelines.
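The CsvReader component itself is not reproduced here; as a rough sketch of the load/filter/aggregate workflow it describes, using pandas (file and column names are hypothetical):
import pandas as pd
df = pd.read_csv("input.csv")                            # load the CSV file
filtered = df[df["value"] > 0]                           # filtering step
summary = filtered.groupby("category")["value"].mean()   # aggregation step
summary.to_csv("output.csv")                             # output the results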
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials
Background
This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels, and one iron-based shape memory alloy is also included. Summary files are included that provide an overview of the database, and data from the individual experiments are also included.
The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).
Usage
Included Files
File Format: Downsampled Data
These are the "LP_
These data files can be easily loaded using the pandas library in Python through:
import pandas
data = pandas.read_csv(data_file, index_col=0)
The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
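For instance, the two columns can be pulled out by name after loading a file as shown above (reading "e_true" as true strain and "Sigma_true" as true stress is an assumption based on the column names):
import pandas
data = pandas.read_csv(data_file, index_col=0)  # data_file: path to one of the downsampled CSV files
strain = data["e_true"]       # presumably the true strain history
stress = data["Sigma_true"]   # presumably the true stress history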
File Format: Unreduced Data
These are the "LP_
The data can be loaded and used similarly to the downsampled data.
File Format: Overall_Summary
The overall summary file provides data on all the test specimens in the database. The columns include:
File Format: Summarized_Mechanical_Props_Campaign
Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,
import pandas as pd
# 'date' and 'version' correspond to the release date and version suffix in the file name
tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
                   index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
                   keep_default_na=False, na_values='')
Caveats
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Demo
Dataset Summary
This is a demo dataset with two files, train.csv and test.csv. Load it by:
from datasets import load_dataset
data_files = {"train": "train.csv", "test": "test.csv"}
demo = load_dataset("stevhliu/demo", data_files=data_files)
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]
Dataset Structure
Data Instances
[More Information… See the full description on the dataset page: https://huggingface.co/datasets/Axion004/my-awesome-dataset.
To make this a seamless process, I cleaned the data and deleted many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file has both Lyft and Uber, but it is still a cleaned version of the dataset we downloaded from Kaggle.
You can easily subset the data into the car types that you will be modeling by first loading the CSV into R. Here is the code for how you do this:
df <- read.csv('uber.csv')
df_black <- subset(df, df$name == 'Black')  # keep only the 'Black' car type, for example
write.csv(df_black, "nameofthefileyouwanttosaveas.csv")
getwd()  # shows the directory the file was written to
[doc] formats - csv - 1
This dataset contains one csv file at the root:
data.csv
kind,sound
dog,woof
cat,meow
pokemon,pika
human,hello
size_categories:
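For reference, a minimal pandas sketch for loading the data.csv file shown above:
import pandas as pd
df = pd.read_csv("data.csv")  # two columns: kind, sound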
This data release supports interpretations of field-observed root distributions within a shallow landslide headscarp (CB1) located below Mettman Ridge within the Oregon Coast Range, approximately 15 km northeast of Coos Bay, Oregon, USA (Schmidt_2021_CB1_topo_far.png and Schmidt_2021_CB1_topo_close.png). Root species, diameter (greater than or equal to 1 mm), general orientation relative to the slide scarp, and depth below ground surface were characterized immediately following landsliding in response to large-magnitude precipitation in November 1996, which triggered thousands of landslides within the area (Montgomery and others, 2009). The enclosed data include: (1) tests of root-thread failure as a function of root diameter and tensile load for different plant species applicable to the broader Oregon Coast Range and (2) a tape and compass survey of the planform geometry of the CB1 landslide and the roots observed in the slide scarp. Root diameter and load measurements were principally collected in the general area of the CB1 slide for the 12 species listed in Schmidt_2021_OR_root_species_list.csv. The methodology of the failure tests included identifying roots of a given plant species, trimming root threads into 15-20 cm long segments, measuring diameters including bark (up to 6.5 mm) with a micrometer at multiple points along the segment to arrive at an average, clamping a segment end to a calibrated spring, and loading roots until failure while recording the maximum load. Files containing the tensile failure tests described in Schmidt and others (2001) include root diameter (mm), critical tensile load at failure (kg), root cross-sectional area (m^2), and tensile strength (MPa). Tensile strengths were calculated as: (critical tensile load at failure * gravitational acceleration)/root cross-sectional area. The files are labeled: Schmidt_2021_OR_root_AceCir.csv, Schmidt_2021_OR_root_AceMac.csv, Schmidt_2021_OR_root_AlnRub.csv, Schmidt_2021_OR_root_AnaMar.csv, Schmidt_2021_OR_root_DigPur.csv, Schmidt_2021_OR_root_MahNer.csv, Schmidt_2021_OR_root_PolMun.csv, Schmidt_2021_OR_root_PseMen_damaged.csv, Schmidt_2021_OR_root_PseMen_healthy.csv, Schmidt_2021_OR_root_RubDis.csv, Schmidt_2021_OR_root_RubPar.csv, Schmidt_2021_OR_root_SamCae.csv, and Schmidt_2021_OR_root_TsuHet.csv. File naming follows the convention of adopting the first three letters of the binomial system defining genus and species of their Latin names. Live and damaged roots were identified based on their color, texture, plasticity, adherence of bark to woody material, and compressibility. For example, healthy live Douglas-fir (Pseudotsuga menziesii) roots (Schmidt_2021_OR_root_PseMen_healthy.csv) have a crimson-colored inner bark, darkening to a brownish red in dead Douglas-fir roots. Both are distinctive colors. Live roots exhibited plastic responses to bending and strong adherence of bark, whereas dead roots displayed brittle behavior with bending and poor adherence of bark to the underlying woody material. Damaged root threads with fungal infections, resulting from selective tree harvest using yarding operations that damaged the bark of standing trees, exhibited significantly lower measured tensile strengths than their ultimate living tensile strengths (Schmidt_2021_OR_root_PseMen_damaged.csv). The CB1 site was clear-cut logged in 1987 and replanted with Douglas-fir saplings in 1989.
Vegetation in the vicinity of the failure scarp is dominated by young Douglas-fir saplings planted two years after the clear cut, blue elderberry (Sambucus caerulea), thimbleberry (Rubus parviflorus), foxglove (Digitalis purpurea), and Himalayan blackberry (Rubus discolor). The remaining seven species are provided for context of more regional studies. The CB1 site is a hillslope hollow that failed as a shallow landslide and mobilized as a debris flow during heavy rainfall in November 1996. Prior to debris flow mobilization, the ~5-m wide slide with a source area of roughly 860 m^2 and an average slope of 43° displaced and broke numerous roots. Following landsliding, field observations noted a preponderance of exposed, blunt broken root stubs within the scarp. Roots were not straight and smooth, but rather exhibited tortuous growth paths with firmly anchored, interlocking structures. The planform geometry represented by a tape and compass field survey is presented as starting and ending points of slide margin segments of roughly equal colluvial soil depths above saprolite or bedrock (Schmidt_2021_CB1_scarp_geometry.csv and Schmidt_2021_CB1_scarp_pts.shp). The graphic Schmidt_2021_CB1_scarp_pts_poly.png shows the horseshoe-shaped profile and its numbered scarp segments. Segment numbers enclosed within parentheses indicate segments where roots were not counted owing to occlusion by prior ground disturbance. The shapefile Schmidt_2021_CB1_scarp_poly.shp also represents the scarp line segments. The file Schmidt_2021_CB1_segment_info.csv presents the segment information as left and right cumulative lengths, averaged colluvial soil depths for each segment, and inclinations of the ground surface slope relative to horizontal along the perimeter (P) and the slide scarp face (F). Lastly, Schmidt_2021_CB1_rootdata_scarp.csv reports the root diameter of individual threads measured by a micrometer, species, depth below ground surface, live vs. dead roots, general root orientation (parallel or perpendicular) relative to the scarp perimeter, and cumulative perimeter distance within the scarp segments. At CB1 specifically, and more generally across the Oregon Coast Range, root reinforcement occurs primarily by lateral reinforcement, with typically much smaller basal reinforcements.
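As a hedged illustration of the tensile-strength formula quoted above, the following Python sketch recomputes tensile strength from root diameter and failure load; the column names are assumptions, so check the actual headers of the Schmidt_2021_OR_root_*.csv files before use:
import math
import pandas as pd
df = pd.read_csv("Schmidt_2021_OR_root_PseMen_healthy.csv")
g = 9.81                                        # gravitational acceleration, m/s^2
diameter_m = df["diameter_mm"] / 1000.0         # hypothetical column: root diameter in mm
area_m2 = math.pi * diameter_m ** 2 / 4.0       # root cross-sectional area, m^2
load_n = df["failure_load_kg"] * g              # hypothetical column: critical tensile load at failure, kg -> N
tensile_strength_mpa = load_n / area_m2 / 1e6   # (load * g) / area, converted from Pa to MPa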
Train data of the Riiid competition is a large dataset of over 100 million rows and 10 columns that does not fit into a Kaggle Notebook's RAM using the default pandas read_csv, resulting in a search for alternative approaches and formats.
Train data of the Riiid competition in different formats.
Reading the .CSV file for the Riiid competition took a huge amount of time and memory. This inspired me to convert the .CSV into different file formats so that they can be loaded easily into a Kaggle kernel.
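A minimal sketch of that kind of conversion (file names and target formats are assumptions; the published notebook may use different ones):
import pandas as pd
df = pd.read_csv("train.csv")
df.to_parquet("train.parquet")    # requires pyarrow or fastparquet
df.to_feather("train.feather")    # requires pyarrow
df = pd.read_parquet("train.parquet")  # later, reload much faster than the original CSV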
https://webtechsurvey.com/terms
A complete list of live websites using the AIT CSV Import / Export technology, compiled through global website indexing conducted by WebTechSurvey.
https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html
Replication pack, FSE2018 submission #164:
------------------------------------------
**Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: A Case Study of the PyPI Ecosystem

**Note:** link to data artifacts is already included in the paper. Link to the code will be included in the Camera Ready version as well.

Content description
===================

- **ghd-0.1.0.zip** - the code archive. This code produces the dataset files described below
- **settings.py** - settings template for the code archive.
- **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset. This dataset only includes stats aggregated by the ecosystem (PyPI)
- **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages themselves, which take around 2TB.
- **build_model.r, helpers.r** - R files to process the survival data (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, `common.cache/survival_data.pypi_2008_2017-12_6.csv` in **dataset_full_Jan_2018.tgz**)
- **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
- LICENSE - text of GPL v3, under which this dataset is published
- INSTALL.md - replication guide (~2 pages)
Replication guide
=================

Step 0 - prerequisites
----------------------

- Unix-compatible OS (Linux or OS X)
- Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
- R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)

Depending on detalization level (see Step 2 for more details):

- up to 2Tb of disk space (see Step 2 detalization levels)
- at least 16Gb of RAM (64 preferable)
- few hours to few month of processing time

Step 1 - software
----------------

- unpack **ghd-0.1.0.zip**, or clone from gitlab:

      git clone https://gitlab.com/user2589/ghd.git
      git checkout 0.1.0

  `cd` into the extracted folder. All commands below assume it as a current directory.
- copy `settings.py` into the extracted folder. Edit the file:
    * set `DATASET_PATH` to some newly created folder path
    * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS`
- install docker. For Ubuntu Linux, the command is `sudo apt-get install docker-compose`
- install libarchive and headers: `sudo apt-get install libarchive-dev`
- (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools` Without this dependency, you might get an error on the next step, but it's safe to ignore.
- install Python libraries: `pip install --user -r requirements.txt` .
- disable all APIs except GitHub (Bitbucket and Gitlab support were not yet implemented when this study was in progress): edit `scraper/init.py`, comment out everything except GitHub support in `PROVIDERS`.

Step 2 - obtaining the dataset
-----------------------------

The ultimate goal of this step is to get output of the Python function `common.utils.survival_data()` and save it into a CSV file:

    # copy and paste into a Python console
    from common import utils
    survival_data = utils.survival_data('pypi', '2008', smoothing=6)
    survival_data.to_csv('survival_data.csv')

Since full replication will take several months, here are some ways to speedup the process:

#### Option 2.a, difficulty level: easiest

Just use the precomputed data. Step 1 is not necessary under this scenario.

- extract **dataset_minimal_Jan_2018.zip**
- get `survival_data.csv`, go to the next step

#### Option 2.b, difficulty level: easy

Use precomputed longitudinal feature values to build the final table. The whole process will take 15..30 minutes.

- create a folder `
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
bob2
This dataset was automatically uploaded from the red-team-agent repository.
Dataset Information
Original file: bob2.csv
Source path: /home/ubuntu/red-team-agent/bob2.csv
Validation: Valid CSV with 6 rows, 5 columns (0.0MB)
Usage
import pandas as pd
from datasets import load_dataset
dataset = load_dataset("aq1048576/bob2")
df = pd.read_csv("hf://datasets/aq1048576/bob2/data.csv")… See the full description on the dataset page: https://huggingface.co/datasets/aq1048576/bob2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset containing measurements of Linux Kernel binary size after compilation. The reported size, in the column "perf", is the size in bytes of the vmlinux file. It also contains a column "active_options" reporting the number of activated options (set to "y"). All other columns, the list being reported in the file "Linux_options.json", are Linux kernel options. The sampling has been done using randconfig. The version of Linux used is 4.13.3.
Not all available options are present. First, it only contains options for the x86, 64-bit version. Then, all non-tristate options have been ignored. Finally, options that do not take more than one value across the whole dataset, due to insufficient variability in the sampling, are ignored. All options are encoded as 0 for the "n" and "m" option values, and 1 for "y".
In Python, importing the dataset using pandas will assign all columns the int64 dtype, which leads to a very large memory consumption (~50GB). We provide the following way to import it using less than 1 GB of memory, by setting the option columns to int8.
import pandas as pd
import json
import numpy

# Load the list of option column names and read those columns as int8 to save memory
with open("Linux_options.json", "r") as f:
    linux_options = json.load(f)

data = pd.read_csv("Linux.csv", dtype={opt: numpy.int8 for opt in linux_options})
We provide MATLAB binary files (.mat) and comma separated values files of data collected from a pilot study of a plug load management system that allows for the metering and control of individual electrical plug loads. The study included 15 power strips, each containing 4 channels (receptacles), which wirelessly transmitted power consumption data approximately once per second to 3 bridges. The bridges were connected to a building local area network which relayed data to a cloud-based service. Data were archived once per minute with the minimum, mean, and maximum power draw over each one minute interval recorded. The uncontrolled portion of the testing spanned approximately five weeks and established a baseline energy consumption. The controlled portion of the testing employed schedule-based rules for turning off selected loads during non-business hours; it also modified the energy saver policies for certain devices. Three folders are provided: “matFilesAllChOneDate” provides a MAT-file for each date, each file has all channels; “matFilesOneChAllDates” provides a MAT-file for each channel, each file has all dates; “csvFiles” provides comma separated values files for each date (note that because of data export size limitations, there are 10 csv files for each date). Each folder has the same data; there is no practical difference in content, only the way in which it is organized.
Adapted OTTO datasets from JSON to CSV. Code: https://www.kaggle.com/code/adamnarozniak/full-otto-dataset-in-csv-for-pandas
To load the data (and decrease the memory needed), use:
import pandas as pd
import numpy as np
df = pd.read_csv(path, dtype={"session": np.uint32, "aid": np.uint32, "type": np.uint8}, parse_dates=["ts"])
The type was changed compared to the original data using this dictionary:
type_dict = { 'clicks': 0, 'carts': 1, 'orders': 2 }
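To map the encoded values back to their original labels, the dictionary can simply be inverted (a small sketch, assuming the column is named "type" as in the read_csv call above):
type_dict = {'clicks': 0, 'carts': 1, 'orders': 2}
inverse_type_dict = {v: k for k, v in type_dict.items()}
df["type_name"] = df["type"].map(inverse_type_dict)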
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The CSV version of SynD. This contains 22 CSV files.
Csv Investments Private Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
The GCEW herbicide data were collected from 1991-2010, and are documented at plot, field, and watershed scales. Atrazine concentrations in Goodwater Creek Experimental Watershed (GCEW) were shown to be among the highest of any watershed in the United States based on comparisons using the national Watershed Regressions for Pesticides (WARP) model and by direct comparison with the 112 watersheds used in the development of WARP. This 20-yr-long effort was augmented with a spatially broad effort within the Central Mississippi River Basin encompassing 12 related claypan watersheds in the Salt River Basin, two cave streams on the fringe of the Central Claypan Areas in the Bonne Femme watershed, and 95 streams in northern Missouri and southern Iowa. The research effort on herbicide transport has highlighted the importance of restrictive soil layers with smectitic mineralogy to the risk of transport vulnerability. Near-surface soil features, such as claypans and argillic horizons, result in greater herbicide transport than soils with high saturated hydraulic conductivities and low smectitic clay content. The data set contains concentration, load, and daily discharge data for Devils Icebox Cave and Hunters Cave from 1999 to 2002. The data are available in Microsoft Excel 2010 format. Sheet 1 (Cave Streams Metadata) contains supporting information regarding the length of record, site locations, parameters measured, parameter units, method detection limits, the meaning of zero and blank cells, and unit area load computations. Sheet 2 (Devils Icebox Concentration Data) contains concentration data from all samples collected from 1999 to 2002 at the Devils Icebox site for 12 analytes and two computed nutrient parameters. Sheet 3 (Devils Icebox SS Conc Data) contains 15-minute suspended sediment (SS) concentrations estimated from turbidity sensor data for the Devils Icebox site. Sheet 4 (Devils Icebox Load & Discharge Data) contains daily data for discharge, load, and unit area loads for the Devils Icebox site. Sheet 5 (Hunters Cave Concentration Data) contains concentration data from all samples collected from 1999 to 2002 at the Hunters Cave site for 12 analytes and two computed nutrient parameters. Sheet 6 (Hunters Cave SS Conc Data) contains 15-minute SS concentrations estimated from turbidity sensor data for the Hunters Cave site. Sheet 7 (Hunters Cave Load & Discharge Data) contains daily data for discharge, load, and unit area loads for the Hunters Cave site. [Note: To support automated data access and processing, each worksheet has been extracted as a separate, machine-readable CSV file; see Data Dictionary for descriptions of variables and their concentration units.]
Resources in this dataset:
Resource Title: README - Metadata. File Name: LTAR_GCEW_herbicidewater_qual.xlsx. Resource Description: Defines Water Quality and Sediment Load/Discharge parameters, abbreviations, time-frames, and units as rendered in the Excel file. For additional information including site information, method detection limits, and methods citations, see the Metadata tab. For definitions used in machine-readable CSV files, see the Data Dictionary.
Resource Title: Excel data spreadsheet. File Name: c3.jeq2013.12.0516.ds1_.xlsx. Resource Description: Multi-page data spreadsheet containing data as well as metadata from this study.
A direct download of the data spreadsheet can be found here: https://dl.sciencesocieties.org/publications/datasets/jeq/C3.JEQ2013.12.0516.ds1/download
Resource Title: Devils Icebox Concentration Data. File Name: DevilsIceboxConcData.csv. Resource Description: Concentrations of herbicides, metabolites, and nutrients (extracted from the Excel tab into machine-readable CSV data).
Resource Title: Devils Icebox Load and Discharge Data. File Name: DevilsIceboxLoad&Discharge.csv. Resource Description: Discharge and Unit Area Loads for herbicides, metabolites, and suspended sediments (extracted from Excel tab as machine-readable CSV data).
Resource Title: Devils Icebox Suspended Sediment Concentration Data. File Name: DevilsIceboxSSConcData.csv. Resource Description: Suspended Sediment Concentration Data (extracted from Excel tab as machine-readable CSV data).
Resource Title: Hunters Cave Load and Discharge Data. File Name: HuntersCaveLoad&Discharge.csv. Resource Description: Discharge and Unit Area Loads for herbicides, metabolites, and suspended sediments (extracted from Excel tab as machine-readable CSV data).
Resource Title: Hunters Cave Suspended Sediment Concentration Data. File Name: HuntersCaveSSConc.csv. Resource Description: Suspended Sediment Concentration Data (extracted from Excel tab as machine-readable CSV data).
Resource Title: Data Dictionary for machine-readable CSV files. File Name: LTAR_GCEW_herbicidewater_qual.csv. Resource Description: Defines Water Quality and Sediment Load/Discharge parameters, abbreviations, time-frames, and units as implemented in the extracted machine-readable CSV files.
Resource Title: Hunters Cave Concentration Data. File Name: HuntersCaveConcData.csv. Resource Description: Concentrations of herbicides, metabolites, and nutrients (extracted from the Excel tab into machine-readable CSV data).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains .csv files. The data contains load and generation time series for all the 10 kV or 400 V nodes in the network.
Load and Generation time-series data:
Load time-series
> active and reactive power at 1 hour resolution
> aggregated time-series at the 60 kV-10 kV substation
> individual load time-series at 10 kV or 400 V nodes
> 27 different load profiles grouped into household, commercial, agricultural and miscellaneous
Generation time-series
> active power at 1 hour resolution
> wind and solar generation time-series from meteorological data
This item is a part of the collection 'DTU 7k-Bus Active Distribution Network': https://doi.org/10.11583/DTU.c.5389910
For more information, access the readme file: https://doi.org/10.11583/DTU.14971812