100+ datasets found
  1. Data pipeline Validation And Load Testing using Multiple CSV Files

    • data.niaid.nih.gov
    • data.europa.eu
    Updated Mar 26, 2021
    Cite
    Mainak Adhikari; Afsana Khan; Pelle Jakovits (2021). Data pipeline Validation And Load Testing using Multiple CSV Files [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4636797
    Explore at:
    Dataset updated
    Mar 26, 2021
    Dataset provided by
    Research Fellow, University of Tartu
    Lecturer, University of Tartu
    Masters Student, University of Tartu
    Authors
    Mainak Adhikari; Afsana Khan; Pelle Jakovits
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset is based on a single CSV file containing around 32,000 Twitter tweets. From this file, 100 CSV files were created, each containing 320 tweets. These 100 CSV files are used to validate and test (performance/load testing) the data pipeline components.
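    As a rough illustration of how such a split can be reproduced, here is a minimal pandas sketch; the input file name "tweets.csv" is an assumption and not part of the dataset description.

    import pandas as pd

    # Hypothetical input: the single CSV with ~32,000 tweets.
    tweets = pd.read_csv("tweets.csv")

    # Write 100 CSV files of 320 tweets each.
    chunk_size = 320
    for i in range(100):
        chunk = tweets.iloc[i * chunk_size:(i + 1) * chunk_size]
        chunk.to_csv(f"tweets_part_{i:03d}.csv", index=False)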

  2. CSV-data-load_data_metadata

    • kaggle.com
    zip
    Updated Jun 7, 2023
    Cite
    DINESH JATAV (2023). CSV-data-load_data_metadata [Dataset]. https://www.kaggle.com/datasets/dineshjatav/csv-data-load-data-metadata
    Explore at:
    zip (23415375 bytes); available download formats
    Dataset updated
    Jun 7, 2023
    Authors
    DINESH JATAV
    Description

    Dataset

    This dataset was created by DINESH JATAV

    Contents

  3. Websites using Import Users From Csv With Meta

    • webtechsurvey.com
    csv
    Updated Nov 23, 2025
    Cite
    WebTechSurvey (2025). Websites using Import Users From Csv With Meta [Dataset]. https://webtechsurvey.com/technology/import-users-from-csv-with-meta
    Explore at:
    csv; available download formats
    Dataset updated
    Nov 23, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the Import Users From Csv With Meta technology, compiled through global website indexing conducted by WebTechSurvey.

  4. CsvReader

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jan 30, 2025
    Cite
    Tao HU (2025). CsvReader [Dataset]. http://doi.org/10.7910/DVN/XT2MWH
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 30, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Tao HU
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The CsvReader is a component designed to read and process CSV (Comma-Separated Values) files, which are widely used for storing tabular data. This component can be used to load CSV files, perform operations like filtering and aggregation, and then output the results. It is a valuable tool for data preprocessing in various workflows, including data analysis and machine learning pipelines.
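    As a rough illustration of the load / filter / aggregate workflow described above, here is a minimal pandas sketch; the file name and column names are hypothetical, since the component's actual interface is not documented here.

    import pandas as pd

    # Hypothetical CSV and columns, for illustration only.
    df = pd.read_csv("input.csv")

    # Filter rows, then aggregate by group and write the result out.
    filtered = df[df["value"] > 0]
    summary = filtered.groupby("category")["value"].agg(["count", "mean"])
    summary.to_csv("summary.csv")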

  5. Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, zip
    Updated Dec 24, 2022
    Cite
    Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. http://doi.org/10.5281/zenodo.6965147
    Explore at:
    bin, zip, csv; available download formats
    Dataset updated
    Dec 24, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander R. Hartloper; Selimcan Ozden; Albano de Castro e Sousa; Dimitrios G. Lignos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    Background

    This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and data from the individual experiments are also included.

    The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

    Usage

    • The data is licensed through the Creative Commons Attribution 4.0 International.
    • If you have used our data and are publishing your work, we ask that you please reference both:
      1. this database through its DOI, and
      2. any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

    Included Files

    • Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.
    • Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.
    • Unreduced_Data-#_v1-0-0.zip: contains the original (not downsampled) data
      • Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.
      • We recommend that you un-zip all the folders and place them in one "Unreduced_Data" directory, similar to the "Clean_Data" directory.
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Clean_Data_v1-0-0.zip: contains all the downsampled data
      • The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
      • There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.
      • The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
    • Database_References_v1-0-0.bib
      • Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

    File Format: Downsampled Data

    These are the "LP_

    • The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data
    • Time[s]: time in seconds since the start of the test
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: the surface temperature in degC

    These data files can be easily loaded using the pandas library in Python through:

    import pandas

    # data_file is the path to one of the per-test .csv files; the unnamed first column
    # (the index of the sample point in the unreduced data) becomes the DataFrame index.
    data = pandas.read_csv(data_file, index_col=0)

    The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.

    File Format: Unreduced Data

    These are the "LP_

    • The first column is the index of each data point
    • S/No: sample number recorded by the DAQ
    • System Date: Date and time of sample
    • Time[s]: time in seconds since the start of the test
    • C_1_Force[kN]: load cell force
    • C_1_Déform1[mm]: extensometer displacement
    • C_1_Déplacement[mm]: cross-head displacement
    • Eng_Stress[MPa]: engineering stress
    • Eng_Strain[]: engineering strain
    • e_true: true strain
    • Sigma_true: true stress in MPa
    • (optional) Temperature[C]: specimen surface temperature in degC

    The data can be loaded and used similarly to the downsampled data.

    File Format: Overall_Summary

    The overall summary file provides data on all the test specimens in the database. The columns include:

    • hidden_index: internal reference ID
    • grade: material grade
    • spec: specifications for the material
    • source: base material for the test specimen
    • id: internal name for the specimen
    • lp: load protocol
    • size: type of specimen (M8, M12, M20)
    • gage_length_mm_: unreduced section length in mm
    • avg_reduced_dia_mm_: average measured diameter for the reduced section in mm
    • avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm
    • avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm
    • fy_n_mpa_: nominal yield stress
    • fu_n_mpa_: nominal ultimate stress
    • t_a_deg_c_: ambient temperature in degC
    • date: date of test
    • investigator: person(s) who conducted the test
    • location: laboratory where test was conducted
    • machine: setup used to conduct test
    • pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control
    • pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control
    • pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control
    • citekey: reference corresponding to the Database_References.bib file
    • yield_stress_mpa_: computed yield stress in MPa
    • elastic_modulus_mpa_: computed elastic modulus in MPa
    • fracture_strain: computed average true strain across the fracture surface
    • c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass
    • file: file name of corresponding clean (downsampled) stress-strain data
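    A minimal sketch of joining the overall summary to the per-test data via the "file" column; the relative paths assume the Clean_Data archive has been extracted next to the summary file, which is an assumption about your local layout.

    import pandas as pd

    # Load the overall summary and pick one specimen.
    summary = pd.read_csv("Overall_Summary_2022-08-25_v1-0-0.csv")
    row = summary.iloc[0]

    # The "file" column names the corresponding clean (downsampled) stress-strain data.
    data = pd.read_csv("Clean_Data/" + row["file"], index_col=0)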

    File Format: Summarized_Mechanical_Props_Campaign

    Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

    import pandas as pd

    # e.g., date = '2022-08-25_' and version = 'v1-0-0' to match the file name above
    tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
                       index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
                       keep_default_na=False, na_values='')
    • citekey: reference in "Campaign_References.bib".
    • Grade: material grade.
    • Spec.: specifications (e.g., J2+N).
    • Yield Stress [MPa]: initial yield stress in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
    • Elastic Modulus [MPa]: initial elastic modulus in MPa
      • size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

    Caveats

    • The tests in the following directories were conducted before the protocol was established. Therefore, only the true stress-strain data are available for each:
      • A500
      • A992_Gr50
      • BCP325
      • BCR295
      • HYP400
      • S460NL
      • S690QL/25mm
      • S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
  6. my-awesome-dataset

    • huggingface.co
    Updated Jul 31, 2024
    Cite
    Matthew Kehoe (2024). my-awesome-dataset [Dataset]. https://huggingface.co/datasets/Axion004/my-awesome-dataset
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 31, 2024
    Authors
    Matthew Kehoe
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for Demo

      Dataset Summary
    

    This is a demo dataset with two files, train.csv and test.csv. Load it by:

    from datasets import load_dataset

    data_files = {"train": "train.csv", "test": "test.csv"}
    demo = load_dataset("stevhliu/demo", data_files=data_files)

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]

      Dataset Structure

      Data Instances
    

    [More Information… See the full description on the dataset page: https://huggingface.co/datasets/Axion004/my-awesome-dataset.

  7. Data Mining Project - Boston

    • kaggle.com
    zip
    Updated Nov 25, 2019
    Cite
    SophieLiu (2019). Data Mining Project - Boston [Dataset]. https://www.kaggle.com/sliu65/data-mining-project-boston
    Explore at:
    zip (59313797 bytes); available download formats
    Dataset updated
    Nov 25, 2019
    Authors
    SophieLiu
    Area covered
    Boston
    Description

    Context

    To make this a seamless process, I cleaned the data and deleted many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data file has both Lyft and Uber, but it is still a cleaned version of the dataset we downloaded from Kaggle.

    Use of Data Files

    You can easily subset the data into the car types that you will be modeling by first loading the CSV into R. Here is the code for how to do this.

    This loads the file into R:

    df<-read.csv('uber.csv')

    The next code subsets the data into specific car types. The example below keeps only Uber 'Black' car types.

    df_black<-subset(df, df$name == 'Black')

    Next, write this data frame to a CSV file on your computer so that it can be loaded into R later:

    write.csv(df_black, "nameofthefileyouwanttosaveas.csv")

    The file will appear in your working directory. If you are not familiar with your working directory, run this code:

    getwd()

    The output will be the file path to your working directory. You will find the file you just created in that folder.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  8. doc-formats-csv-1

    • huggingface.co
    Updated Nov 23, 2023
    Cite
    Datasets examples (2023). doc-formats-csv-1 [Dataset]. https://huggingface.co/datasets/datasets-examples/doc-formats-csv-1
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 23, 2023
    Dataset authored and provided by
    Datasets examples
    Description

    [doc] formats - csv - 1

    This dataset contains one csv file at the root:

    data.csv

    kind,sound
    dog,woof
    cat,meow
    pokemon,pika
    human,hello

    The YAML section of the README does not contain anything related to loading the data (only the size category metadata):

    size_categories:
    - n<1K
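    For reference, a single-CSV dataset like this one can also be loaded straight from the Hub by repository id with the datasets library; a minimal sketch (the repository id is taken from the citation above):

    from datasets import load_dataset

    # The CSV at the root is discovered automatically from the repository.
    ds = load_dataset("datasets-examples/doc-formats-csv-1")
    print(ds)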

  9. Data from: Root thread strength, landslide headscarp geometry, and observed...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 20, 2025
    Cite
    U.S. Geological Survey (2025). Root thread strength, landslide headscarp geometry, and observed root characteristics at the monitored CB1 landslide, Oregon, USA [Dataset]. https://catalog.data.gov/dataset/root-thread-strength-landslide-headscarp-geometry-and-observed-root-characteristics-at-the
    Explore at:
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    Oregon, United States
    Description

    This data release supports interpretations of field-observed root distributions within a shallow landslide headscarp (CB1) located below Mettman Ridge within the Oregon Coast Range, approximately 15 km northeast of Coos Bay, Oregon, USA. (Schmidt_2021_CB1_topo_far.png and Schmidt_2021_CB1_topo_close.png). Root species, diameter (greater than or equal to 1 mm), general orientation relative to the slide scarp, and depth below ground surface were characterized immediately following landsliding in response to large-magnitude precipitation in November 1996 which triggered thousands of landslides within the area (Montgomery and others, 2009). The enclosed data includes: (1) tests of root-thread failure as a function of root diameter and tensile load for different plant species applicable to the broader Oregon Coast Range and (2) tape and compass survey of the planform geometry of the CB1 landslide and the roots observed in the slide scarp. Root diameter and load measurements were principally collected in the general area of the CB1 slide for 12 species listed in: Schmidt_2021_OR_root_species_list.csv. Methodology of the failure tests included identifying roots of a given plant species, trimming root threads into 15-20 cm long segments, measuring diameters including bark (up to 6.5 mm) with a micrometer at multiple points along the segment to arrive at an average, clamping a segment end to a calibrated spring and loading roots until failure recording the maximum load. Files containing the tensile failure tests described in Schmidt and others (2001) include root diameter (mm), critical tensile load at failure (kg), root cross-sectional area (m^2), and tensile strength (MPa). Tensile strengths were calculated as: (critical tensile load at failure * gravitational acceleration)/root cross-sectional area. The files are labeled: Schmidt_2021_OR_root_AceCir.csv, Schmidt_2021_OR_root_AceMac.csv, Schmidt_2021_OR_root_AlnRub.csv, Schmidt_2021_OR_root_AnaMar.csv, Schmidt_2021_OR_root_DigPur.csv, Schmidt_2021_OR_root_MahNer.csv, Schmidt_2021_OR_root_PolMun.csv, Schmidt_2021_OR_root_PseMen_damaged.csv, Schmidt_2021_OR_root_PseMen_healthy.csv, Schmidt_2021_OR_root_RubDis.csv, Schmidt_2021_OR_root_RubPar.csv, Schmidt_2021_OR_root_SamCae.csv, and Schmidt_2021_OR_root_TsuHet.csv. File naming follows the convention of adopting the first three letters of the binomial system defining genus and species of their Latin names. Live and damaged roots were identified based on their color, texture, plasticity, adherence of bark to woody material, and compressibility. For example, healthy live Douglas-fir (Pseudotsuga menziesii) roots (Schmidt_2021_OR_root_PseMen_healthy.csv) have a crimson-colored inner bark, darkening to a brownish red in dead Douglas-fir roots. Both are distinctive colors. Live roots exhibited plastic responses to bending and strong adherence of bark, whereas dead roots displayed brittle behavior with bending and poor adherence of bark to the underlying woody material. Measured tensile strengths of damaged root threads with fungal infections following selective tree harvest using yarding operations that damaged bark of standing trees expressed significantly lower tensile strengths than their ultimate living tensile strengths (Schmidt_2021_OR_root_PseMen_damaged.csv). The CB1 site was clear cut logged in 1987 and replanted with Douglas fir saplings in 1989. 
Vegetation in the vicinity of the failure scarp is dominated by young Douglas fir saplings planted two years after the clear cut, blue elderberry (Sambucus caerulea), thimbleberry (Rubus parviflorus), foxglove (Digitalis purpurea), and Himalayan blackberry (Rubus discolor). The remaining seven species are provided for context of more regional studies. The CB1 site is a hillslope hollow that failed as a shallow landslide and mobilized as a debris flow during heavy rainfall in November 1996. Prior to debris flow mobilization, the ~5-m wide slide with a source area of roughly 860 m^2 and an average slope of 43° displaced and broke numerous roots. Following landsliding, field observations noted a preponderance of exposed, blunt broken root stubs within the scarp. Roots were not straight and smooth, but rather exhibited tortuous growth paths with firmly anchored, interlocking structures. The planform geometry represented by a tape and compass field survey is presented as starting and ending points of slide margin segments of roughly equal colluvial soil depths above saprolite or bedrock (Schmidt_2021_CB1_scarp_geometry.csv and Schmidt_2021_CB1_scarp_pts.shp). The graphic Schmidt_2021_CB1_scarp_pts_poly.png shows the horse-shoe shaped profile and its numbered scarp segments. Segment numbers enclosed within parentheses indicate segments where roots were not counted owing to occlusion by prior ground disturbance. The shapefile Schmidt_2021_CB1_scarp_poly.shp also represents the scarp line segments. The file Schmidt_2021_CB1_segment_info.csv presents the segment information as left and right cumulative lengths, averaged colluvium soils depths for each segment, and inclinations of the ground surface slope relative to horizontal along the perimeter (P) and the slide scarp face (F). Lastly, Schmidt_2021_CB1_rootdata_scarp.csv represents root diameter of individual threads measured by a micrometer, species, depth below ground surface, live vs. dead roots, general root orientation (parallel or perpendicular) relative to scarp perimeter, and cumulative perimeter distance within the scarp segments. At CB1 specifically and more generally across the Oregon Coast Range, root reinforcement occurs primarily by lateral reinforcement with typically much smaller basal reinforcements.
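    The tensile-strength calculation quoted above can be reproduced directly from the per-species CSV files; the sketch below is a rough illustration in pandas, and the column names used are assumptions rather than the files' actual headers.

    import pandas as pd

    # Hypothetical column names; the files report critical tensile load at failure (kg)
    # and root cross-sectional area (m^2).
    df = pd.read_csv("Schmidt_2021_OR_root_PseMen_healthy.csv")

    g = 9.81  # gravitational acceleration, m/s^2
    # Tensile strength = (critical load * g) / cross-sectional area, converted from Pa to MPa.
    tensile_strength_mpa = (df["critical_load_kg"] * g / df["area_m2"]) / 1.0e6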

  10. riiid_train_converted to Multiple Formats

    • kaggle.com
    zip
    Updated Jun 2, 2021
    Cite
    Santh Raul (2021). riiid_train_converted to Multiple Formats [Dataset]. https://www.kaggle.com/santhraul/riiid-train-converted-to-multiple-formats
    Explore at:
    zip (4473461912 bytes); available download formats
    Dataset updated
    Jun 2, 2021
    Authors
    Santh Raul
    Description

    Context

    The train data of the Riiid competition is a large dataset of over 100 million rows and 10 columns that does not fit into a Kaggle Notebook's RAM when loaded with the default pandas read_csv, prompting a search for alternative approaches and formats.

    Content

    Train data of Riiid competition in different formats.

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Reading the .csv file for the Riiid competition took a huge amount of time and memory. This inspired me to convert the .csv into different file formats so that they can be loaded easily into a Kaggle kernel.
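    A minimal sketch of the kind of one-off conversion described here, using Parquet as an example target format; the file names are hypothetical, and writing Parquet assumes pyarrow (or fastparquet) is installed.

    import pandas as pd

    # One-off conversion: parse the large CSV once and store it as Parquet.
    df = pd.read_csv("train.csv")
    df.to_parquet("train.parquet")

    # Later sessions load the converted file, which is much faster than re-parsing the CSV.
    df = pd.read_parquet("train.parquet")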

  11. Websites using AIT CSV Import / Export

    • webtechsurvey.com
    csv
    Updated Nov 22, 2025
    Cite
    WebTechSurvey (2025). Websites using AIT CSV Import / Export [Dataset]. https://webtechsurvey.com/technology/ait-csv-import-export-wordpress-plugin
    Explore at:
    csv; available download formats
    Dataset updated
    Nov 22, 2025
    Dataset authored and provided by
    WebTechSurvey
    License

    https://webtechsurvey.com/terms

    Time period covered
    2025
    Area covered
    Global
    Description

    A complete list of live websites using the AIT CSV Import / Export technology, compiled through global website indexing conducted by WebTechSurvey.

  12. Data from: Ecosystem-Level Determinants of Sustained Activity in Open-Source...

    • zenodo.org
    application/gzip, bin +2
    Updated Aug 2, 2024
    Cite
    Marat Valiev; Bogdan Vasilescu; James Herbsleb (2024). Ecosystem-Level Determinants of Sustained Activity in Open-Source Projects: A Case Study of the PyPI Ecosystem [Dataset]. http://doi.org/10.5281/zenodo.1419788
    Explore at:
    bin, application/gzip, zip, text/x-python; available download formats
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marat Valiev; Bogdan Vasilescu; James Herbsleb
    License

    https://www.gnu.org/licenses/old-licenses/gpl-2.0-standalone.html

    Description
    Replication pack, FSE2018 submission #164:
    ------------------------------------------
    
    **Working title:** Ecosystem-Level Factors Affecting the Survival of Open-Source Projects: 
    A Case Study of the PyPI Ecosystem
    
    **Note:** link to data artifacts is already included in the paper. 
    Link to the code will be included in the Camera Ready version as well.
    
    
    Content description
    ===================
    
    - **ghd-0.1.0.zip** - the code archive. This code produces the dataset files 
     described below
    - **settings.py** - settings template for the code archive.
    - **dataset_minimal_Jan_2018.zip** - the minimally sufficient version of the dataset.
     This dataset only includes stats aggregated by the ecosystem (PyPI)
    - **dataset_full_Jan_2018.tgz** - full version of the dataset, including project-level
     statistics. It is ~34Gb unpacked. This dataset still doesn't include PyPI packages
     themselves, which take around 2TB.
    - **build_model.r, helpers.r** - R files to process the survival data 
      (`survival_data.csv` in **dataset_minimal_Jan_2018.zip**, 
      `common.cache/survival_data.pypi_2008_2017-12_6.csv` in 
      **dataset_full_Jan_2018.tgz**)
    - **Interview protocol.pdf** - approximate protocol used for semistructured interviews.
    - LICENSE - text of GPL v3, under which this dataset is published
    - INSTALL.md - replication guide (~2 pages)
    Replication guide
    =================
    
    Step 0 - prerequisites
    ----------------------
    
    - Unix-compatible OS (Linux or OS X)
    - Python interpreter (2.7 was used; Python 3 compatibility is highly likely)
    - R 3.4 or higher (3.4.4 was used, 3.2 is known to be incompatible)
    
    Depending on the level of detail (see Step 2 for more details):
    - up to 2 TB of disk space (see Step 2 detail levels)
    - at least 16 GB of RAM (64 preferable)
    - a few hours to a few months of processing time
    
    Step 1 - software
    ----------------
    
    - unpack **ghd-0.1.0.zip**, or clone from gitlab:
    
       git clone https://gitlab.com/user2589/ghd.git
       git checkout 0.1.0
     
     `cd` into the extracted folder. 
     All commands below assume it as a current directory.
      
    - copy `settings.py` into the extracted folder. Edit the file:
      * set `DATASET_PATH` to some newly created folder path
      * add at least one GitHub API token to `SCRAPER_GITHUB_API_TOKENS` 
    - install docker. For Ubuntu Linux, the command is 
      `sudo apt-get install docker-compose`
    - install libarchive and headers: `sudo apt-get install libarchive-dev`
    - (optional) to replicate on NPM, install yajl: `sudo apt-get install yajl-tools`
     Without this dependency, you might get an error on the next step, 
     but it's safe to ignore.
    - install Python libraries: `pip install --user -r requirements.txt` . 
    - disable all APIs except GitHub (Bitbucket and Gitlab support were
     not yet implemented when this study was in progress): edit
     `scraper/init.py`, comment out everything except GitHub support
     in `PROVIDERS`.
    
    Step 2 - obtaining the dataset
    -----------------------------
    
    The ultimate goal of this step is to get output of the Python function 
    `common.utils.survival_data()` and save it into a CSV file:
    
      # copy and paste into a Python console
      from common import utils
      survival_data = utils.survival_data('pypi', '2008', smoothing=6)
      survival_data.to_csv('survival_data.csv')
    
    Since full replication will take several months, here are some ways to speedup
    the process:
    
    ####Option 2.a, difficulty level: easiest
    
    Just use the precomputed data. Step 1 is not necessary under this scenario.
    
    - extract **dataset_minimal_Jan_2018.zip**
    - get `survival_data.csv`, go to the next step
    
    ####Option 2.b, difficulty level: easy
    
    Use precomputed longitudinal feature values to build the final table.
    The whole process will take 15..30 minutes.
    
    - create a folder `
  13. bob2

    • huggingface.co
    Updated Sep 19, 2025
    Cite
    A Q (2025). bob2 [Dataset]. https://huggingface.co/datasets/aq1048576/bob2
    Explore at:
    Dataset updated
    Sep 19, 2025
    Authors
    A Q
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    bob2

    This dataset was automatically uploaded from the red-team-agent repository.

      Dataset Information
    

    Original file: bob2.csv
    Source path: /home/ubuntu/red-team-agent/bob2.csv
    Validation: Valid CSV with 6 rows, 5 columns (0.0MB)

      Usage
    

    import pandas as pd
    from datasets import load_dataset

    Load using the datasets library:

    dataset = load_dataset("aq1048576/bob2")

    Or load directly with pandas:

    df = pd.read_csv("hf://datasets/aq1048576/bob2/data.csv")

    … See the full description on the dataset page: https://huggingface.co/datasets/aq1048576/bob2.

  14. Linux Kernel binary size

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jun 14, 2021
    Cite
    Hugo MARTIN; Mathieu ACHER (2021). Linux Kernel binary size [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4943883
    Explore at:
    Dataset updated
    Jun 14, 2021
    Dataset provided by
    IRISA
    Authors
    Hugo MARTIN; Mathieu ACHER
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset containing measurements of Linux kernel binary size after compilation. The reported size, in the column "perf", is the size in bytes of the vmlinux file. It also contains a column "active_options" reporting the number of activated options (set to "y"). All other columns, listed in the file "Linux_options.json", are Linux kernel options. The sampling was done using randconfig. The version of Linux used is 4.13.3.

    Not all available options are present. First, the dataset only contains options for the x86, 64-bit version. Second, all non-tristate options have been ignored. Finally, options that do not take multiple values across the whole dataset, due to insufficient variability in the sampling, are ignored. All options are encoded as 0 for the "n" and "m" option values, and 1 for "y".

    In Python, importing the dataset with pandas will assign all columns to int64, which leads to very high memory consumption (~50 GB). The snippet below imports it using less than 1 GB of memory by setting the option columns to int8.

    import pandas as pd
    import json
    import numpy

    with open("Linux_options.json", "r") as f:
        linux_options = json.load(f)

    # Load the CSV, setting the option columns to int8 to save a lot of memory
    data = pd.read_csv("Linux.csv", dtype={opt: numpy.int8 for opt in linux_options})

  15. Plug Load Data - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Plug Load Data - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/plug-load-data
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    We provide MATLAB binary files (.mat) and comma separated values files of data collected from a pilot study of a plug load management system that allows for the metering and control of individual electrical plug loads. The study included 15 power strips, each containing 4 channels (receptacles), which wirelessly transmitted power consumption data approximately once per second to 3 bridges. The bridges were connected to a building local area network which relayed data to a cloud-based service. Data were archived once per minute with the minimum, mean, and maximum power draw over each one minute interval recorded. The uncontrolled portion of the testing spanned approximately five weeks and established a baseline energy consumption. The controlled portion of the testing employed schedule-based rules for turning off selected loads during non-business hours; it also modified the energy saver policies for certain devices. Three folders are provided: “matFilesAllChOneDate” provides a MAT-file for each date, each file has all channels; “matFilesOneChAllDates” provides a MAT-file for each channel, each file has all dates; “csvFiles” provides comma separated values files for each date (note that because of data export size limitations, there are 10 csv files for each date). Each folder has the same data; there is no practical difference in content, only the way in which it is organized.
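    A minimal sketch of opening one file from each folder; the file names below are hypothetical, since the description does not list them, and reading the MAT-files assumes SciPy is available.

    import pandas as pd
    from scipy.io import loadmat

    # Hypothetical file names inside the folders described above.
    mat = loadmat("matFilesAllChOneDate/plug_load_2017-01-01.mat")
    print(mat.keys())  # inspect which variables the MAT-file stores

    csv = pd.read_csv("csvFiles/plug_load_2017-01-01_part01.csv")
    print(csv.head())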

  16. Full OTTO Dataset in CSV

    • kaggle.com
    Updated Nov 5, 2022
    Cite
    Adam (2022). Full OTTO Dataset in CSV [Dataset]. https://www.kaggle.com/datasets/adamnarozniak/full-otto-dataset-in-csv
    Explore at:
    Croissant (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 5, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Adam
    Description

    Adapted OTTO datasets from JSON to CSV. Code: https://www.kaggle.com/code/adamnarozniak/full-otto-dataset-in-csv-for-pandas

    To load the data, use (to decrease needed memory):

    import numpy as np
    import pandas as pd

    df = pd.read_csv(path, dtype={"session": np.uint32, "aid": np.uint32, "type": np.uint8}, parse_dates=["ts"])

    The type column was changed compared to the original data using this dictionary:

    type_dict = {'clicks': 0, 'carts': 1, 'orders': 2}

  17. SynD - CSV Version

    • springernature.figshare.com
    bin
    Updated May 30, 2023
    Cite
    Christoph Klemenjak; Christoph Kovatsch; Manuel Herold; Wilfried Elmenreich (2023). SynD - CSV Version [Dataset]. http://doi.org/10.6084/m9.figshare.10071248.v1
    Explore at:
    bin; available download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Christoph Klemenjak; Christoph Kovatsch; Manuel Herold; Wilfried Elmenreich
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The CSV version of SynD. This contains 22 CSV files.

  18. Csv Investments Private Limited Export Import Data | Eximpedia

    • eximpedia.app
    Updated Oct 8, 2025
    Cite
    (2025). Csv Investments Private Limited Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/companies/csv-investments-private-limited/54413494
    Explore at:
    Dataset updated
    Oct 8, 2025
    Description

    Csv Investments Private Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  19. Data from: Long-Term Agroecosystem Research in the Central Mississippi River...

    • catalog.data.gov
    • datasetcatalog.nlm.nih.gov
    • +2more
    Updated Dec 2, 2025
    Cite
    Agricultural Research Service (2025). Data from: Long-Term Agroecosystem Research in the Central Mississippi River Basin: Goodwater Creek Experimental Watershed and Regional Herbicide Water Quality Data [Dataset]. https://catalog.data.gov/dataset/data-from-long-term-agroecosystem-research-in-the-central-mississippi-river-basin-goodwate-a5df5
    Explore at:
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    Agricultural Research Service
    Area covered
    Mississippi River System, Mississippi River
    Description

    The GCEW herbicide data were collected from 1991-2010, and are documented at plot, field, and watershed scales. Atrazine concentrations in Goodwater Creek Experimental Watershed (GCEW) were shown to be among the highest of any watershed in the United States based on comparisons using the national Watershed Regressions for Pesticides (WARP) model and by direct comparison with the 112 watersheds used in the development of WARP. This 20-yr-long effort was augmented with a spatially broad effort within the Central Mississippi River Basin encompassing 12 related claypan watersheds in the Salt River Basin, two cave streams on the fringe of the Central Claypan Areas in the Bonne Femme watershed, and 95 streams in northern Missouri and southern Iowa. The research effort on herbicide transport has highlighted the importance of restrictive soil layers with smectitic mineralogy to the risk of transport vulnerability. Near-surface soil features, such as claypans and argillic horizons, result in greater herbicide transport than soils with high saturated hydraulic conductivities and low smectitic clay content. The data set contains concentration, load, and daily discharge data for Devils Icebox Cave and Hunters Cave from 1999 to 2002. The data are available in Microsoft Excel 2010 format. Sheet 1 (Cave Streams Metadata) contains supporting information regarding the length of record, site locations, parameters measured, parameter units, method detection limits, describes the meaning of zero and blank cells, and briefly describes unit area load computations. Sheet 2 (Devils Icebox Concentration Data) contains concentration data from all samples collected from 1999 to 2002 at the Devils Icebox site for 12 analytes and two computed nutrient parameters. Sheet 3 (Devils Icebox SS Conc Data) contains 15-minute suspended sediment (SS) concentrations estimated from turbidity sensor data for the Devils Icebox site. Sheet 4 (Devils Icebox Load & Discharge Data) contains daily data for discharge, load, and unit area loads for the Devils Icebox site. Sheet 5 (Hunters Cave Concentration Data) contains concentration data from all samples collected from 1999 to 2002 at the Hunters Cave site for 12 analytes and two computed nutrient parameters. Sheet 6 (Hunters Cave SS Conc Data) contains 15-minute SS concentrations estimated from turbidity sensor data for the Hunters Cave site. Sheet 7 (Hunters Cave Load & Discharge Data) contains daily data for discharge, load, and unit area loads for the Hunters Cave site. [Note: To support automated data access and processing, each worksheet has been extracted as a separate, machine-readable CSV file; see Data Dictionary for descriptions of variables and their concentration units.] Resources in this dataset:Resource Title: README - Metadata. File Name: LTAR_GCEW_herbicidewater_qual.xlsxResource Description: Defines Water Quality and Sediment Load/Discharge parameters, abbreviations, time-frames, and units as rendered in the Excel file. For additional information including site information, method detection limits, and methods citations, see Metadata tab. For Definitions used in machine-readable CSV files, see Data Dictionary.Resource Title: Excel data spreadsheet. File Name: c3.jeq2013.12.0516.ds1_.xlsxResource Description: Multi-page data spreadsheet containing data as well as metadata from this study. 
A direct download of the data spreadsheet can be found here: https://dl.sciencesocieties.org/publications/datasets/jeq/C3.JEQ2013.12.0516.ds1/downloadResource Title: Devils Icebox Concentration Data. File Name: DevilsIceboxConcData.csvResource Description: Concentrations of herbicides, metabolites, and nutrients (extracted from the Excel tab into machine-readable CSV data).Resource Title: Devils Icebox Load and Discharge Data. File Name: DevilsIceboxLoad&Discharge.csvResource Description: Discharge and Unit Area Loads for herbicides, metabolites, and suspended sediments (extracted from Excel tab as machine-readable CSV data)Resource Title: Devils Icebox Suspended Sediment Concentration Data. File Name: DevilsIceboxSSConcData.csvResource Description: Suspended Sediment Concentration Data (extracted from Excel tab as machine-readable CSV data)Resource Title: Hunters Cave Load and Discharge Data. File Name: HuntersCaveLoad&Discharge.csvResource Description: Discharge and Unit Area Loads for herbicides, metabolites, and suspended sediments (extracted from Excel tab as machine-readable CSV data)Resource Title: Hunters Cave Suspended Sediment Concentration Data. File Name: HuntersCaveSSConc.csvResource Description: Suspended Sediment Concentration Data (extracted from Excel tab as machine-readable CSV data)Resource Title: Data Dictionary for machine-readable CSV files. File Name: LTAR_GCEW_herbicidewater_qual.csvResource Description: Defines Water Quality and Sediment Load/Discharge parameters, abbreviations, time-frames, and units as implemented in the extracted machine-readable CSV files.Resource Title: Hunters Cave Concentration Data. File Name: HuntersCaveConcData.csvResource Description: Concentrations of herbicides, metabolites, and nutrients (extracted from the Excel tab into machine-readable CSV data)
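    A minimal pandas sketch of reading one of the machine-readable CSV extracts listed above; variable names and units are defined in the Data Dictionary (LTAR_GCEW_herbicidewater_qual.csv), so the snippet only loads and inspects the file.

    import pandas as pd

    # One of the extracted machine-readable CSV files named in the description.
    conc = pd.read_csv("DevilsIceboxConcData.csv")

    # Inspect the available columns and the first records; consult the Data Dictionary
    # for parameter definitions and concentration units.
    print(conc.columns.tolist())
    print(conc.head())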

  20. Generation and Load time-series data for 10kV-400V networks

    • data.dtu.dk
    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Aeishwarya Baviskar; Matti Juhani Koivisto; Kaushik Das; Anca Daniela Hansen (2023). Generation and Load time-series data for 10kV-400V networks [Dataset]. http://doi.org/10.11583/DTU.14604765
    Explore at:
    txt; available download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Technical University of Denmark
    Authors
    Aeishwarya Baviskar; Matti Juhani Koivisto; Kaushik Das; Anca Daniela Hansen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains .csv files. The data contains load and generation time series for all the 10 kV or 400 V nodes in the network.

    Load and generation time-series data:

    Load time-series
    • active and reactive power at 1 hour resolution
    • aggregated time-series at the 60 kV-10 kV substation
    • individual load time-series at 10 kV or 400 V nodes
    • 27 different load profiles grouped into household, commercial, agricultural and miscellaneous

    Generation time-series
    • active power at 1 hour resolution
    • wind and solar generation time-series from meteorological data

    This item is a part of the collection 'DTU 7k-Bus Active Distribution Network': https://doi.org/10.11583/DTU.c.5389910
    For more information, access the readme file: https://doi.org/10.11583/DTU.14971812
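    A minimal sketch of working with one of the hourly time-series CSV files; the file name and column layout are assumptions, so consult the collection's readme for the actual structure.

    import pandas as pd

    # Hypothetical file name: an hourly load time series for a single 400 V node,
    # with a timestamp in the first column.
    load = pd.read_csv("load_node_400V_0001.csv", parse_dates=[0], index_col=0)

    # Example use of the hourly data: daily mean active and reactive power.
    daily_mean = load.resample("D").mean()
    print(daily_mean.head())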
