14 datasets found
  1. New York Taxi Data 2009-2016 in Parquet Fomat

    • academictorrents.com
    bittorrent
    Updated Jul 1, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    New York Taxi and Limousine Commission (2017). New York Taxi Data 2009-2016 in Parquet Fomat [Dataset]. https://academictorrents.com/details/4f465810b86c6b793d1c7556fe3936441081992e
    Explore at:
    bittorrent(35078948106)Available download formats
    Dataset updated
    Jul 1, 2017
    Dataset provided by
    New York City Taxi and Limousine Commissionhttp://www.nyc.gov/tlc
    Authors
    New York Taxi and Limousine Commission
    License

    https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified

    Area covered
    New York
    Description

    Trip record data from the Taxi and Limousine Commission () from January 2009-December 2016 was consolidated and brought into a consistent Parquet format by Ravi Shekhar

  2. Surface Water - Habitat Results

    • catalog.data.gov
    • datasets.ai
    Updated Nov 27, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California State Water Resources Control Board (2024). Surface Water - Habitat Results [Dataset]. https://catalog.data.gov/dataset/surface-water-habitat-results
    Explore at:
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California State Water Resources Control Board
    Description

    This data provides results from field analyses, from the California Environmental Data Exchange Network (CEDEN). The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result. Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data. Example R code using the API to access data across all years can be found here. Users who want to manually download more specific subsets of the data can also use the CEDEN query tool, at: https://ceden.waterboards.ca.gov/AdvancedQueryTool

  3. Surface Water - Habitat Results

    • data.cnra.ca.gov
    • data.ca.gov
    csv, pdf, zip
    Updated Jun 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California State Water Resources Control Board (2025). Surface Water - Habitat Results [Dataset]. https://data.cnra.ca.gov/dataset/surface-water-habitat-results
    Explore at:
    pdf, zip, csvAvailable download formats
    Dataset updated
    Jun 3, 2025
    Dataset authored and provided by
    California State Water Resources Control Board
    Description

    This data provides results from field analyses, from the California Environmental Data Exchange Network (CEDEN). The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result.

    Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data.

    Users who want to manually download more specific subsets of the data can also use the CEDEN Query Tool, which provides access to the same data presented here, but allows for interactive data filtering.

  4. Surface Water - Benthic Macroinvertebrate Results

    • data.cnra.ca.gov
    • data.ca.gov
    csv, pdf, zip
    Updated Jun 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California State Water Resources Control Board (2025). Surface Water - Benthic Macroinvertebrate Results [Dataset]. https://data.cnra.ca.gov/dataset/surface-water-benthic-macroinvertebrate-results
    Explore at:
    pdf, zip, csvAvailable download formats
    Dataset updated
    Jun 3, 2025
    Dataset authored and provided by
    California State Water Resources Control Board
    Description

    Data collected for marine benthic infauna, freshwater benthic macroinvertebrate (BMI), algae, bacteria and diatom taxonomic analyses, from the California Environmental Data Exchange Network (CEDEN). Note bacteria single species concentrations are stored within the chemistry template, whereas abundance bacteria are stored within this set. Each record represents a result from a specific event location for a single organism in a single sample.

    The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result.

    Zip files are provided for bulk data downloads (in csv or parquet file format), and developers can use the API associated with the "CEDEN Benthic Data" (csv) resource to access the data.

    Users who want to manually download more specific subsets of the data can also use the CEDEN Query Tool, which provides access to the same data presented here, but allows for interactive data filtering.

  5. a

    LAION-400-MILLION OPEN DATASET

    • academictorrents.com
    bittorrent
    Updated Sep 14, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    None (2021). LAION-400-MILLION OPEN DATASET [Dataset]. https://academictorrents.com/details/34b94abbcefef5a240358b9acd7920c8b675aacc
    Explore at:
    bittorrent(1211103363514)Available download formats
    Dataset updated
    Sep 14, 2021
    Authors
    None
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LAION-400M The world’s largest openly available image-text-pair dataset with 400 million samples. # Concept and Content The LAION-400M dataset is completely openly, freely accessible. All images and texts in the LAION-400M dataset have been filtered with OpenAI‘s CLIP by calculating the cosine similarity between the text and image embeddings and dropping those with a similarity below 0.3 The threshold of 0.3 had been determined through human evaluations and seems to be a good heuristic for estimating semantic image-text-content matching. The image-text-pairs have been extracted from the Common Crawl web data dump and are from random web pages crawled between 2014 and 2021. # Download Information You can find The CLIP image embeddings (NumPy files) The parquet files KNN index of image embeddings # LAION-400M Dataset Statistics The LAION-400M and future even bigger ones are in fact datasets of datasets. For instance, it can be filtered out by image sizes into smaller datasets like th

  6. Z

    Data from: MASCDB, a database of images, descriptors and microphysical...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 5, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Grazioli, Jacopo (2023). MASCDB, a database of images, descriptors and microphysical properties of individual snowflakes in free fall [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_5578920
    Explore at:
    Dataset updated
    Jul 5, 2023
    Dataset provided by
    Ghiggi, Gionata
    Grazioli, Jacopo
    Berne, Alexis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset overview

    This dataset provides data and images of snowflakes in free fall collected with a Multi-Angle Snowflake Camera (MASC) The dataset includes, for each recorded snowflakes:

    A triplet of gray-scale images corresponding to the three cameras of the MASC

    A large quantity of geometrical, textural descriptors and the pre-compiled output of published retrieval algorithms as well as basic environmental information at the location and time of each measurement.

    The pre-computed descriptors and retrievals are available either individually for each camera view or, some of them, available as descriptors of the triplet as a whole. A non exhaustive list of precomputed quantities includes for example:

    Textural and geometrical descriptors as in Praz et al 2017

    Hydrometeor classification, riming degree estimation, melting identification, as in Praz et al 2017

    Blowing snow identification, as in Schaer et al 2020

    Mass, volume, gyration estimation, as in Leinonen et al 2021

    Data format and structure

    The dataset is divided into four .parquet file (for scalar descriptors) and a Zarr database (for the images). A detailed description of the data content and of the data records is available here.

    Supporting code

    A python-based API is available to manipulate, display and organize the data of our dataset. It can be found on GitHub. See also the code documentation on ReadTheDocs.

    Download notes

    All files available here for download should be stored in the same folder, if the python-based API is used

    MASCdb.zarr.zip must be unzipped after download

    Field campaigns

    A list of campaigns included in the dataset, with a minimal description is given in the following table

        Campaign_name
        Information
    

    Shielded / Not shielded

    DFIR = Double Fence Intercomparison Reference

    APRES3-2016 & APRES3-2017

        Instrument installed in Antarctica in the context of the APRES3 project. See for example Genthon et al, 2018 or Grazioli et al 2017
        Not shielded
    
    
        Davos-2015
        Instrument installed in the Swiss Alps within the context of SPICE (Solid Precipitation InterComparison Experiment)
        Shielded (DFIR)
    
    
        Davos-2019
        Instrument installed in the Swiss Alps within the context of RACLETS (Role of Aerosols and CLouds Enhanced by Topography on Snow)
        Not shielded
    
    
        ICEGENESIS-2021
        Instrument installed in the Swiss Jura in a MeteoSwiss ground measurement site, within the context of ICE-GENESIS. See for example Billault-Roux et al, 2023
        Not shielded
    
    
        ICEPOP-2018
        Instrument installed in Korea, in the context of ICEPOP. See for example Gehring et al 2021.
        Shielded (DFIR)
    
    
        Jura-2019 & Jura-2023
        Instrument installed in the Swiss Jura within a MeteoSwiss measurement site
        Not shielded
    
    
        Norway-2016
        Instrument installed in Norway during the High-Latitude Measurement of Snowfall (HiLaMS). See for example Cooper et al, 2022.
        Not shielded
    
    
        PLATO-2019
        Instrument installed in the "Davis" Antarctic base during the PLATO field campaign
        Not shielded
    
    
        POPE-2020
        Instrument installed in the "Princess Elizabeth Antarctica" base during the POPE campaign. See for example Ferrone et al, 2023.
        Not shielded
    
    
        Remoray-2022
        Instrument installed in the French Jura.
        Not shielded
    
    
        Valais-2016
        Instrument installed in the Swiss Alps in a ski resort.
        Not shielded
    

    Version

    1.0 - Two new campaigns ("Jura-2023", "Norway-2016") added. Added references and list of campaigns.

    0.3 - a new campaign is added to the dataset ("Remoray-2022")

    0.2 - rename of variables. Variable precision (digits) standardized

    0.1 - first upload

  7. Z

    CKW Smart Meter Data

    • data.niaid.nih.gov
    Updated Sep 22, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Barahona Garzon, Braulio (2024). CKW Smart Meter Data [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13304498
    Explore at:
    Dataset updated
    Sep 22, 2024
    Dataset authored and provided by
    Barahona Garzon, Braulio
    Description

    Overview

    The CKW Group is a distribution system operator that supplies more than 200,000 end customers in Central Switzerland. Since October 2022, CKW publishes anonymised and aggregated data from smart meters that measure electricity consumption in canton Lucerne. This unique dataset is accessible in the ckw.ch/opendata platform.

    Data set A - anonimised smart meter data

    Data set B - aggregated smart meter data

    Contents of this data set

    This data set contains a small sample of the CKW data set A sorted per smart meter ID, stored as parquet files named with the id field of the corresponding smart meter anonymised data. Example: 027ceb7b8fd77a4b11b3b497e9f0b174.parquet

    The orginal CKW data is available for download at https://open.data.axpo.com/%24web/index.html#dataset-a as a (gzip-compressed) csv files, which are are split into one file per calendar month. The columns in the files csv are:

    id: the anonymized counter ID (text)

    timestamp: the UTC time at the beginning of a 15-minute time window to which the consumption refers (ISO-8601 timestamp)

    value_kwh: the consumption in kWh in the time window under consideration (float)

    In this archive, data from:

    | Dateigrösse | Export Datum | Zeitraum | Dateiname || ----------- | ------------ | -------- | --------- || 4.2GiB | 2024-04-20 | 202402 | ckw_opendata_smartmeter_dataset_a_202402.csv.gz || 4.5GiB | 2024-03-21 | 202401 | ckw_opendata_smartmeter_dataset_a_202401.csv.gz || 4.5GiB | 2024-02-20 | 202312 | ckw_opendata_smartmeter_dataset_a_202312.csv.gz || 4.4GiB | 2024-01-20 | 202311 | ckw_opendata_smartmeter_dataset_a_202311.csv.gz || 4.5GiB | 2023-12-20 | 202310 | ckw_opendata_smartmeter_dataset_a_202310.csv.gz || 4.4GiB | 2023-11-20 | 202309 | ckw_opendata_smartmeter_dataset_a_202309.csv.gz || 4.5GiB | 2023-10-20 | 202308 | ckw_opendata_smartmeter_dataset_a_202308.csv.gz || 4.6GiB | 2023-09-20 | 202307 | ckw_opendata_smartmeter_dataset_a_202307.csv.gz || 4.4GiB | 2023-08-20 | 202306 | ckw_opendata_smartmeter_dataset_a_202306.csv.gz || 4.6GiB | 2023-07-20 | 202305 | ckw_opendata_smartmeter_dataset_a_202305.csv.gz || 3.3GiB | 2023-06-20 | 202304 | ckw_opendata_smartmeter_dataset_a_202304.csv.gz || 4.6GiB | 2023-05-24 | 202303 | ckw_opendata_smartmeter_dataset_a_202303.csv.gz || 4.2GiB | 2023-04-20 | 202302 | ckw_opendata_smartmeter_dataset_a_202302.csv.gz || 4.7GiB | 2023-03-20 | 202301 | ckw_opendata_smartmeter_dataset_a_202301.csv.gz || 4.6GiB | 2023-03-15 | 202212 | ckw_opendata_smartmeter_dataset_a_202212.csv.gz || 4.3GiB | 2023-03-15 | 202211 | ckw_opendata_smartmeter_dataset_a_202211.csv.gz || 4.4GiB | 2023-03-15 | 202210 | ckw_opendata_smartmeter_dataset_a_202210.csv.gz || 4.3GiB | 2023-03-15 | 202209 | ckw_opendata_smartmeter_dataset_a_202209.csv.gz || 4.4GiB | 2023-03-15 | 202208 | ckw_opendata_smartmeter_dataset_a_202208.csv.gz || 4.4GiB | 2023-03-15 | 202207 | ckw_opendata_smartmeter_dataset_a_202207.csv.gz || 4.2GiB | 2023-03-15 | 202206 | ckw_opendata_smartmeter_dataset_a_202206.csv.gz || 4.3GiB | 2023-03-15 | 202205 | ckw_opendata_smartmeter_dataset_a_202205.csv.gz || 4.2GiB | 2023-03-15 | 202204 | ckw_opendata_smartmeter_dataset_a_202204.csv.gz || 4.1GiB | 2023-03-15 | 202203 | ckw_opendata_smartmeter_dataset_a_202203.csv.gz || 3.5GiB | 2023-03-15 | 202202 | ckw_opendata_smartmeter_dataset_a_202202.csv.gz || 3.7GiB | 2023-03-15 | 202201 | ckw_opendata_smartmeter_dataset_a_202201.csv.gz || 3.5GiB | 2023-03-15 | 202112 | ckw_opendata_smartmeter_dataset_a_202112.csv.gz || 3.1GiB | 2023-03-15 | 202111 | ckw_opendata_smartmeter_dataset_a_202111.csv.gz || 3.0GiB | 2023-03-15 | 202110 | ckw_opendata_smartmeter_dataset_a_202110.csv.gz || 2.7GiB | 2023-03-15 | 202109 | ckw_opendata_smartmeter_dataset_a_202109.csv.gz || 2.6GiB | 2023-03-15 | 202108 | ckw_opendata_smartmeter_dataset_a_202108.csv.gz || 2.4GiB | 2023-03-15 | 202107 | ckw_opendata_smartmeter_dataset_a_202107.csv.gz || 2.1GiB | 2023-03-15 | 202106 | ckw_opendata_smartmeter_dataset_a_202106.csv.gz || 2.0GiB | 2023-03-15 | 202105 | ckw_opendata_smartmeter_dataset_a_202105.csv.gz || 1.7GiB | 2023-03-15 | 202104 | ckw_opendata_smartmeter_dataset_a_202104.csv.gz || 1.6GiB | 2023-03-15 | 202103 | ckw_opendata_smartmeter_dataset_a_202103.csv.gz || 1.3GiB | 2023-03-15 | 202102 | ckw_opendata_smartmeter_dataset_a_202102.csv.gz || 1.3GiB | 2023-03-15 | 202101 | ckw_opendata_smartmeter_dataset_a_202101.csv.gz |

    was processed into partitioned parquet files, and then organised by id into parquet files with data from single smart meters.

    A small sample of all the smart meters data above, are archived in the cloud public cloud space of AISOP project https://os.zhdk.cloud.switch.ch/swift/v1/aisop_public/ckw/ts/batch_0424/batch_0424.zip and also here is this public record. For access to the complete data contact the authors of this archive.

    It consists of the following parquet files:

    | Size | Date | Name |

    |------|------|------|

    | 1.0M | Mar 4 12:18 | 027ceb7b8fd77a4b11b3b497e9f0b174.parquet |

    | 979K | Mar 4 12:18 | 03a4af696ff6a5c049736e9614f18b1b.parquet |

    | 1.0M | Mar 4 12:18 | 03654abddf9a1b26f5fbbeea362a96ed.parquet |

    | 1.0M | Mar 4 12:18 | 03acebcc4e7d39b6df5c72e01a3c35a6.parquet |

    | 1.0M | Mar 4 12:18 | 039e60e1d03c2afd071085bdbd84bb69.parquet |

    | 931K | Mar 4 12:18 | 036877a1563f01e6e830298c193071a6.parquet |

    | 1.0M | Mar 4 12:18 | 02e45872f30f5a6a33972e8c3ba9c2e5.parquet |

    | 662K | Mar 4 12:18 | 03a25f298431549a6bc0b1a58eca1f34.parquet |

    | 635K | Mar 4 12:18 | 029a46275625a3cefc1f56b985067d15.parquet |

    | 1.0M | Mar 4 12:18 | 0301309d6d1e06c60b4899061deb7abd.parquet |

    | 1.0M | Mar 4 12:18 | 0291e323d7b1eb76bf680f6e800c2594.parquet |

    | 1.0M | Mar 4 12:18 | 0298e58930c24010bbe2777c01b7644a.parquet |

    | 1.0M | Mar 4 12:18 | 0362c5f3685febf367ebea62fbc88590.parquet |

    | 1.0M | Mar 4 12:18 | 0390835d05372cb66f6cd4ca662399e8.parquet |

    | 1.0M | Mar 4 12:18 | 02f670f059e1f834dfb8ba809c13a210.parquet |

    | 987K | Mar 4 12:18 | 02af749aaf8feb59df7e78d5e5d550e0.parquet |

    | 996K | Mar 4 12:18 | 0311d3c1d08ee0af3edda4dc260421d1.parquet |

    | 1.0M | Mar 4 12:18 | 030a707019326e90b0ee3f35bde666e0.parquet |

    | 955K | Mar 4 12:18 | 033441231b277b283191e0e1194d81e2.parquet |

    | 995K | Mar 4 12:18 | 0317b0417d1ec91b5c243be854da8a86.parquet |

    | 1.0M | Mar 4 12:18 | 02ef4e49b6fb50f62a043fb79118d980.parquet |

    | 1.0M | Mar 4 12:18 | 0340ad82e9946be45b5401fc6a215bf3.parquet |

    | 974K | Mar 4 12:18 | 03764b3b9a65886c3aacdbc85d952b19.parquet |

    | 1.0M | Mar 4 12:18 | 039723cb9e421c5cbe5cff66d06cb4b6.parquet |

    | 1.0M | Mar 4 12:18 | 0282f16ed6ef0035dc2313b853ff3f68.parquet |

    | 1.0M | Mar 4 12:18 | 032495d70369c6e64ab0c4086583bee2.parquet |

    | 900K | Mar 4 12:18 | 02c56641571fc9bc37448ce707c80d3d.parquet |

    | 1.0M | Mar 4 12:18 | 027b7b950689c337d311094755697a8f.parquet |

    | 1.0M | Mar 4 12:18 | 02af272adccf45b6cdd4a7050c979f9f.parquet |

    | 927K | Mar 4 12:18 | 02fc9a3b2b0871d3b6a1e4f8fe415186.parquet |

    | 1.0M | Mar 4 12:18 | 03872674e2a78371ce4dfa5921561a8c.parquet |

    | 881K | Mar 4 12:18 | 0344a09d90dbfa77481c5140bb376992.parquet |

    | 1.0M | Mar 4 12:18 | 0351503e2b529f53bdae15c7fbd56fc0.parquet |

    | 1.0M | Mar 4 12:18 | 033fe9c3a9ca39001af68366da98257c.parquet |

    | 1.0M | Mar 4 12:18 | 02e70a1c64bd2da7eb0d62be870ae0d6.parquet |

    | 1.0M | Mar 4 12:18 | 0296385692c9de5d2320326eaa000453.parquet |

    | 962K | Mar 4 12:18 | 035254738f1cc8a31075d9fbe3ec2132.parquet |

    | 991K | Mar 4 12:18 | 02e78f0d6a8fb96050053e188bf0f07c.parquet |

    | 1.0M | Mar 4 12:18 | 039e4f37ed301110f506f551482d0337.parquet |

    | 961K | Mar 4 12:18 | 039e2581430703b39c359dc62924a4eb.parquet |

    | 999K | Mar 4 12:18 | 02c6f7e4b559a25d05b595cbb5626270.parquet |

    | 1.0M | Mar 4 12:18 | 02dd91468360700a5b9514b109afb504.parquet |

    | 938K | Mar 4 12:18 | 02e99c6bb9d3ca833adec796a232bac0.parquet |

    | 589K | Mar 4 12:18 | 03aef63e26a0bdbce4a45d7cf6f0c6f8.parquet |

    | 1.0M | Mar 4 12:18 | 02d1ca48a66a57b8625754d6a31f53c7.parquet |

    | 1.0M | Mar 4 12:18 | 03af9ebf0457e1d451b83fa123f20a12.parquet |

    | 1.0M | Mar 4 12:18 | 0289efb0e712486f00f52078d6c64a5b.parquet |

    | 1.0M | Mar 4 12:18 | 03466ed913455c281ffeeaa80abdfff6.parquet |

    | 1.0M | Mar 4 12:18 | 032d6f4b34da58dba02afdf5dab3e016.parquet |

    | 1.0M | Mar 4 12:18 | 03406854f35a4181f4b0778bb5fc010c.parquet |

    | 1.0M | Mar 4 12:18 | 0345fc286238bcea5b2b9849738c53a2.parquet |

    | 1.0M | Mar 4 12:18 | 029ff5169155b57140821a920ad67c7e.parquet |

    | 985K | Mar 4 12:18 | 02e4c9f3518f079ec4e5133acccb2635.parquet |

    | 1.0M | Mar 4 12:18 | 03917c4f2aef487dc20238777ac5fdae.parquet |

    | 969K | Mar 4 12:18 | 03aae0ab38cebcb160e389b2138f50da.parquet |

    | 914K | Mar 4 12:18 | 02bf87b07b64fb5be54f9385880b9dc1.parquet |

    | 1.0M | Mar 4 12:18 | 02776685a085c4b785a3885ef81d427a.parquet |

    | 947K | Mar 4 12:18 | 02f5a82af5a5ffac2fe7551bf4a0a1aa.parquet |

    | 992K | Mar 4 12:18 | 039670174dbc12e1ae217764c96bbeb3.parquet |

    | 1.0M | Mar 4 12:18 | 037700bf3e272245329d9385bb458bac.parquet |

    | 602K | Mar 4 12:18 | 0388916cdb86b12507548b1366554e16.parquet |

    | 939K | Mar 4 12:18 | 02ccbadea8d2d897e0d4af9fb3ed9a8e.parquet |

    | 1.0M | Mar 4 12:18 | 02dc3f4fb7aec02ba689ad437d8bc459.parquet |

    | 1.0M | Mar 4 12:18 | 02cf12e01cd20d38f51b4223e53d3355.parquet |

    | 993K | Mar 4 12:18 | 0371f79d154c00f9e3e39c27bab2b426.parquet |

    where each file contains data from a single smart meter.

    Acknowledgement

    The AISOP project (https://aisopproject.com/) received funding in the framework of the Joint Programming Platform Smart Energy Systems from European Union's Horizon 2020 research and innovation programme under grant agreement No 883973. ERA-Net Smart Energy Systems joint call on digital transformation for green energy transition.

  8. g

    Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8...

    • gimi9.com
    • data.usgs.gov
    Updated Feb 22, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8 Analysis Ready Dataset Raster Images from 2013-2023 [Dataset]. https://gimi9.com/dataset/data-gov_water-temperature-of-lakes-in-the-conterminous-u-s-using-the-landsat-8-analysis-ready-2013
    Explore at:
    Dataset updated
    Feb 22, 2025
    Area covered
    Contiguous United States
    Description

    This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files. Limitations with this dataset include: - All biases inherent to the Landsat Surface Temperature product are retained in this dataset which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases with partial cloud coverage over a waterbody. - Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported - one for each data tile. The deepest point values will be extracted and reported for tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = “yes” column of site_id_tile_hv_crosswalk.csv). - Temperature data were not extracted from satellite images with more than 90% cloud cover. - Temperature data represents skin temperature at the water surface and may differ from temperature observations from below the water surface. Potential methods for addressing limitations with this dataset: - Identifying and removing unrealistic temperature estimates: - Calculate total percentage of cloud pixels over a given waterbody as: percent_cloud_pixels = wb_dswe9_pixels/(wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage. - Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10) - Filter waterbodies where the deepest point is identified as water (dp_dswe = 1) - Handling waterbodies split between multiple tiles: - These waterbodies can be identified using the "site_id_tile_hv_crosswalk.csv" file (column multiple_tiles = “yes”). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates. All zip files within this data release contain nested directories using .parquet files to store the data. The example_script_for_using_parquet.R contains example code for using the R arrow package to open and query the nested .parquet files. - "year_byscene=XXXX.zip" – includes temperature summary statistics for individual waterbodies and the deepest points (the furthest point from land within a waterbody) within each waterbody by the scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files with the _byscene datasets may only include one dummy row of data (identified by tile_hv="000-000"). This happens when no tabular data is extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible. An example file path for this dataset follows: year_byscene=2023/tile_hv=002-001/part-0.parquet -"year=XXXX.zip" – includes the summary statistics for individual waterbodies and the deepest points within each waterbody by the year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX is used as input for generating these summary tables that aggregates temperature data by year, month, and year-month. Aggregated data is not available for the following tiles: 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land, and no output data were generated. An example file path for this dataset follows: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet - "example_script_for_using_parquet.R" – This script includes code to download zip files directly from ScienceBase, identify HUC04 basins within desired landsat ARD grid tile, download NHDplus High Resolution data for visualizing, using the R arrow package to compile .parquet files in nested directories, and create example static and interactive maps. - "nhd_HUC04s_ingrid.csv" – This cross-walk file identifies the HUC04 watersheds within each Landsat ARD Tile grid. -"site_id_tile_hv_crosswalk.csv" - This cross-walk file identifies the site_id (nhdhr_{permanent_identifier}) within each Landsat ARD Tile grid. This file also includes a column (multiple_tiles) to identify site_id's that fall within multiple Landsat ARD Tile grids. - "lst_grid.png" – a map of the Landsat grid tiles labelled by the horizontal – vertical ID.

  9. HERO WEC 2024 Hydraulic Configuration Deployment Data

    • data.openei.org
    • mhkdr.openei.org
    archive, code, data +1
    Updated Mar 14, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Scott Jenne; Andrew Simms; Justin Panzarella; Rob Raye; Casey Nichols; Aidan Bharath; Mark Murphy; Kyle Swartz; Charles Candon; Scott Jenne; Andrew Simms; Justin Panzarella; Rob Raye; Casey Nichols; Aidan Bharath; Mark Murphy; Kyle Swartz; Charles Candon (2024). HERO WEC 2024 Hydraulic Configuration Deployment Data [Dataset]. http://doi.org/10.15473/2479748
    Explore at:
    code, archive, text_document, dataAvailable download formats
    Dataset updated
    Mar 14, 2024
    Dataset provided by
    United States Department of Energyhttp://energy.gov/
    Open Energy Data Initiative (OEDI)
    National Renewable Energy Laboratory
    Authors
    Scott Jenne; Andrew Simms; Justin Panzarella; Rob Raye; Casey Nichols; Aidan Bharath; Mark Murphy; Kyle Swartz; Charles Candon; Scott Jenne; Andrew Simms; Justin Panzarella; Rob Raye; Casey Nichols; Aidan Bharath; Mark Murphy; Kyle Swartz; Charles Candon
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The following submission includes raw and processed data from the in water deployment of NREL's Hydraulic and Electric Reverse Osmosis Wave Energy Converter (HERO WEC), in the form of parquet files, TDMS files, CSV files, bag files and MATLAB workspaces. This dataset was collected in March 2024 at the Jennette's pier test site in North Carolina.

    This submission includes the following:

    • Data description document (HERO WEC FY24 Hydraulic Deployment Data Descriptions.doc) - This document includes detailed descriptions of the type of data and how it was processed and/or calculated.

    • Processed MATLAB workspace - The processed data is provided in the form of a single MATLAB workspace containing data from the full deployment. This workspace contains data from all sensors down sampled to 10 Hz along with all array Value Added Products (VAPs).

    • MATLAB visualization scripts - The MATLAB workspaces can be visualized using the file "HERO_WEC_2024_Hydraulic_Config_Data_Viewer.m/mlx". The user simply needs to download the processed MATLAB workspaces, specify the desired start and end times and run this file. Both the .m and .mlx file format has been provided depending on the user's preference.

    • Summary Data - The fully processed data was used to create a summary data set with averages and important calculations performed on 30-minute intervals to align with the intervals of wave resource data reported from nearby CDIP ocean observing buoys located 20km East of Jennette's pier and 40km Northeast of Jennette's pier. The wave resource data provided in this data set is to be used for reference only due the difference in water depth and proximity to shore between the Jennette's pier test site and the locations of the ocean observing buoys. This data is provided in the Summary Data zip folder, which includes this data set in the form of a MATLAB workspace, parquet file, and excel spreadsheet.

    • Processed Parquet File - The processed data is provided in the form of a single parquet file containing data from all HERO WEC sensors collected during the full deployment. Data in these files has been down sampled to 10 Hz and all array VAPs are included.

    • Interim Filtered Data - Raw data from each sensor group partitioned into 30-minute parquet files. These files are outputs from an intermediate stage of data processing and contain the raw data with no Quality Control (QC) or calculations performed in a format that is easier to use than the raw data.

    • Raw Data - Raw, unprocessed data from this deployment can be found in the Raw Data zip folder. This data is provided in the form of TDMS, CSV, and bag files in the original format output by the MODAQ system.

    • Python Data Processing Script - This links to an NREL public github repository containing the python script used to go from raw data to fully processed parquet files. Additional documentation on how to use this script is included in the github repository.

    This data set has been developed by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Water Power Technologies Office.

  10. g

    Surface Water - Aquatic Organism Tissue Sample Results | gimi9.com

    • gimi9.com
    Updated Feb 18, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). Surface Water - Aquatic Organism Tissue Sample Results | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_surface-water-aquatic-organism-tissue-sample-results/
    Explore at:
    Dataset updated
    Feb 18, 2021
    Description

    This data set provides results of tissue from organisms found in surface waters, from the California Environmental Data Exchange Network (CEDEN). The data are of tissue from individual organisms and of composite samples where tissue samples from multiple organisms are combined and then analyzed. Both the individual samples and the composite sample results may be given so for individual samples, there will be a row for the individual sample and a row for the composite where the number per composite is one. The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result. Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data. Example R code using the API to access data across all years can be found here. Users who want to manually download more specific subsets of the data can also use the CEDEN query tool, at: https://ceden.waterboards.ca.gov/AdvancedQueryTool

  11. LIST

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv
    Updated Jan 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vojtěch Kaše; Vojtěch Kaše; Petra Heřmánková; Petra Heřmánková; Adéla Sobotková; Adéla Sobotková (2024). LIST [Dataset]. http://doi.org/10.5281/zenodo.10473706
    Explore at:
    bin, csvAvailable download formats
    Dataset updated
    Jan 9, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Vojtěch Kaše; Vojtěch Kaše; Petra Heřmánková; Petra Heřmánková; Adéla Sobotková; Adéla Sobotková
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Latin Inscriptions in Space and Time (LIST) dataset is an aggregate of the Epigraphic Database Heidelberg (https://edh.ub.uni-heidelberg.de/); aggregated EDH on Zenodo and Epigraphic Database Clauss Slaby (http://www.manfredclauss.de/); aggregated EDCS on Zenodo epigraphic datasets created by the Social Dynamics in the Ancient Mediterranean Project (SDAM), 2019-2023, funded by the Aarhus University Forskningsfond Starting grant no. AUFF-E-2018-7-2. The LIST dataset consists of 525,870 inscriptions, enriched by 65 attributes. 77,091 inscriptions are overlapping between the two source datasets (i.e. EDH and EDCS); 3,316 inscriptions are exclusively from EDH; 445,463 inscriptions are exclusively from EDCS. 511,973 inscriptions have valid geospatial coordinates (the geometry attribute). This information is also used to determine the urban context of each inscription (i.e. whether it is in the neighbourhood (i.e. within a 5000m buffer) of a large city, medium city, or small city or rural (>5000m to any type of city; see the attributes urban_context, urban_context_city, and urban_context_pop). 206,570 inscriptions have a numerical date of origin expressed by means of an interval or singular year using the attributes not_before and not_after. The dataset also employs a machine learning model to classify the inscriptions covered exclusively by EDCS in terms of 22 categories employed by EDH, see Kaše, Heřmánková, Sobotkova 2021.

    Formats

    We publish the dataset in the parquet and geojson file format. A description of individual attributes is available in the Metadata.csv. Using geopandas library, you can load the data directly from Zenodo into your Python environment using the following command: LIST = gpd.read_parquet("https://zenodo.org/record/8431323/files/LIST_v1-0.parquet?download=1"). In R, the sfarrow and sf library hold tools (st_read_parquet(), read_sf()) to load a parquet and geojson respectively after you have downloaded the datasets locally. The scripts used to generate the dataset are available via GitHub: https://github.com/sdam-au/LI_ETL

    The origin of existing attributes is further described in columns ‘dataset_source’, ‘source’, and ‘description’ in the attached Metadata.csv.

    Further reading on the dataset creation and methodology:

    • Heřmánková, Petra, Vojtěch Kaše, and Adéla Sobotkova. “Inscriptions as Data: Digital Epigraphy in Macro-Historical Perspective.” Journal of Digital History 1, no. 1 (2021): 99. https://doi.org/10.1515/jdh-2021-1004.
    • Kaše, Vojtěch, Petra Heřmánková, and Adéla Sobotkova. “Classifying Latin Inscriptions of the Roman Empire: A Machine-Learning Approach.” Proceedings of the 2nd Workshop on Computational Humanities Research (CHR2021) 2989 (2021): 123–35.

    Reading on applications of the datasets in research:

    • Glomb, Tomáš, Vojtěch Kaše, and Petra Heřmánková. “Popularity of the Cult of Asclepius in the Times of the Antonine Plague: Temporal Modeling of Epigraphic Evidence.” Journal of Archaeological Science: Reports 43 (2022): 103466. https://doi.org/10.1016/j.jasrep.2022.103466.
    • Kaše, Vojtěch, Petra Heřmánková, and Adéla Sobotková. “Division of Labor, Specialization and Diversity in the Ancient Roman Cities: A Quantitative Approach to Latin Epigraphy.” Edited by Peter F. Biehl. PLOS ONE 17, no. 6 (June 16, 2022): e0269869. https://doi.org/10.1371/journal.pone.0269869.

    Notes on spatial attributes

    Machine-readable spatial point geometries are provided within the geojson and parquet formats, as well as ‘Latitude’ and ‘Longitude’ columns, which contain geospatial decimal coordinates where these are known. Additional attributes exist that contain textual references to original location at different scales. The most reliable attribute with textual information on place of origin is the urban_context_city. This contains the ancient toponym of the largest city within a 5 km distance from the inscription findspot, using cities from Hanson’s 2016 list. After these universal attributes, the remaining columns are source-dependent, and exist only for either EDH or EDCS subsets. ‘pleiades_id’ column, for example, cross references the inscription findspot to geospatial location in the Pleiades but only in the EDH subset. ‘place’ attribute exists for data from EDCS (Ort) and contains ancient as well as modern place names referring to the findspot or region of provenance separated by “/”. This column requires additional cleaning before computational analysis. Attributes with _clean affix indicate that the text string has been stripped of symbols (such as ?), and most refer to aspects of provenance in the EDH subset of inscriptions.

    List of all spatial attributes:

    • ‘geometry’ spatial point coordinate pair, ready for computational use in R or Python ‘latitude’ and ‘longitude’ attributes contain geospatial coordinates
    • ‘urban_context_city’ attribute contains a name (ancient toponym) of the city determining the urban context, based on Hanson 2016.
    • ‘province’ attribute contains province names as they appear in EDCS. This attribute contains data only for inscriptions appearing in EDCS, for inscriptions appearing solely in EDH this attribute is empty.
    • ‘pleiades_id’ provides a referent for the geographic location in Pleiades (https://pleiades.stoa.org/), provided by EDH. In EDCS this attribute is empty.
    • ‘province_label_clean’ attribute contains province names as they appear in EDH. This attribute contains data only for inscriptions appearing in EDH, for inscriptions appearing solely in EDCS this attribute is empty.
    • ‘findspot_ancient_clean’, ‘findspot_modern_clean’, ‘country_clean’, ‘modern_region_clean’, and ‘present_location’ are additional EDH metadata, for their description see the attached Metadata file.

    Disclaimer

    The original data is provided by the third party indicated as the data source (see the ‘data_source’ column in the Metadata.csv). SDAM did not create the original data, vouch for its accuracy, or guarantee that it is the most recent data available from the data provider. For many or all of the data, the data is by its nature approximate and will contain some inaccuracies or missing values. The data may contain errors introduced by the data provider(s) and/or by SDAM. We always recommend checking the accuracy directly in the primary source, i.e. the editio princeps of the inscription in question.

  12. Surface Water - Chemistry Results

    • catalog.data.gov
    • gimi9.com
    • +1more
    Updated Nov 27, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California State Water Resources Control Board (2024). Surface Water - Chemistry Results [Dataset]. https://catalog.data.gov/dataset/surface-water-chemistry-results
    Explore at:
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California State Water Resources Control Board
    Description

    This data provides results from chemistry and field analyses, from the California Environmental Data Exchange Network (CEDEN). The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result. Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data. Example R code using the API to access data across all years can be found here. Users who want to manually download more specific subsets of the data can also use the CEDEN query tool, at: https://ceden.waterboards.ca.gov/AdvancedQueryTool

  13. CTU-13

    • kaggle.com
    • stratosphereips.org
    Updated Aug 12, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    StrGenIx | Laurens D'hooge (2022). CTU-13 [Dataset]. http://doi.org/10.34740/kaggle/dsv/4062076
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 12, 2022
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    StrGenIx | Laurens D'hooge
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is an academic dataset aimed at botnet detection from network traffic. It was published by the Stratosphere lab at CTU-Prague. All the credit goes to the original authors: Dr. Sebastian Garcia, Dr. Martin Grill, Dr. Jan Stiborek and Dr. Alejandro Zunino. Please cite their original paper.

    V1: Base dataset in CSV format as downloaded from here V2: Cleaning -> parquet files V3: Reorganize to save storage, only keep original CSVs in V1/V2

    In the parquet files all data types are already set correctly, there are 0 records with missing information and 0 duplicate records.

  14. g

    Surface Water - Toxicity Results | gimi9.com

    • gimi9.com
    Updated Aug 10, 2019
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2019). Surface Water - Toxicity Results | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_surface-water-toxicity-results/
    Explore at:
    Dataset updated
    Aug 10, 2019
    Description

    Surface water toxicity data from the California Environmental Data Exchange Network (CEDEN). CEDEN is the California State Water Board's data system for surface water quality in California, and seeks to include all available statewide data (such as that produced by research and volunteer organizations). Data in CEDEN include field, sediment and water column data collected from freshwater, estuarine, and marine environments. Examples of data in CEDEN come from laboratory, physical and biological analyses and include data types associated with chemical, toxicological, field, bio-assessment, invertebrate, fish, and bacteriological assay assessments. The data resource "Surface Water Toxicity" contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result. Zip files are provided for bulk data downloads (in csv or parquet file format), and developers can use the API associated with the "Surface Water Toxicity" (csv) resource to access the data. Example R code using the API to access the data can be found here. Users who want to manually download more specific subsets of the data can also use the CEDEN query tool, at: https://ceden.waterboards.ca.gov/AdvancedQueryTool

  15. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
New York Taxi and Limousine Commission (2017). New York Taxi Data 2009-2016 in Parquet Fomat [Dataset]. https://academictorrents.com/details/4f465810b86c6b793d1c7556fe3936441081992e
Organization logo

New York Taxi Data 2009-2016 in Parquet Fomat

Explore at:
bittorrent(35078948106)Available download formats
Dataset updated
Jul 1, 2017
Dataset provided by
New York City Taxi and Limousine Commissionhttp://www.nyc.gov/tlc
Authors
New York Taxi and Limousine Commission
License

https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified

Area covered
New York
Description

Trip record data from the Taxi and Limousine Commission () from January 2009-December 2016 was consolidated and brought into a consistent Parquet format by Ravi Shekhar

Search
Clear search
Close search
Google apps
Main menu