This data provides results from field analyses, from the California Environmental Data Exchange Network (CEDEN). The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result.
Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data. Example R code using the API to access data across all years can be found here.
Users who want to manually download more specific subsets of the data can also use the CEDEN query tool, at: https://ceden.waterboards.ca.gov/AdvancedQueryTool
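For illustration, a minimal Python sketch for loading the bulk parquet download with pandas. The local file names are hypothetical (adjust them to the resources actually downloaded from this page), and "DataQuality" is assumed to be the column carrying the provisionally assigned value described above:

```python
import glob
import pandas as pd

# Hypothetical file names for the unzipped per-year parquet resources.
paths = sorted(glob.glob("ceden_field_results_*.parquet"))
df = pd.concat((pd.read_parquet(p) for p in paths), ignore_index=True)

# "DataQuality" is assumed here to be the column holding the provisionally
# assigned data-quality value described above.
print(df["DataQuality"].value_counts(dropna=False))
```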
This data provides results from field analyses, from the California Environmental Data Exchange Network (CEDEN). The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result.
Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data.
Users who want to manually download more specific subsets of the data can also use the CEDEN Query Tool, which provides access to the same data presented here, but allows for interactive data filtering.
This dataset gathers data in `.parquet` format. Instead of having one `.csv.gz` file per department per period, all departments are grouped into a single file per period. When possible (depending on size), several periods are grouped in the same file.
### Data origin
The data come from:
- Basic climatological data - monthly
- Basic climatological data - daily
- Basic climatological data - hourly
- Basic climatological data - 6 minutes
### Data preparation
The files ending with `.prepared` have undergone slight preparation steps:
- deletion of spaces in column names
- (flexible) typing
The data are typed as follows:
- date (`YYYYMM`, `YYYYMMDD`, `YYYYMMDDHH`, `YYYYMMDDHHMN`): integer
- `NUM_POSTE`: string
- `USUAL_NAME`: string
- `LAT`: float
- `LON`: float
- `ALTI`: integer
- if the column begins with `Q` ("quality") or `NB` ("number"): integer
### Update
The data are updated at least once a week (depending on my availability) for the "latest-2023-2024" period. If you have specific needs, feel free to reach out to me.
### Re-use: Meteo Squad
These files are used in the Meteo Squad web application: https://www.meteosquad.com
### Contact
If you have specific requests, please do not hesitate to contact me: contact@mistermeteo.com

Data collected for marine benthic infauna, freshwater benthic macroinvertebrate (BMI), algae, bacteria and diatom taxonomic analyses, from the California Environmental Data Exchange Network (CEDEN). Note that bacteria single-species concentrations are stored within the chemistry template, whereas bacteria abundance data are stored within this set. Each record represents a result from a specific event location for a single organism in a single sample.
The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result.
Zip files are provided for bulk data downloads (in csv or parquet file format), and developers can use the API associated with the "CEDEN Benthic Data" (csv) resource to access the data.
Users who want to manually download more specific subsets of the data can also use the CEDEN Query Tool, which provides access to the same data presented here, but allows for interactive data filtering.
This dataset was created by VK
https://creativecommons.org/publicdomain/zero/1.0/
Converted ~50 GB of CSV data into ~5.4 GB of Feather files and ~20 GB of Parquet files for the American Express: Default Prediction competition.
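A rough pandas sketch of such a conversion (the file names are illustrative, and in practice a CSV this large would be processed in chunks or with out-of-core tools):

```python
import pandas as pd

# Illustrative file names; the competition CSVs are tens of GB,
# so chunked or out-of-core processing would be needed in practice.
df = pd.read_csv("train_data.csv")
df.to_feather("train_data.feather")   # compact, fast-loading Feather copy
df.to_parquet("train_data.parquet")   # columnar Parquet copy
```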
This clean dataset is a refined version of our company datasets, consisting of 35M+ data records.
It’s an excellent data solution for companies with limited data engineering capabilities and those who want to reduce their time to value. You get filtered, cleaned, unified, and standardized B2B data. After cleaning, this data is also enriched by leveraging a carefully instructed large language model (LLM).
AI-powered data enrichment offers more accurate information in key data fields, such as company descriptions. It also produces over 20 additional data points that are very valuable to B2B businesses. Enhancing and highlighting the most important information in web data contributes to quicker time to value, making data processing much faster and easier.
For your convenience, you can choose from multiple data formats (Parquet, JSON, JSONL, or CSV) and select a suitable delivery frequency (quarterly, monthly, or weekly).
Coresignal is a leading public business data provider in the web data sphere with an extensive focus on firmographic data and public employee profiles. More than 3B data records in different categories enable companies to build data-driven products and generate actionable insights. Coresignal is exceptional in terms of data freshness, with 890M+ records updated monthly for unprecedented accuracy and relevance.
This data provides results from the California Environmental Data Exchange Network (CEDEN) for field and lab chemistry analyses. The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result.
Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data.
Users who want to manually download more specific subsets of the data can also use the CEDEN Query Tool, which provides access to the same data presented here, but allows for interactive data filtering.
NOTE: Some of the field and lab chemistry data that has been submitted to CEDEN since 2020 has not been loaded into the CEDEN database. That data is not included in this data set (and is also not available via the CEDEN query tool described above), but is available as a supplemental data set here: Surface Water - Chemistry Results - CEDEN Augmentation. For consistency, many of the conditions applied to the data in this dataset and in the CEDEN query tool are also applied to that supplemental dataset (e.g., no rejected data or replicates are included), but the supplemental data is provisional and may not reflect all of the QA/QC controls applied to the regular CEDEN data available here.
Overview
The CKW Group is a distribution system operator that supplies more than 200,000 end customers in Central Switzerland. Since October 2022, CKW has published anonymised and aggregated data from smart meters that measure electricity consumption in the canton of Lucerne. This unique dataset is accessible on the ckw.ch/opendata platform.
Data set A - anonymised smart meter data
Data set B - aggregated smart meter data
Contents of this data set
This data set contains a small sample of CKW data set A, sorted per smart meter ID and stored as parquet files named after the id field of the corresponding anonymised smart meter data. Example: 027ceb7b8fd77a4b11b3b497e9f0b174.parquet
The original CKW data is available for download at https://open.data.axpo.com/%24web/index.html#dataset-a as (gzip-compressed) CSV files, which are split into one file per calendar month. The columns in the CSV files are as follows (a short reading sketch follows the list):
id: the anonymized counter ID (text)
timestamp: the UTC time at the beginning of a 15-minute time window to which the consumption refers (ISO-8601 timestamp)
value_kwh: the consumption in kWh in the time window under consideration (float)
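A minimal Python sketch for working with one of the per-meter parquet files in this sample, assuming pandas with pyarrow is installed, that the example file above has been downloaded locally, and that the three columns listed above are present; the resampling step simply illustrates aggregating the 15-minute values:

```python
import pandas as pd

# Read one per-meter file from this sample (file name taken from the example above).
df = pd.read_parquet("027ceb7b8fd77a4b11b3b497e9f0b174.parquet")

# Columns per the description: id (text), timestamp (ISO-8601, UTC), value_kwh (float).
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Aggregate the 15-minute consumption values to daily totals in kWh.
daily_kwh = df.set_index("timestamp")["value_kwh"].resample("1D").sum()
print(daily_kwh.head())
```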
In this archive, data from:
| File size | Export date | Period | File name |
|------|------|------|------|
| 4.2GiB | 2024-04-20 | 202402 | ckw_opendata_smartmeter_dataset_a_202402.csv.gz |
| 4.5GiB | 2024-03-21 | 202401 | ckw_opendata_smartmeter_dataset_a_202401.csv.gz |
| 4.5GiB | 2024-02-20 | 202312 | ckw_opendata_smartmeter_dataset_a_202312.csv.gz |
| 4.4GiB | 2024-01-20 | 202311 | ckw_opendata_smartmeter_dataset_a_202311.csv.gz |
| 4.5GiB | 2023-12-20 | 202310 | ckw_opendata_smartmeter_dataset_a_202310.csv.gz |
| 4.4GiB | 2023-11-20 | 202309 | ckw_opendata_smartmeter_dataset_a_202309.csv.gz |
| 4.5GiB | 2023-10-20 | 202308 | ckw_opendata_smartmeter_dataset_a_202308.csv.gz |
| 4.6GiB | 2023-09-20 | 202307 | ckw_opendata_smartmeter_dataset_a_202307.csv.gz |
| 4.4GiB | 2023-08-20 | 202306 | ckw_opendata_smartmeter_dataset_a_202306.csv.gz |
| 4.6GiB | 2023-07-20 | 202305 | ckw_opendata_smartmeter_dataset_a_202305.csv.gz |
| 3.3GiB | 2023-06-20 | 202304 | ckw_opendata_smartmeter_dataset_a_202304.csv.gz |
| 4.6GiB | 2023-05-24 | 202303 | ckw_opendata_smartmeter_dataset_a_202303.csv.gz |
| 4.2GiB | 2023-04-20 | 202302 | ckw_opendata_smartmeter_dataset_a_202302.csv.gz |
| 4.7GiB | 2023-03-20 | 202301 | ckw_opendata_smartmeter_dataset_a_202301.csv.gz |
| 4.6GiB | 2023-03-15 | 202212 | ckw_opendata_smartmeter_dataset_a_202212.csv.gz |
| 4.3GiB | 2023-03-15 | 202211 | ckw_opendata_smartmeter_dataset_a_202211.csv.gz |
| 4.4GiB | 2023-03-15 | 202210 | ckw_opendata_smartmeter_dataset_a_202210.csv.gz |
| 4.3GiB | 2023-03-15 | 202209 | ckw_opendata_smartmeter_dataset_a_202209.csv.gz |
| 4.4GiB | 2023-03-15 | 202208 | ckw_opendata_smartmeter_dataset_a_202208.csv.gz |
| 4.4GiB | 2023-03-15 | 202207 | ckw_opendata_smartmeter_dataset_a_202207.csv.gz |
| 4.2GiB | 2023-03-15 | 202206 | ckw_opendata_smartmeter_dataset_a_202206.csv.gz |
| 4.3GiB | 2023-03-15 | 202205 | ckw_opendata_smartmeter_dataset_a_202205.csv.gz |
| 4.2GiB | 2023-03-15 | 202204 | ckw_opendata_smartmeter_dataset_a_202204.csv.gz |
| 4.1GiB | 2023-03-15 | 202203 | ckw_opendata_smartmeter_dataset_a_202203.csv.gz |
| 3.5GiB | 2023-03-15 | 202202 | ckw_opendata_smartmeter_dataset_a_202202.csv.gz |
| 3.7GiB | 2023-03-15 | 202201 | ckw_opendata_smartmeter_dataset_a_202201.csv.gz |
| 3.5GiB | 2023-03-15 | 202112 | ckw_opendata_smartmeter_dataset_a_202112.csv.gz |
| 3.1GiB | 2023-03-15 | 202111 | ckw_opendata_smartmeter_dataset_a_202111.csv.gz |
| 3.0GiB | 2023-03-15 | 202110 | ckw_opendata_smartmeter_dataset_a_202110.csv.gz |
| 2.7GiB | 2023-03-15 | 202109 | ckw_opendata_smartmeter_dataset_a_202109.csv.gz |
| 2.6GiB | 2023-03-15 | 202108 | ckw_opendata_smartmeter_dataset_a_202108.csv.gz |
| 2.4GiB | 2023-03-15 | 202107 | ckw_opendata_smartmeter_dataset_a_202107.csv.gz |
| 2.1GiB | 2023-03-15 | 202106 | ckw_opendata_smartmeter_dataset_a_202106.csv.gz |
| 2.0GiB | 2023-03-15 | 202105 | ckw_opendata_smartmeter_dataset_a_202105.csv.gz |
| 1.7GiB | 2023-03-15 | 202104 | ckw_opendata_smartmeter_dataset_a_202104.csv.gz |
| 1.6GiB | 2023-03-15 | 202103 | ckw_opendata_smartmeter_dataset_a_202103.csv.gz |
| 1.3GiB | 2023-03-15 | 202102 | ckw_opendata_smartmeter_dataset_a_202102.csv.gz |
| 1.3GiB | 2023-03-15 | 202101 | ckw_opendata_smartmeter_dataset_a_202101.csv.gz |
was processed into partitioned parquet files, and then organised by id into parquet files with data from single smart meters.
A small sample of the smart meter data above is archived in the public cloud space of the AISOP project (https://os.zhdk.cloud.switch.ch/swift/v1/aisop_public/ckw/ts/batch_0424/batch_0424.zip) and also in this public record. For access to the complete data, contact the authors of this archive.
It consists of the following parquet files:
| Size | Date | Name |
|------|------|------|
| 1.0M | Mar 4 12:18 | 027ceb7b8fd77a4b11b3b497e9f0b174.parquet |
| 979K | Mar 4 12:18 | 03a4af696ff6a5c049736e9614f18b1b.parquet |
| 1.0M | Mar 4 12:18 | 03654abddf9a1b26f5fbbeea362a96ed.parquet |
| 1.0M | Mar 4 12:18 | 03acebcc4e7d39b6df5c72e01a3c35a6.parquet |
| 1.0M | Mar 4 12:18 | 039e60e1d03c2afd071085bdbd84bb69.parquet |
| 931K | Mar 4 12:18 | 036877a1563f01e6e830298c193071a6.parquet |
| 1.0M | Mar 4 12:18 | 02e45872f30f5a6a33972e8c3ba9c2e5.parquet |
| 662K | Mar 4 12:18 | 03a25f298431549a6bc0b1a58eca1f34.parquet |
| 635K | Mar 4 12:18 | 029a46275625a3cefc1f56b985067d15.parquet |
| 1.0M | Mar 4 12:18 | 0301309d6d1e06c60b4899061deb7abd.parquet |
| 1.0M | Mar 4 12:18 | 0291e323d7b1eb76bf680f6e800c2594.parquet |
| 1.0M | Mar 4 12:18 | 0298e58930c24010bbe2777c01b7644a.parquet |
| 1.0M | Mar 4 12:18 | 0362c5f3685febf367ebea62fbc88590.parquet |
| 1.0M | Mar 4 12:18 | 0390835d05372cb66f6cd4ca662399e8.parquet |
| 1.0M | Mar 4 12:18 | 02f670f059e1f834dfb8ba809c13a210.parquet |
| 987K | Mar 4 12:18 | 02af749aaf8feb59df7e78d5e5d550e0.parquet |
| 996K | Mar 4 12:18 | 0311d3c1d08ee0af3edda4dc260421d1.parquet |
| 1.0M | Mar 4 12:18 | 030a707019326e90b0ee3f35bde666e0.parquet |
| 955K | Mar 4 12:18 | 033441231b277b283191e0e1194d81e2.parquet |
| 995K | Mar 4 12:18 | 0317b0417d1ec91b5c243be854da8a86.parquet |
| 1.0M | Mar 4 12:18 | 02ef4e49b6fb50f62a043fb79118d980.parquet |
| 1.0M | Mar 4 12:18 | 0340ad82e9946be45b5401fc6a215bf3.parquet |
| 974K | Mar 4 12:18 | 03764b3b9a65886c3aacdbc85d952b19.parquet |
| 1.0M | Mar 4 12:18 | 039723cb9e421c5cbe5cff66d06cb4b6.parquet |
| 1.0M | Mar 4 12:18 | 0282f16ed6ef0035dc2313b853ff3f68.parquet |
| 1.0M | Mar 4 12:18 | 032495d70369c6e64ab0c4086583bee2.parquet |
| 900K | Mar 4 12:18 | 02c56641571fc9bc37448ce707c80d3d.parquet |
| 1.0M | Mar 4 12:18 | 027b7b950689c337d311094755697a8f.parquet |
| 1.0M | Mar 4 12:18 | 02af272adccf45b6cdd4a7050c979f9f.parquet |
| 927K | Mar 4 12:18 | 02fc9a3b2b0871d3b6a1e4f8fe415186.parquet |
| 1.0M | Mar 4 12:18 | 03872674e2a78371ce4dfa5921561a8c.parquet |
| 881K | Mar 4 12:18 | 0344a09d90dbfa77481c5140bb376992.parquet |
| 1.0M | Mar 4 12:18 | 0351503e2b529f53bdae15c7fbd56fc0.parquet |
| 1.0M | Mar 4 12:18 | 033fe9c3a9ca39001af68366da98257c.parquet |
| 1.0M | Mar 4 12:18 | 02e70a1c64bd2da7eb0d62be870ae0d6.parquet |
| 1.0M | Mar 4 12:18 | 0296385692c9de5d2320326eaa000453.parquet |
| 962K | Mar 4 12:18 | 035254738f1cc8a31075d9fbe3ec2132.parquet |
| 991K | Mar 4 12:18 | 02e78f0d6a8fb96050053e188bf0f07c.parquet |
| 1.0M | Mar 4 12:18 | 039e4f37ed301110f506f551482d0337.parquet |
| 961K | Mar 4 12:18 | 039e2581430703b39c359dc62924a4eb.parquet |
| 999K | Mar 4 12:18 | 02c6f7e4b559a25d05b595cbb5626270.parquet |
| 1.0M | Mar 4 12:18 | 02dd91468360700a5b9514b109afb504.parquet |
| 938K | Mar 4 12:18 | 02e99c6bb9d3ca833adec796a232bac0.parquet |
| 589K | Mar 4 12:18 | 03aef63e26a0bdbce4a45d7cf6f0c6f8.parquet |
| 1.0M | Mar 4 12:18 | 02d1ca48a66a57b8625754d6a31f53c7.parquet |
| 1.0M | Mar 4 12:18 | 03af9ebf0457e1d451b83fa123f20a12.parquet |
| 1.0M | Mar 4 12:18 | 0289efb0e712486f00f52078d6c64a5b.parquet |
| 1.0M | Mar 4 12:18 | 03466ed913455c281ffeeaa80abdfff6.parquet |
| 1.0M | Mar 4 12:18 | 032d6f4b34da58dba02afdf5dab3e016.parquet |
| 1.0M | Mar 4 12:18 | 03406854f35a4181f4b0778bb5fc010c.parquet |
| 1.0M | Mar 4 12:18 | 0345fc286238bcea5b2b9849738c53a2.parquet |
| 1.0M | Mar 4 12:18 | 029ff5169155b57140821a920ad67c7e.parquet |
| 985K | Mar 4 12:18 | 02e4c9f3518f079ec4e5133acccb2635.parquet |
| 1.0M | Mar 4 12:18 | 03917c4f2aef487dc20238777ac5fdae.parquet |
| 969K | Mar 4 12:18 | 03aae0ab38cebcb160e389b2138f50da.parquet |
| 914K | Mar 4 12:18 | 02bf87b07b64fb5be54f9385880b9dc1.parquet |
| 1.0M | Mar 4 12:18 | 02776685a085c4b785a3885ef81d427a.parquet |
| 947K | Mar 4 12:18 | 02f5a82af5a5ffac2fe7551bf4a0a1aa.parquet |
| 992K | Mar 4 12:18 | 039670174dbc12e1ae217764c96bbeb3.parquet |
| 1.0M | Mar 4 12:18 | 037700bf3e272245329d9385bb458bac.parquet |
| 602K | Mar 4 12:18 | 0388916cdb86b12507548b1366554e16.parquet |
| 939K | Mar 4 12:18 | 02ccbadea8d2d897e0d4af9fb3ed9a8e.parquet |
| 1.0M | Mar 4 12:18 | 02dc3f4fb7aec02ba689ad437d8bc459.parquet |
| 1.0M | Mar 4 12:18 | 02cf12e01cd20d38f51b4223e53d3355.parquet |
| 993K | Mar 4 12:18 | 0371f79d154c00f9e3e39c27bab2b426.parquet |
where each file contains data from a single smart meter.
Acknowledgement
The AISOP project (https://aisopproject.com/) received funding in the framework of the Joint Programming Platform Smart Energy Systems from the European Union's Horizon 2020 research and innovation programme under grant agreement No 883973 (ERA-Net Smart Energy Systems joint call on digital transformation for green energy transition).
https://brightdata.com/license
Unlock powerful insights with the Amazon Prime dataset, offering access to millions of records from any Amazon domain. This dataset provides comprehensive data points such as product titles, descriptions, exclusive Prime discounts, brand details, pricing (initial and discounted), availability, customer ratings, reviews, and product categories. Additionally, it includes unique identifiers like ASINs, images, and seller information, allowing you to analyze Prime offerings, trends, and customer preferences with precision. Use this dataset to optimize your eCommerce strategies by analyzing Prime-exclusive pricing strategies, identifying top-performing brands and products, and tracking customer sentiment through reviews and ratings. Gain valuable insights into consumer demand, seasonal trends, and the impact of Prime discounts to make data-driven decisions that enhance your inventory management, marketing campaigns, and pricing strategies. Whether you’re a retailer, marketer, data analyst, or researcher, the Amazon Prime dataset empowers you with the data needed to stay competitive in the dynamic eCommerce landscape. Available in various formats such as JSON, CSV, and Parquet, and delivered via flexible options like API, S3, or email, this dataset ensures seamless integration into your workflows.
Summary
GitTables 1M (https://gittables.github.io) is a corpus of currently 1M relational tables extracted from CSV files in GitHub repositories that are associated with a license that allows distribution. We aim to grow this to at least 10M tables.
Each parquet file in this corpus represents a table with the original content (e.g. values and header) as extracted from the corresponding CSV file. Table columns are enriched with annotations corresponding to >2K semantic types from Schema.org and DBpedia (provided as metadata of the parquet file). These column annotations consist of, for example, semantic types, hierarchical relations to other types, and descriptions.
We believe GitTables can facilitate many use-cases, among which:
- Data integration, search and validation.
- Data visualization and analysis recommendation.
- Schema analysis and completion for e.g. database or knowledge base design.
If you have questions, the paper, documentation, and contact details are provided on the website: https://gittables.github.io. We recommend using Zenodo's API to easily download the full dataset (i.e. all zipped topic subsets).
Dataset contents
The data is provided in subsets of tables stored in parquet files; each subset corresponds to a term that was used to query GitHub. The column annotations and other metadata (e.g. URL and repository license) are attached to the metadata of the parquet file. This version corresponds to this version of the paper: https://arxiv.org/abs/2106.07258v4.
In summary, this dataset can be characterized as follows:
| Statistic | Value |
|------|------|
| # tables | 1M |
| average # columns | 12 |
| average # rows | 142 |
| # annotated tables (at least 1 column annotation) | 723K+ (DBpedia), 738K+ (Schema.org) |
| # unique semantic types | 835 (DBpedia), 677 (Schema.org) |
How to download
The dataset can be downloaded through Zenodo's interface directly, or using Zenodo's API (recommended for a full download).
Future releases
Future releases will include the following:
- Increased number of tables (expected at least 10M)
Associated datasets
- GitTables benchmark - column type detection: https://zenodo.org/record/5706316
- GitTables 1M - CSV files: https://zenodo.org/record/6515973
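As a minimal sketch of how the per-table annotations could be inspected with pyarrow. The file name is illustrative and the exact metadata keys depend on the release, so treat this as an assumption rather than the documented layout of the corpus:

```python
import pyarrow.parquet as pq

# Illustrative file name for one extracted GitTables table.
table = pq.read_table("some_table.parquet")

# Table-level metadata (column annotations, source URL, repository license, ...)
# is stored as key/value byte strings on the Arrow schema.
for key, value in (table.schema.metadata or {}).items():
    print(key.decode(), "->", value.decode()[:80])

df = table.to_pandas()
print(df.head())
```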
This data provides results from chemistry and field analyses, from the California Environmental Data Exchange Network (CEDEN). The data set contains two provisionally assigned values (“DataQuality” and “DataQualityIndicator”) to help users interpret the data quality metadata provided with the associated result. Due to file size limitations, the data has been split into individual resources by year. The entire dataset can also be downloaded in bulk using the zip files on this page (in csv format or parquet format), and developers can also use the API associated with each year's dataset to access the data. Example R code using the API to access data across all years can be found here. Users who want to manually download more specific subsets of the data can also use the CEDEN query tool, at: https://ceden.waterboards.ca.gov/AdvancedQueryTool
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Tabular Datasets
The following datasets are used in this project: Feature Factory

| Index | Dataset Name | File Name | Data Type | Records | Format | Source |
|------|------|------|------|------|------|------|
| 1 | Wine Quality (Red Wine) | winequality-red.csv | Tabular | 1,599 | CSV | Link |
| 2 | NYC Yellow Taxi Trip (Jan 2019) | yellow_tripdata_2019.parquet | Taxi Trip Data | ~7M | Parquet | Link |
| 3 | NYC Green Taxi Trip (Jan 2019) | green_tripdata_2019.parquet | Taxi Trip Data | ~1M | Parquet | Link |
| 4 | California Housing Prices | california_housing.csv | Real Estate Prices | … | … | … |

See the full description on the dataset page: https://huggingface.co/datasets/habedi/feature-factory-datasets.
The BuildingsBench datasets consist of:
- Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
- 7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.
Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. The specific gap it fills is the lack of large-scale and diverse time series datasets of sufficient size for studying pretraining and finetuning with scalable machine learning models.
Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see the link to this database further below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation."
Similar to the EULP, Buildings-900K is a collection of Parquet files and it follows nearly the same Parquet dataset organization as the EULP. As it only contains a single energy consumption time series per building, it is much smaller (~110 GB).
BuildingsBench also provides an evaluation benchmark that is a collection of various open source residential and commercial real building energy consumption datasets. The evaluation datasets, which are provided alongside Buildings-900K below, are collections of CSV files which contain annual energy consumption. The size of the evaluation datasets altogether is less than 1 GB, and they are listed below:
- ElectricityLoadDiagrams20112014
- Building Data Genome Project-2
- Individual household electric power consumption (Sceaux)
- Borealis
- SMART
- IDEAL
- Low Carbon London
A README file providing details about how the data is stored and describing the organization of the datasets can be found within each data lake version under BuildingsBench.
This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files.
Limitations with this dataset include:
- All biases inherent to the Landsat Surface Temperature product are retained in this dataset, which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases with partial cloud coverage over a waterbody.
- Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported, one for each data tile. The deepest point values will be extracted and reported for the tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = "yes" column of site_id_tile_hv_crosswalk.csv).
- Temperature data were not extracted from satellite images with more than 90% cloud cover.
- Temperature data represent skin temperature at the water surface and may differ from temperature observations from below the water surface.
Potential methods for addressing limitations with this dataset:
- Identifying and removing unrealistic temperature estimates:
  - Calculate the total percentage of cloud pixels over a given waterbody as percent_cloud_pixels = wb_dswe9_pixels / (wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage.
  - Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10).
  - Filter waterbodies where the deepest point is identified as water (dp_dswe = 1).
- Handling waterbodies split between multiple tiles:
  - These waterbodies can be identified using the site_id_tile_hv_crosswalk.csv file (column multiple_tiles = "yes"). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates.
All zip files within this data release contain nested directories using .parquet files to store the data. The example_script_for_using_parquet.R contains example code for using the R arrow package to open and query the nested .parquet files.
- "year_byscene=XXXX.zip" - includes temperature summary statistics for individual waterbodies and the deepest points (the furthest point from land within a waterbody) within each waterbody by the scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files in the _byscene datasets may only include one dummy row of data (identified by tile_hv="000-000"). This happens when no tabular data is extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible reasons. An example file path for this dataset follows: year_byscene=2023/tile_hv=002-001/part-0.parquet
- "year=XXXX.zip" - includes the summary statistics for individual waterbodies and the deepest points within each waterbody by year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX data are used as input for generating these summary tables, which aggregate temperature data by year, month, and year-month. Aggregated data is not available for the following tiles: 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land, and no output data were generated. An example file path for this dataset follows: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet
- "example_script_for_using_parquet.R" - This script includes code to download zip files directly from ScienceBase, identify HUC04 basins within a desired Landsat ARD grid tile, download NHDPlus High Resolution data for visualizing, use the R arrow package to compile .parquet files in nested directories, and create example static and interactive maps.
- "nhd_HUC04s_ingrid.csv" - This cross-walk file identifies the HUC04 watersheds within each Landsat ARD tile grid.
- "site_id_tile_hv_crosswalk.csv" - This cross-walk file identifies the site_id (nhdhr_{permanent_identifier}) within each Landsat ARD tile grid. This file also includes a column (multiple_tiles) to identify site_ids that fall within multiple Landsat ARD tile grids.
- "lst_grid.png" - a map of the Landsat grid tiles labelled by the horizontal-vertical ID.
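While the release ships an R example script, a rough Python equivalent using pyarrow is sketched below, assuming a "year_byscene=2023" zip has been extracted locally with its nested directory names intact and that the column names match the description above (the cloud threshold is an example, not a prescribed value):

```python
import pyarrow.dataset as ds

# Read the hive-style partitioned parquet tree (tile_hv=XXX-XXX subdirectories).
scenes = ds.dataset("year_byscene=2023", format="parquet", partitioning="hive")
df = scenes.to_table().to_pandas()

# Screen scenes as suggested above: cloudiness, minimum water pixels,
# and a deepest point classified as water.
df["percent_cloud_pixels"] = df["wb_dswe9_pixels"] / (
    df["wb_dswe9_pixels"] + df["wb_dswe1_pixels"]
)
clean = df[
    (df["percent_cloud_pixels"] <= 0.5)   # example threshold, not prescribed
    & (df["wb_dswe1_pixels"] >= 10)
    & (df["dp_dswe"] == 1)
]
print(clean.head())
```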
The following submission includes raw and processed electrical configuration deployment data from the in-water deployment of NREL's Hydraulic and Electric Reverse Osmosis Wave Energy Converter (HERO WEC), in the form of parquet files, TDMS files, CSV files, bag files, and MATLAB workspaces. This dataset was collected in April 2024 at the Jennette's Pier test site in North Carolina. Raw data as TDMS, CSV, and bag files are provided here alongside processed data in the form of MATLAB workspaces and Parquet files. This dataset includes the Python code used to process the data and MATLAB scripts to visualize the processed data. All data types, calculations, and processing steps are described in the included "Data Descriptions" document. All files in this dataset are described in detail in the included README. This data set has been developed by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Water Power Technologies Office.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes scripts, text files, and cached CSV/Parquet or raw TXT data files used to generate all analysis and results from the paper. A README.md file is included in replication-pkg.zip for details on using the scripts.
If you only want to inspect the figures, you do not need a data ZIP.
If you want to simply re-generate the figures without changes, download data-cached.zip. If you want to make any sort of change to the analyses, you will want to download data-raw.zip.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Latin Inscriptions in Space and Time (LIST) dataset is an aggregate of the Epigraphic Database Heidelberg (https://edh.ub.uni-heidelberg.de/; aggregated EDH on Zenodo) and the Epigraphic Database Clauss Slaby (http://www.manfredclauss.de/; aggregated EDCS on Zenodo), epigraphic datasets created by the Social Dynamics in the Ancient Mediterranean Project (SDAM), 2019-2023, funded by the Aarhus University Forskningsfond Starting grant no. AUFF-E-2018-7-2. The LIST dataset consists of 525,870 inscriptions, enriched by 65 attributes. 77,091 inscriptions overlap between the two source datasets (i.e. EDH and EDCS); 3,316 inscriptions are exclusively from EDH; 445,463 inscriptions are exclusively from EDCS. 511,973 inscriptions have valid geospatial coordinates (the geometry attribute). This information is also used to determine the urban context of each inscription, i.e. whether it is in the neighbourhood (within a 5000 m buffer) of a large, medium, or small city, or rural (>5000 m to any type of city); see the attributes urban_context, urban_context_city, and urban_context_pop. 206,570 inscriptions have a numerical date of origin expressed as an interval or a single year using the attributes not_before and not_after. The dataset also employs a machine learning model to classify the inscriptions covered exclusively by EDCS in terms of the 22 categories employed by EDH; see Kaše, Heřmánková, Sobotkova 2021.
Formats
We publish the dataset in the parquet and geojson file formats. A description of individual attributes is available in the Metadata.csv. Using the geopandas library, you can load the data directly from Zenodo into your Python environment with the following command: LIST = gpd.read_parquet("https://zenodo.org/record/8431323/files/LIST_v1-0.parquet?download=1"). In R, the sfarrow and sf libraries provide tools (st_read_parquet(), read_sf()) to load the parquet and geojson files respectively after you have downloaded the datasets locally. The scripts used to generate the dataset are available via GitHub: https://github.com/sdam-au/LI_ETL
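A slightly fuller Python sketch of the same load, assuming geopandas with pyarrow support is installed (if reading straight from the URL fails in your environment, download the parquet file first and pass the local path):

```python
import geopandas as gpd

LIST = gpd.read_parquet(
    "https://zenodo.org/record/8431323/files/LIST_v1-0.parquet?download=1"
)

# A few of the attributes described above.
print(LIST.shape)
print(LIST[["not_before", "not_after", "urban_context_city"]].head())
```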
The origin of existing attributes is further described in columns ‘dataset_source’, ‘source’, and ‘description’ in the attached Metadata.csv.
Further reading on the dataset creation and methodology:
Reading on applications of the datasets in research:
Notes on spatial attributes
Machine-readable spatial point geometries are provided within the geojson and parquet formats, along with 'Latitude' and 'Longitude' columns, which contain decimal geospatial coordinates where these are known. Additional attributes contain textual references to the original location at different scales. The most reliable attribute with textual information on place of origin is urban_context_city, which contains the ancient toponym of the largest city within a 5 km distance from the inscription findspot, using cities from Hanson's 2016 list. Beyond these universal attributes, the remaining columns are source-dependent and exist only for either the EDH or the EDCS subset. The 'pleiades_id' column, for example, cross-references the inscription findspot to a geospatial location in the Pleiades gazetteer, but only in the EDH subset. The 'place' attribute exists for data from EDCS (Ort) and contains ancient as well as modern place names referring to the findspot or region of provenance, separated by "/"; this column requires additional cleaning before computational analysis. Attributes with the _clean suffix indicate that the text string has been stripped of symbols (such as ?), and most refer to aspects of provenance in the EDH subset of inscriptions.
List of all spatial attributes:
Disclaimer
The original data is provided by the third party indicated as the data source (see the 'data_source' column in the Metadata.csv). SDAM did not create the original data, vouch for its accuracy, or guarantee that it is the most recent data available from the data provider. Much or all of the data is by its nature approximate and will contain some inaccuracies or missing values. The data may contain errors introduced by the data provider(s) and/or by SDAM. We always recommend checking the accuracy directly in the primary source, i.e. the editio princeps of the inscription in question.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
About the data
These are partial results from The Geography of Human Flourishing Project analysis for the years 2010-2023. This project is one of the 10 national projects awarded within the Spatial AI-Challenge 2024, an international initiative at the crossroads of geospatial science and artificial intelligence. At present, only a subset of the data, covering 2010-2012, is included. Data are provided as CSV or parquet. In the datasets, FIPS is the FIPS code for a US state, county is the US… See the full description on the dataset page: https://huggingface.co/datasets/siacus/flourishing.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
VTuber 1B is a dataset for large-scale academic research, collecting over a billion live chats, superchats, and moderation events (bans/deletions) from virtual YouTubers' live streams.
See GitHub and join the #livechat-dataset channel on the SIGVT Discord for discussions.
We also offer ❤️🩹 Sensai, a live chat dataset specifically made for building ML models for spam detection / toxic chat classification.
See public notebooks built on VTuber 1B and VTuber 1B Elements for ideas.
We employed the Honeybee cluster to collect real-time live chat events across major VTubers' live streams. All sensitive data such as author names and author profile images are omitted from the dataset, and the author channel id is anonymized with a salted SHA-1 hash.
Kaggle Datasets (2 MB)
VTuber 1B Elements is most suitable for statistical visualizations and exploratory data analysis.
filename | summary |
---|---|
channels.csv | Channel index |
chat_stats.csv | Chat statistics |
superchat_stats.csv | Super Chat statistics |
VTuber 1B is most suitable for frequency analysis. This edition includes only the essential columns in order to reduce dataset size and make it faster for Kaggle Kernels to load the data.
filename | summary |
---|---|
chats_%Y-%m.parquet | Live chat events (> 1,000,000,000) |
superchats_%Y-%m.parquet | Super chat events (> 4,000,000) |
deletion_events.parquet | Deletion events |
ban_events.parquet | Ban events |
Ban and deletion are equivalent to markChatItemsByAuthorAsDeletedAction and markChatItemAsDeletedAction, respectively.
Chats (chats_%Y-%m.csv)

column | type | description | in standard version |
---|---|---|---|
timestamp | string | ISO 8601 UTC timestamp | limited accuracy |
id | string | chat id | N/A |
authorName | string | author name | N/A |
authorChannelId | string | author channel id | anonymized |
body | string | chat message | N/A |
bodyLength | number | chat message length | standard version only |
membership | string | membership status | N/A |
isMember | nullable boolean | is member (null if unknown) | standard version only |
isModerator | boolean | is channel moderator | N/A |
isVerified | boolean | is verified account | N/A |
videoId | string | source video id | |
channelId | string | source channel id |
value | duration |
---|---|
unknown | Indistinguishable |
non-member | 0 |
new | < 1 month |
1 month | >= 1 month, < 2 months |
2 months | >= 2 months, < 6 months |
6 months | >= 6 months, < 12 months |
1 year | >= 12 months, < 24 months |
2 years | >= 24 months |
Set keep_default_na to False and na_values to '' in read_csv. Otherwise, chat messages like NA would incorrectly be treated as NaN values.
chats = ...
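A minimal pandas sketch of the above, assuming a monthly chat file such as chats_2021-01.csv has been downloaded locally (the file name is illustrative, following the chats_%Y-%m.csv pattern):

```python
import pandas as pd

# keep_default_na=False plus na_values='' keeps literal chat messages such as
# "NA" or "null" from being coerced into NaN.
chats = pd.read_csv(
    "chats_2021-01.csv",   # illustrative file name following chats_%Y-%m.csv
    keep_default_na=False,
    na_values="",
)
print(chats.dtypes)
```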