Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All three datasets are in CSV file format and contain renewable energy and load demand data from August 1, 2024 to August 30, 2024. The data is sourced from California ISO and is accessed through the open-source gridstatus repository (https://github.com/gridstatus/gridstatus). Renewable energy sources include wind and solar energy, and the load demand is based on data from the Trinity Public Utility District (TIDC) and the Trook Irrigation District (TPWR). The key data in the dataset are: the "MW" column represents power measured in megawatts, "OPR_DT" represents the operation date in YYYY-MM-DD format, and "OPR_HR" represents the 24 hours within a day (1-24). The datasets provide high-resolution temporal information on renewable energy generation and variations in load demand, which is valuable for analyzing grid stability and demand forecasting in electrical grid systems.
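For example, a file can be loaded with pandas and the OPR_DT and OPR_HR columns combined into a single timestamp (a minimal sketch; the file name is illustrative, not part of the dataset):

import pandas as pd

# Load one of the three CSV files (the file name here is illustrative).
df = pd.read_csv("caiso_solar.csv")

# OPR_HR runs from 1 to 24, so shift it by one hour to build a
# conventional timestamp from the OPR_DT operating date.
df["timestamp"] = pd.to_datetime(df["OPR_DT"]) + pd.to_timedelta(df["OPR_HR"] - 1, unit="h")
df = df.set_index("timestamp").sort_index()

print(df["MW"].describe())  # summary statistics for the power column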
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The datasets were used to validate and test the data pipeline deployment following the RADON approach. The dataset consists of a single CSV file containing around 32,000 Twitter tweets. From this file, 100 smaller CSV files were created, each containing 320 tweets. These 100 CSV files are used to validate and test (performance/load testing) the data pipeline components.
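A split of this kind can be reproduced with pandas (a minimal sketch; the input and output file names are assumptions, not part of the dataset):

import pandas as pd

# Split one CSV of ~32,000 tweets into 100 files of 320 rows each.
tweets = pd.read_csv("tweets.csv")  # assumed input file name
rows_per_file = 320
for i in range(100):
    chunk = tweets.iloc[i * rows_per_file:(i + 1) * rows_per_file]
    chunk.to_csv(f"tweets_part_{i:03d}.csv", index=False)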
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials
Background
This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels, and one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and data from the individual experiments are also included.
The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).
Usage
Included Files
File Format: Downsampled Data
These are the "LP_
These data files can be easily loaded using the pandas library in Python:
import pandas as pd

# Load a downsampled data file; the first column is used as the index.
data = pd.read_csv(data_file, index_col=0)
The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility with RESSPyLab.
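For instance, the stress and strain histories can be pulled out of a loaded file by column name (a minimal sketch; data_file is a placeholder for the path to one of the downsampled CSV files):

import pandas as pd

data = pd.read_csv(data_file, index_col=0)

# True strain and true stress columns, named for RESSPyLab compatibility.
strain = data["e_true"]
stress = data["Sigma_true"]
print(stress.abs().max())  # peak true stress recorded in the test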
File Format: Unreduced Data
These are the "LP_
The data can be loaded and used similarly to the downsampled data.
File Format: Overall_Summary
The overall summary file provides data on all the test specimens in the database. The columns include:
File Format: Summarized_Mechanical_Props_Campaign
Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,
import pandas as pd

tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
                   index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1],
                   keep_default_na=False, na_values='')
Caveats
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
This dataset was created by Rawan Mostafa Rakha
Conversations Dataset
This dataset contains conversational data formatted as CSV for easy loading and processing.
Dataset Structure
The dataset is a CSV file with the following columns:
conversation_id: Unique identifier for each conversation
message_id: The position of the message in the conversation
role: The sender of the message (human, gpt, system)
content: The content of the message
Usage
from datasets import load_dataset

dataset = load_dataset("mugivara1/conversations-dataset")

See the full description on the dataset page: https://huggingface.co/datasets/mugivara1/conversations-dataset.
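Once loaded, individual conversations can be reconstructed by grouping on conversation_id (a minimal sketch, assuming a default "train" split and the column names listed above):

from datasets import load_dataset

dataset = load_dataset("mugivara1/conversations-dataset")
df = dataset["train"].to_pandas()  # assumes the default "train" split

# Reassemble each conversation in message order and print the first one.
for conv_id, conv in df.sort_values("message_id").groupby("conversation_id"):
    for _, msg in conv.iterrows():
        print(f"{msg['role']}: {msg['content']}")
    break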
https://data.gov.tw/license
Provide "Statistics of Import and Export Trade Volume of Each Park" to let the public understand the import and export and its growth trend of each park. In addition to updating this information every month, CSV file format is also provided for free download and use by the public.The dataset includes statistics on the import and export trade volume of parks such as Nanzih, Kaohsiung, Taichung, Zhonggang, Pingtung, and other parks (Lingguang, Chenggong, Gaoruan), with main fields including "Park, Import and Export (This Month, Year-to-Date)", "Export (This Month, Year-to-Date)", "Import (This Month, Year-to-Date)", and other important information.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview: This is a large-scale dataset with impedance and signal loss data recorded on volunteer test subjects using low-voltage, sine-shaped alternating-current signals. The signal frequencies range from 50 kHz to 20 MHz.
Applications: The intention of this dataset is to enable investigation of the human body as a signal propagation medium, and to capture how the properties of the human body (age, sex, composition, etc.), the measurement locations, and the signal frequencies impact the signal loss over the human body.
Overview statistics:
Measurement group statistics:
Included files:
CSV file columns:
CSV file columns, frequency-specific:
Acknowledgments: The dataset collection was funded by the Latvian Council of Science, project “Body-Coupled Communication for Body Area Networks”, project No. lzp-2020/1-0358.
References: For more detailed information, see this article: J. Ormanis, V. Medvedevs, A. Sevcenko, V. Aristovs, V. Abolins, and A. Elsts. Dataset on the Human Body as a Signal Propagation Medium for Body Coupled Communication. Submitted to Elsevier Data in Brief, 2023.
Contact information: info@edi.lv
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by animou123
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A renewable energy resource-based sustainable microgrid model for a residential area was designed in the HOMER Pro microgrid software. A small residential area of 20 buildings housing about 60 families, with an annual energy consumption of 219 MWh, together with an electric vehicle charging station serving 10 batteries daily with an annual energy consumption of 18.3 MWh, located in the Padma residential area, Rajshahi (24°22.6'N, 88°37.2'E), was selected as our case study. Solar panels, a natural gas generator, an inverter, and Li-ion batteries are required for our proposed model. The HOMER Pro software was used to optimize the designed microgrid model. Data were collected from HOMER Pro for the year 2007. We compared our daily load demand of 650 kW with the results of varying the load by 2.5%, 5%, and 10% more and less to find the best case for our demand. We have a total of 7 different datasets for different load conditions, where each dataset contains a total of 8760 sets of data with 6 different parameters per set.
Data file contents:
Data 1:: original_load.csv: This file contains data for the 650 kW load demand. The dataset contains a total of 8760 sets of data with 6 different parameters per set. Data arrangement is given below:
Column 1: Date and time of data recording in the format MM-DD-YYYY [hh]:[mm]. Time is in 24-hour format.
Column 2: Solar power output in kW.
Column 3: Generator power output in kW.
Column 4: Total electrical load served in kW.
Column 5: Excess electrical production in kW.
Column 6: Li-ion battery energy content in kWh.
Column 7: Li-ion battery state of charge in %.
Data 2:: 2.5%_more_load.csv: This file contains data for a 677 kW load demand. The dataset contains a total of 8760 sets of data with 6 different parameters per set. Column information is the same for every dataset.
Data 3:: 2.5%_less_load.csv: This file contains data for a 622 kW load demand. The dataset contains a total of 8760 sets of data with 6 different parameters per set. Column information is the same for every dataset.
Data 4:: 5%_more_load.csv: This file contains data for a 705 kW load demand. The dataset contains a total of 8760 sets of data with 6 different parameters per set. Column information is the same for every dataset.
Data 5:: 5%_less_load.csv: This file contains data for a 595 kW load demand. The dataset contains a total of 8760 sets of data with 6 different parameters per set. Column information is the same for every dataset.
Data 6:: 10%_more_load.csv: This file contains data for a 760 kW load demand. The dataset contains a total of 8760 sets of data with 6 different parameters per set. Column information is the same for every dataset.
Data 7:: 10%_less_load.csv: This file contains data for a 540 kW load demand. The dataset contains a total of 8760 sets of data with 6 different parameters per set. Column information is the same for every dataset.
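Any of the seven files can be loaded with pandas by parsing the timestamp in the first column (a minimal sketch; the short column names passed to names= are stand-ins for the seven columns described above, and a single header row is assumed):

import pandas as pd

# Load the 650 kW base case; column 1 holds MM-DD-YYYY hh:mm timestamps.
cols = ["datetime", "solar_kw", "generator_kw", "load_served_kw",
        "excess_kw", "battery_kwh", "battery_soc_pct"]
df = pd.read_csv("original_load.csv", header=0, names=cols)
df["datetime"] = pd.to_datetime(df["datetime"], format="%m-%d-%Y %H:%M")
df = df.set_index("datetime")

print(len(df))  # expected: 8760 hourly records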
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
INTRODUCTION
The goal of these LPWAN datasets is to provide the global research community with a benchmark tool to evaluate fingerprint localization algorithms in large outdoor environments with various properties. An identical collection methodology was used for all datasets: during a period of three months, numerous devices containing a GPS receiver periodically obtained new location data, which was sent to a local data server via a Sigfox or LoRaWAN message. Together with network information such as the receiving time of the message, the base station IDs of all receiving base stations, and the Received Signal Strength Indicator (RSSI) per base station, this location data was stored in one of the three LPWAN datasets:
lorawan_dataset_antwerp.csv
130 430 LoRaWAN messages, obtained in the city center of Antwerp
sigfox_dataset_antwerp.csv
14 378 Sigfox messages, obtained in the city center of Antwerp
sigfox_dataset_rural.csv
25 638 Sigfox messages, obtained in a rural area between Antwerp and Ghent
As the rural and urban Sigfox datasets were recorded in adjacent areas, many base stations that are located at the border of these areas can be found in both datasets. However, they do not necessarily share the same identifier: e.g. ‘BS 1’ in the urban Sigfox dataset could be the same base station as ‘BS 36’ in the rural Sigfox dataset. If the user intends to combine both Sigfox datasets, the mapping of the IDs of these base stations can be found in the file:
sigfox_bs_mapping.csv
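A combination could look like the following (a sketch only: the column names urban_id and rural_id for the mapping file, and the ‘BS n’ naming of the RSSI columns, are assumptions about the file layout):

import pandas as pd

urban = pd.read_csv("sigfox_dataset_antwerp.csv")
rural = pd.read_csv("sigfox_dataset_rural.csv")
mapping = pd.read_csv("sigfox_bs_mapping.csv")  # assumed columns: urban_id, rural_id

# Rename the rural base station columns to their urban counterparts so
# that shared base stations line up, then stack the two datasets.
rename = {f"BS {r}": f"BS {u}"
          for u, r in zip(mapping["urban_id"], mapping["rural_id"])}
combined = pd.concat([urban, rural.rename(columns=rename)], ignore_index=True)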
The collection methodology of the datasets, and the first results of a basic fingerprinting implementation are documented in the following journal paper: http://www.mdpi.com/2306-5729/3/2/13
UPDATES IN VERSION 1.2
In this version of the LPWAN dataset, only the LoRaWAN set has been updated. The Sigfox datasets remain identical to version 1.0 and 1.1. The main updates in the LoRaWAN set are the following:
New data: the LoRaWAN messages in the new set were collected 1 year after the previous dataset version. To be consistent with the previous versions, the new LoRaWAN set is uploaded in the same .CSV format as before. This upload can still be found in this repository as ‘lorawan_dataset_antwerp.csv’.
More gateways: Compared to the previous dataset, 4 gateways were added to the LoRaWAN network. The RSSI values of these gateways are shown in columns ‘BS 69’, ‘BS 70’, ‘BS 71’ and ‘BS 72’. All other ‘BS’ columns are in the same order as in previous dataset versions.
More metadata: In the previous LoRaWAN dataset, metadata was limited to 3 receiving gateways per message. In the new dataset version, metadata from all receiving gateways is included in every message. Moreover, some gateways provide a timestamp with nanosecond precision, which can be used to evaluate Time Difference of Arrival localization methods with LoRaWAN.
2 file formats: As more metadata becomes available, we find it important to share the dataset with a clearer overview. This also allows researchers to evaluate the performance of LoRaWAN in an urban environment. Therefore, we publish the new LoRaWAN dataset as a .CSV file as described above, but also as a .JSON file (lorawan_antwerp_2019_dataset.json.txt; the .txt file type had to be appended, otherwise the file could not be uploaded to Zenodo). An example of one message in this JSON format can be seen below:
JSON format description:
HDOP: Horizontal Dilution of Precision
dev_addr: LoRaWAN device address
dev_eui: LoRaWAN device EUI
sf: Spreading factor
channel: TX channel (EU region)
payload: application payload
adr: Adaptive Data Rate (1 = enabled, 0 = disabled)
counter: device uplink message counter
latitude: Groundtruth TX location latitude
longitude: Groundtruth TX location longitude
airtime: signal airtime (seconds)
gateways:
rssi: Received Signal Strength
esp: Estimated Signal Power
snr: Signal-to-Noise Ratio
ts_type: Timestamp type. If this says "GPS_RADIO", a nanosecond precise timestamp is available
time: time of arrival at the gateway
id: gateway ID
JSON example
{ "hdop": 0.7, "dev_addr": "07000EFE", "payload": "008d000392d54c4284d18c403333333f04682aa9410500e8fd4106cabdbc420f00db0d470ce32ac93f0d582be93f0bfa3f8d3f", "adr": 1, "latitude": 51.20856475830078, "counter": 31952, "longitude": 4.400575637817383, "airtime": 0.112896, "gateways": [ { "rssi": -115, "esp": -115.832695, "snr": 6.75, "rx_time": { "ts_type": "None", "time": "2019-01-04T08:59:53.079+01:00" }, "id": "08060716" }, { "rssi": -116, "esp": -125.51497, "snr": -9.0, "rx_time": { "ts_type": "GPS_RADIO", "time": "2019-01-04T08:59:53.962029179+01:00" }, "id": "FF0178DF" } ], "dev_eui": "3432333853376B18", "sf": 7, "channel": 8 }
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
You can also access an API version of this dataset: TMS (traffic monitoring system) daily-updated traffic counts API.
Important note: due to the size of this dataset, you won't be able to open it fully in Excel. Use notepad / R / any software package which can open more than a million rows.
Data reuse caveats: as per license.
Data quality statement: please read the accompanying user manual, explaining:
how this data is collected
identification of count stations
traffic monitoring technology
monitoring hierarchy and conventions
typical survey specification
data calculation
TMS operation.
Traffic monitoring for state highways: user manual [PDF 465 KB]
The data is at daily granularity. However, the actual update frequency of the data depends on the contract the site falls within. For telemetry sites it's once a week on a Wednesday. Some regional sites are fortnightly, and some monthly or quarterly. Some are only 4 weeks a year, with timing depending on contractors’ programme of work.
Data quality caveats: you must use this data in conjunction with the user manual and the following caveats.
The road sensors used in data collection are subject to both technical errors and environmental interference.
Data is compiled from a variety of sources. Accuracy may vary and the data should only be used as a guide.
As not all road sections are monitored, a direct calculation of Vehicle Kilometres Travelled (VKT) for a region is not possible.
Data is sourced from Waka Kotahi New Zealand Transport Agency TMS data.
For sites that use dual loops, classification is by length. Vehicles with a length of less than 5.5 m are classed as light vehicles. Vehicles over 11 m long are classed as heavy vehicles. Vehicles between 5.5 m and 11 m are split 50:50 into light and heavy (see the sketch after this list).
In September 2022, the National Telemetry contract was handed to a new contractor. During the handover process, due to some missing documents and aged technology, 40 of the 96 national telemetry traffic count sites went offline. The current contractor has continued to upload data from all active sites and has gradually worked to bring most offline sites back online. Please note and account for possible gaps in data from National Telemetry sites.
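The dual-loop length rule in the caveats above can be written down directly (a minimal sketch returning expected light/heavy counts for a single vehicle):

def classify_by_length(length_m: float) -> dict:
    """Expected light/heavy counts for one vehicle of the given length."""
    if length_m < 5.5:
        return {"light": 1.0, "heavy": 0.0}
    if length_m > 11.0:
        return {"light": 0.0, "heavy": 1.0}
    # Vehicles between 5.5 m and 11 m are split 50:50 into light and heavy.
    return {"light": 0.5, "heavy": 0.5}

print(classify_by_length(7.2))  # {'light': 0.5, 'heavy': 0.5}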
The NZTA Vehicle Classification Relationships diagram below shows the length classification (typically dual loops) and axle classification (typically pneumatic tube counts), and how these map to the Monetised benefits and costs manual, table A37, page 254.
Monetised benefits and costs manual [PDF 9 MB]
For the full TMS classification schema see Appendix A of the traffic counting manual vehicle classification scheme (NZTA 2011), below.
Traffic monitoring for state highways: user manual [PDF 465 KB]
State highway traffic monitoring (map)
State highway traffic monitoring sites
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an open-source, publicly available dataset which can be found at https://shahariarrabby.github.io/ekush/. We split the dataset into three sets: train, validation, and test. For our experiments, we created two other versions of the dataset. We applied 10-fold cross-validation on the train set and created ten folds. We also created ten bags of datasets using the bootstrap aggregating method on the train and validation sets. Lastly, we created another dataset using a pre-trained ResNet50 model as a feature extractor. On the features extracted by ResNet50 we applied PCA and created a tabular dataset containing 80 features. pca_features.csv is the train set and pca_test_features.csv is the test set. Fold.tar.gz contains the ten folds of images described above; those folds have also been compressed. Similarly, Bagging.tar.gz contains the ten compressed bags of images. The original train, validation, and test sets are in Train.tar.gz, Validation.tar.gz, and Test.tar.gz, respectively. The compression was performed to speed up uploads and downloads, and mostly for the sake of convenience. If anyone has any questions about how the datasets are organized, please feel free to ask me at shiblygnr@gmail.com. I will get back to you as soon as possible.
The State Contract and Procurement Registration System (SCPRS) was established in 2003 as a centralized database of information on State contracts and purchases over $5,000. eSCPRS represents the data captured in the State's eProcurement (eP) system, Bidsync, as of March 16, 2009. The data provided is an extract from that system for fiscal years 2012-2013, 2013-2014, and 2014-2015.
Data Limitations:
Some purchase orders have multiple UNSPSC numbers; however, only the first was used to identify the purchase order. Multiple UNSPSC numbers were included to provide additional data for a DGS special event; however, this affects the formatting of the file. The source system, Bidsync, is being deprecated, and these issues will be resolved in the future as state systems transition to Fi$cal.
Data Collection Methodology:
The data collection process starts with a data file from eSCPRS that is scrubbed and standardized prior to being uploaded into a SQL Server database. There are four primary tables. The Supplier, Department, and United Nations Standard Products and Services Code (UNSPSC) tables are reference tables. The Supplier and Department tables are updated and mapped to the appropriate numbering schema and naming conventions. The UNSPSC table is used to categorize line item information and requires no further manipulation. The Purchase Order table contains raw data that requires conversion to the correct data format and mapping to the corresponding data fields. A stacking method is applied to the table to eliminate blanks where needed. Extraneous characters are removed from fields. The four tables are joined together and queries are executed to update the final Purchase Order Dataset table. Once the scrubbing and standardization process is complete, the data is uploaded into the SQL Server database.
Secondary/Related Resources:
https://spdx.org/licenses/CC0-1.0.html
Version update: The originally uploaded versions of the CSV files in this dataset included an extra column, "Unnamed: 0," which is not RAMP data and was an artifact of the process used to export the data to CSV format. This column has been removed from the revised dataset. The data are otherwise the same as in the first version.
The Repository Analytics and Metrics Portal (RAMP) is a web service that aggregates use and performance data of institutional repositories. The data are a subset of data from RAMP, the Repository Analytics and Metrics Portal (http://rampanalytics.org), consisting of data from all participating repositories for the calendar year 2019. For a description of the data collection, processing, and output methods, please see the "methods" section below.
Methods
Data Collection
RAMP data are downloaded for participating IR from Google Search Console (GSC) via the Search Console API. The data consist of aggregated information about IR pages which appeared in search result pages (SERP) within Google properties (including web search and Google Scholar).
Data are downloaded in two sets per participating IR. The first set includes page level statistics about URLs pointing to IR pages and content files. The following fields are downloaded for each URL, with one row per URL:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
Following the data processing described below, on ingest into RAMP an additional field, citableContent, is added to the page level data.
The second set includes similar information, but instead of being aggregated at the page level, the data are grouped based on the country from which the user submitted the corresponding search, and the type of device used. The following fields are downloaded for each combination of country and device, with one row per country/device combination:
country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
Note that no personally identifiable information is downloaded by RAMP. Google does not make such information available.
More information about click-through rates, impressions, and position is available from Google's Search Console API documentation: https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query and https://support.google.com/webmasters/answer/7042828?hl=en
Data Processing
Upon download from GSC, the page level data described above are processed to identify URLs that point to citable content. Citable content is defined within RAMP as any URL which points to any type of non-HTML content file (PDF, CSV, etc.). As part of the daily download of page level statistics from Google Search Console (GSC), URLs are analyzed to determine whether they point to HTML pages or actual content files. URLs that point to content files are flagged as "citable content." In addition to the fields downloaded from GSC described above, following this brief analysis one more field, citableContent, is added to the page level data which records whether each page/URL in the GSC data points to citable content. Possible values for the citableContent field are "Yes" and "No."
The data aggregated by the search country of origin and device type do not include URLs. No additional processing is done on these data. Harvested data are passed directly into Elasticsearch.
Processed data are then saved in a series of Elasticsearch indices. Currently, RAMP stores data in two indices per participating IR. One index includes the page level data, the second index includes the country of origin and device type data.
About Citable Content Downloads
Data visualizations and aggregations in RAMP dashboards present information about citable content downloads, or CCD. As a measure of use of institutional repository content, CCD represent click activity on IR content that may correspond to research use.
CCD information is summary data calculated on the fly within the RAMP web application. As noted above, data provided by GSC include whether and how many times a URL was clicked by users. Within RAMP, a "click" is counted as a potential download, so a CCD is calculated as the sum of clicks on pages/URLs that are determined to point to citable content (as defined above).
For any specified date range, the steps to calculate CCD are:
Filter data to only include rows where "citableContent" is set to "Yes."
Sum the value of the "clicks" field on these rows.
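The same two steps can be reproduced offline against a published page-clicks CSV (a minimal sketch; the file name follows the naming convention described under "Output to CSV" below):

import pandas as pd

df = pd.read_csv("2019-01_RAMP_all_page-clicks.csv")

# Step 1: keep only rows that point to citable content.
citable = df[df["citableContent"] == "Yes"]

# Step 2: CCD is the sum of clicks on those rows.
ccd = citable["clicks"].sum()
print(ccd)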
Output to CSV
Published RAMP data are exported from the production Elasticsearch instance and converted to CSV format. The CSV data consist of one "row" for each page or URL from a specific IR which appeared in search result pages (SERP) within Google properties as described above. Also as noted above, daily data are downloaded for each IR in two sets which cannot be combined. One dataset includes the URLs of items that appear in SERP. The second dataset is aggregated by combination of the country from which a search was conducted and the device used.
As a result, two CSV datasets are provided for each month of published data:
page-clicks:
The data in these CSV files correspond to the page-level data, and include the following fields:
url: This is returned as a 'page' by the GSC API, and is the URL of the page which was included in an SERP for a Google property.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
citableContent: Whether or not the URL points to a content file (ending with pdf, csv, etc.) rather than HTML wrapper pages. Possible values are Yes or No.
index: The Elasticsearch index corresponding to page click data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data end with “page-clicks”. For example, the file named 2019-01_RAMP_all_page-clicks.csv contains page level click data for all RAMP participating IR for the month of January, 2019.
country-device-info:
The data in these CSV files correspond to the data aggregated by country from which a search was conducted and the device used. These include the following fields:
country: The country from which the corresponding search originated.
device: The device used for the search.
impressions: The number of times the URL appears within the SERP.
clicks: The number of clicks on a URL which took users to a page outside of the SERP.
clickThrough: Calculated as the number of clicks divided by the number of impressions.
position: The position of the URL within the SERP.
date: The date of the search.
index: The Elasticsearch index corresponding to country and device access information data for a single IR.
repository_id: This is a human readable alias for the index and identifies the participating repository corresponding to each row. As RAMP has undergone platform and version migrations over time, index names as defined for the previous field have not remained consistent. That is, a single participating repository may have multiple corresponding Elasticsearch index names over time. The repository_id is a canonical identifier that has been added to the data to provide an identifier that can be used to reference a single participating repository across all datasets. Filtering and aggregation for individual repositories or groups of repositories should be done using this field.
Filenames for files containing these data end with “country-device-info”. For example, the file named 2019-01_RAMP_all_country-device-info.csv contains country and device data for all participating IR for the month of January, 2019.
References
Google, Inc. (2021). Search Console APIs. Retrieved from https://developers.google.com/webmaster-tools/search-console-api-original.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LifeSnaps Dataset Documentation
Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in-the-wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements, due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset containing a plethora of anthropological data, collected unobtrusively over a total course of more than 4 months by n=71 participants under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types, from second-level to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to making these data openly available to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.
The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.
Data Import: Reading CSV
For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.
Data Import: Setting up a MongoDB (Recommended)
To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.
To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have the MongoDB Database Tools installed.
For the Fitbit data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c fitbit
For the SEMA data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c sema
For surveys data, run the following:
mongorestore --host localhost:27017 -d rais_anonymized -c surveys
If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.
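For example (a sketch; <user> and <password> are placeholders for your own credentials):

mongorestore --host localhost:27017 --username <user> --password <password> -d rais_anonymized -c fitbit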
Data Availability
The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:
{
_id:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The BuildingsBench datasets consist of:
Buildings-900K: A large-scale dataset of 900K buildings for pretraining models on the task of short-term load forecasting (STLF). Buildings-900K is statistically representative of the entire U.S. building stock.
7 real residential and commercial building datasets for benchmarking two downstream tasks evaluating generalization: zero-shot STLF and transfer learning for STLF.
Buildings-900K can be used for pretraining models on day-ahead STLF for residential and commercial buildings. The specific gap it fills is the lack of large-scale and diverse time series datasets of sufficient size for studying pretraining and finetuning with scalable machine learning models. Buildings-900K consists of synthetically generated energy consumption time series. It is derived from the NREL End-Use Load Profiles (EULP) dataset (see link to this database in the links further below). However, the EULP was not originally developed for the purpose of STLF. Rather, it was developed to "...help electric utilities, grid operators, manufacturers, government entities, and research organizations make critical decisions about prioritizing research and development, utility resource and distribution system planning, and state and local energy planning and regulation." Similar to the EULP, Buildings-900K is a collection of Parquet files and it follows nearly the same Parquet dataset organization as the EULP. As it only contains a single energy consumption time series per building, it is much smaller (~110 GB).
BuildingsBench also provides an evaluation benchmark that is a collection of various open-source residential and commercial real building energy consumption datasets. The evaluation datasets, which are provided alongside Buildings-900K below, are collections of CSV files which contain annual energy consumption. The size of the evaluation datasets altogether is less than 1 GB, and they are listed below:
ElectricityLoadDiagrams20112014
Building Data Genome Project-2
Individual household electric power consumption (Sceaux)
Dataset Description
This is a VQA dataset based on charts from the ChartQA dataset.
Load the dataset
from datasets import load_dataset
import csv

def load_beir_qrels(qrels_file):
    qrels = {}
    with open(qrels_file) as f:
        tsvreader = csv.DictReader(f, delimiter="\t")
        for row in tsvreader:
            qid = row["query-id"]
            pid = row["corpus-id"]
            rel = int(row["score"])
            if qid in qrels:
                qrels[qid][pid] = rel
            else:
                qrels[qid] = {pid: rel}
    return qrels

See the full description on the dataset page: https://huggingface.co/datasets/openbmb/VisRAG-Ret-Test-ChartQA.
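For example, the helper returns a nested dict mapping query IDs to per-document relevance scores (the qrels file path here is hypothetical):

qrels = load_beir_qrels("qrels/test.tsv")  # hypothetical path
print(len(qrels), "queries with relevance judgments")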
This digital dataset was created as part of a U.S. Geological Survey study, done in cooperation with the Monterey County Water Resource Agency, to conduct a hydrologic resource assessment and develop an integrated numerical hydrologic model of the hydrologic system of Salinas Valley, CA. As part of this larger study, the USGS developed this digital dataset of geologic data and three-dimensional hydrogeologic framework models, referred to here as the Salinas Valley Geological Framework (SVGF), that define the elevation, thickness, extent, and lithology-based texture variations of nine hydrogeologic units in Salinas Valley, CA. The digital dataset includes a geospatial database that contains two main elements as GIS feature datasets: (1) input data to the 3D framework and textural models, within a feature dataset called "ModelInput"; and (2) interpolated elevation, thicknesses, and textural variability of the hydrogeologic units stored as arrays of polygonal cells, within a feature dataset called "ModelGrids".
The model input data in this data release include stratigraphic and lithologic information from water, monitoring, and oil and gas wells, as well as data from selected published cross sections, point data derived from geologic maps and geophysical data, and data sampled from parts of previous framework models. Input surface and subsurface data have been reduced to points that define the elevation of the top of each hydrogeologic unit at x,y locations; these point data, stored in a GIS feature class named "ModelInputData", serve as digital input to the framework models. The locations of wells used as sources of subsurface stratigraphic and lithologic information are stored within the GIS feature class "ModelInputData", but are also provided as separate point feature classes in the geospatial database. Faults that offset hydrogeologic units are provided as a separate line feature class. Borehole data are also released as a set of tables, each of which may be joined or related to well location through a unique well identifier present in each table. Tables are in Excel and ASCII comma-separated value (CSV) format and include separate but related tables for well location, stratigraphic information on the depths to the top and base of hydrogeologic units intercepted downhole, downhole lithologic information reported at 10-foot intervals, and information on how lithologic descriptors were classed as sediment texture.
Two types of geologic frameworks were constructed and released within a GIS feature dataset called "ModelGrids": (1) a hydrostratigraphic framework where the elevation, thickness, and spatial extent of the nine hydrogeologic units were defined based on interpolation of the input data, and (2) a textural model for each hydrogeologic unit based on interpolation of classed downhole lithologic data. Each framework is stored as an array of polygonal cells: essentially a "flattened", two-dimensional representation of a digital 3D geologic framework. The elevation and thickness of the hydrogeologic units are contained within a single polygon feature class, SVGF_3DHFM, which contains a mesh of polygons representing model cells with multiple attributes, including XY location and the elevation and thickness of each hydrogeologic unit. Textural information for each hydrogeologic unit is stored in a second array of polygonal cells called SVGF_TextureModel.
The spatial data are accompanied by non-spatial tables that describe the sources of geologic information, a glossary of terms, and a description of the nine hydrogeologic units modeled in this study. A data dictionary defines the structure of the dataset, defines all fields in all spatial data attribute tables and all columns in all non-spatial tables, and duplicates the Entity and Attribute information contained in the metadata file. Spatial data are also presented as shapefiles. Downhole data from boreholes are released as a set of tables related by a unique well identifier; tables are in Excel and ASCII comma-separated value (CSV) format.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides data for new prescription drugs introduced to market in California with a Wholesale Acquisition Cost (WAC) that exceeds the Medicare Part D specialty drug cost threshold. Prescription drug manufacturers submit information to HCAI within a specified time period after a drug is introduced to market. Key data elements include the National Drug Code (NDC) administered by the FDA, a narrative description of marketing and pricing plans, and WAC, among other information. Manufacturers may withhold information that is not in the public domain. Note that prescription drug manufacturers are able to submit new drug reports for a prior quarter at any time. Therefore, the data set may include additional new drug report(s) from previous quarter(s).
There are two types of New Drug data sets: Monthly and Annual. The Monthly data sets include the data in completed reports submitted by manufacturers for calendar year 2025, as of March 12, 2025. The Annual data sets include data in completed reports submitted by manufacturers for the specified year. The data sets may include reports that do not meet the specified minimum thresholds for reporting.
The program regulations are available here: https://hcai.ca.gov/wp-content/uploads/2024/03/CTRx-Regulations-Text.pdf
The data format and file specifications are available here: https://hcai.ca.gov/wp-content/uploads/2024/03/Format-and-File-Specifications-version-2.0-ada.pdf
DATA NOTES: Due to recent changes in Excel capabilities, it is not recommended that you save these files in .csv format. If you do, the leading zeros in the NDC number column will be dropped when the file is imported back into Excel. If you need to save the data in a format other than .xlsx, use .txt.
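If the data must pass through CSV anyway, the leading zeros can be preserved by forcing the NDC column to be read as text (a minimal sketch using pandas; the file and column names are assumptions):

import pandas as pd

# Read the NDC column as a string so leading zeros are kept.
df = pd.read_csv("new_drug_report.csv", dtype={"NDC": str})
print(df["NDC"].head())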