This dataset was created by John, Kim
This dataset was created by Ahmad Basher
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Sica Chang
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present dataset combines the details of hourly load variation with the hourly weather parameters. The load dataset has been obtained from one of the substations (location Ahmedabad) and weather parameters dataset has been extracted from the NASA open-source website.
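As an illustration of how the two sources can be combined, the sketch below aligns the hourly load and weather series on their timestamps. The file and column names are placeholders, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical file and column names -- adjust to the actual CSVs.
load = pd.read_csv("substation_load.csv", parse_dates=["timestamp"])
weather = pd.read_csv("nasa_weather.csv", parse_dates=["timestamp"])

# Inner join on the hourly timestamp so every row carries both load and weather values.
hourly = load.merge(weather, on="timestamp", how="inner").sort_values("timestamp")
print(hourly.head())
```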
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains historical data used for forecasting the electrical load of the national grid in Sri Lanka. The data spans from January 2020 to May 2025, with 189888 records representing electrical consumption in 15-minute intervals. This dataset is designed for use in predictive modeling, specifically for applying machine learning techniques such as Recurrent Neural Networks (RNNs) for short-term load forecasting.
Features: 1. Timestamp: Date and time of the observation (15-minute intervals).
2. Load Demand (kW): The amount of electrical load demand in kilowatts (kW), which is the target variable for forecasting.
3. Temperature (°C): Average temperature (in Celsius) during the forecast period.
4. Humidity (%): Average relative humidity during the forecast period.
5. Wind Speed (m/s): Average wind speed during the forecast period.
6. Rainfall (mm): Total rainfall (in millimeters) during the forecast period.
7. Solar Irradiance (W/m²): Solar energy received in watts per square meter, which affects electricity demand.
8. GDP (USD): The gross domestic product (GDP) in USD, representing economic activity that influences power demand.
9. Per Capita Energy Use (kWh): Average energy usage per person (in kilowatt-hours).
10. Electricity Price (LKR/kWh): Price of electricity (in LKR per kWh) during the time period.
11. Day of Week: The day of the week (0 = Monday, 6 = Sunday).
12. Hour of Day: The hour of the day (0 = midnight, 23 = 11 PM).
13. Month: The month of the year (1 = January, 12 = December).
14. Season: The season of the year (Summer, Winter, Fall).
15. Public Event: A binary variable indicating if a public event occurred (1 = Yes, 0 = No).
Purpose: The dataset can be used to train and evaluate models that predict electrical load demand for short-term forecasting. This is especially important for energy management, efficient resource allocation, and planning for future power generation in response to demand fluctuations.
Use Case:
This dataset is suitable for:
Predictive modeling using machine learning algorithms (particularly deep learning techniques like RNNs and LSTMs); a minimal sketch follows this list.
Load forecasting and demand-side management.
Economic planning related to electricity generation and transmission.
Optimization of resource usage in energy management systems.
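As a concrete illustration of the RNN use case above, the sketch below windows the load series and fits a tiny LSTM. The file name and exact column spellings are assumptions, and the model is only meant to show the shapes involved, not a tuned forecaster.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical file name; column names follow the feature list above.
df = pd.read_csv("sri_lanka_load.csv", parse_dates=["Timestamp"])
series = df["Load Demand (kW)"].to_numpy(dtype="float32")

# Sliding windows: 96 past 15-minute intervals (24 h) -> next interval's load.
window = 96
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]  # shape: (samples, timesteps, features)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=2, batch_size=64, validation_split=0.1)
```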
Data Collection:
The data was collected from multiple sources including the Ceylon Electricity Board (CEB) and weather data providers, with a focus on historical load demand and environmental factors that influence power consumption.
Licensing:
This dataset is open for educational and research purposes under a Creative Commons license. Please attribute the dataset to the source when using it for analysis or research.
This dataset was created by Sandiago
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Robin
Released under Apache 2.0
This dataset is generated for the purpose of analyzing furniture sales data using multiple regression techniques. It contains 2,500 rows with 15 columns, including 7 numerical columns and 7 categorical columns, along with a target variable (revenue) which represents the total revenue generated from furniture sales. The dataset captures various aspects of furniture sales, such as pricing, cost, sales volume, discount percentage, inventory levels, delivery time, and different categorical attributes like furniture type, material, color, and store location.
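A hedged sketch of the intended multiple-regression analysis follows; the file name and column names are assumptions based on the description, not the real schema.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical file and column names -- adjust to the actual dataset.
df = pd.read_csv("furniture_sales.csv")
categorical = ["furniture_type", "material", "color", "store_location"]
numerical = ["price", "cost", "sales_volume", "discount_percentage",
             "inventory_level", "delivery_time"]

X = df[categorical + numerical]
y = df["revenue"]

# One-hot encode categoricals, pass numerics through, then fit a linear model.
pipeline = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough")),
    ("regress", LinearRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
pipeline.fit(X_train, y_train)
print("R^2 on held-out data:", pipeline.score(X_test, y_test))
```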
Please upload your notebooks for this dataset so that others can also learn from your work.
Medical Dataset for Abbreviation Disambiguation for Natural Language Understanding (MeDAL) is a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain. It was published at the ClinicalNLP workshop at EMNLP.
💻 Code 🤗 Dataset (Hugging Face) 💾 Dataset (Kaggle) 💽 Dataset (Zenodo) 📜 Paper (ACL) 📝 Paper (Arxiv) ⚡ Pre-trained ELECTRA (Hugging Face)
We recommend downloading from Kaggle if you can authenticate through their API. The advantage to Kaggle is that the data is compressed, so it will be faster to download. Links to the data can be found at the top of the readme.
First, you will need to create an account on kaggle.com. Afterwards, you will need to install the kaggle API:
pip install kaggle
Then, you will need to follow the instructions here to add your username and key. Once that's done, you can run:
kaggle datasets download xhlulu/medal-emnlp
Now, unzip everything and place the files inside the data directory:
unzip -nq crawl-300d-2M-subword.zip -d data
mv data/pretrain_sample/* data/
For the LSTM models, we will need to use the fastText embeddings. To do so, first download and extract the weights:
wget -nc -P data/ https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M-subword.zip
unzip -nq data/crawl-300d-2M-subword.zip -d data/
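A quick, optional way to sanity-check the extracted vectors is to load them with gensim (an extra dependency that is not part of the original training pipeline):

```python
from gensim.models.fasttext import load_facebook_vectors

# Loads the subword-aware fastText vectors from the extracted .bin file.
vectors = load_facebook_vectors("data/crawl-300d-2M-subword.bin")
print(vectors["myocardial"].shape)  # expected: (300,)
```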
You can directly load LSTM and LSTM-SA with torch.hub:
```python
import torch

lstm = torch.hub.load("BruceWen120/medal", "lstm")
lstm_sa = torch.hub.load("BruceWen120/medal", "lstm_sa")
```
If you want to use the Electra model, you need to first install transformers:
pip install transformers
Then, you can load it with torch.hub:
```python
import torch

electra = torch.hub.load("BruceWen120/medal", "electra")
```
transformers
If you are only interested in the pre-trained ELECTRA weights (without the disambiguation head), you can load it directly from the Hugging Face Repository:
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("xhlu/electra-medal")
tokenizer = AutoTokenizer.from_pretrained("xhlu/electra-medal")
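A minimal usage sketch with the standard transformers API follows (the example sentence is arbitrary, and the load calls from above are repeated for completeness):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("xhlu/electra-medal")
tokenizer = AutoTokenizer.from_pretrained("xhlu/electra-medal")

# Encode a sentence and run a forward pass to get contextual embeddings.
inputs = tokenizer("The patient presented with acute MI.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```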
Download the BibTeX here, or copy the text below:
@inproceedings{wen-etal-2020-medal,
title = "{M}e{DAL}: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining",
author = "Wen, Zhi and Lu, Xing Han and Reddy, Siva",
booktitle = "Proceedings of the 3rd Clinical Natural Language Processing Workshop",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.clinicalnlp-1.15",
pages = "130--135",
}
The ELECTRA model is licensed under Apache 2.0. The licenses for the libraries used in this project (transformers, pytorch, etc.) can be found in their respective GitHub repositories. Our model is released under an MIT license.
The original dataset was retrieved and modified from the NLM website. By using this dataset, you are bound by the terms and conditions specified by NLM:
INTRODUCTION
Downloading data from the National Library of Medicine FTP servers indicates your acceptance of the following Terms and Conditions: No charges, usage fees or royalties are paid to NLM for this data.
MEDLINE/PUBMED SPECIFIC TERMS
NLM freely provides PubMed/MEDLINE data. Please note some PubMed/MEDLINE abstracts may be protected by copyright.
GENERAL TERMS AND CONDITIONS
Users of the data agree to:
- acknowledge NLM as the source of the data by including the phrase "Courtesy of the U.S. National Library of Medicine" in a clear and conspicuous manner,
- properly use registration and/or trademark symbols when referring to NLM products, and
- not indicate or imply that NLM has endorsed its products/services/applications.
Users who republish or redistribute the data (services, products or raw data) agree to:
- maintain the most current version of all distributed data, or
- make known in a clear and conspicuous manner that the products/services/applications do not reflect the most current/accurate data available from NLM.
These data are produced with a reasonable standard of care, but NLM makes no warranties express or implied, including no warranty of merchantability or fitness for particular purpose, regarding the accuracy or completeness of the data. Users agree to hold NLM and the U.S. Government harmless from any liability resulting from errors in the data. NLM disclaims any liability for any consequences due to use, misuse, or interpretation of information contained or not contained in the data.
NLM does not provide legal advice regarding copyright, fair use, or other aspects of intellectual property rights. See the NLM Copyright page.
NLM reserves the right to change the type and format of its machine-readable data. NLM will take reasonable steps to inform users of any changes to the format of the data before the data are distributed via the announcement section or subscription to email and RSS updates.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Soyabul Islam Lincoln (samlin)
Released under Apache 2.0
This dataset was created by jeongjh180
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Imports:
# All Imports
import os
from matplotlib import pyplot as plt
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import seaborn as sns
import matplotlib.image as mpimg
import cv2
import numpy as np
import pickle
# TensorFlow and Keras layers, models, optimizers and losses
import tensorflow as tf
from tensorflow import keras
from keras import Sequential
from keras.layers import *
# Optimizer
from keras.optimizers import Adamax
# PreTrained Model
from keras.applications import *
#Early Stopping
from keras.callbacks import EarlyStopping
import warnings
Warnings Suppression | Configuration
# Warnings Remove
warnings.filterwarnings("ignore")
# Define the base path for the training folder
base_path = 'jaguar_cheetah/train'
# Weights file
weights_file = 'Model_train_weights.weights.h5'
# Path to the saved or to save the model:
model_file = 'Model-cheetah_jaguar_Treined.keras'
# Model history
history_path = 'training_history_cheetah_jaguar.pkl'
# Initialize lists to store file paths and labels
filepaths = []
labels = []
# Iterate over folders and files within the training directory
for folder in ['Cheetah', 'Jaguar']:
    folder_path = os.path.join(base_path, folder)
    for filename in os.listdir(folder_path):
        file_path = os.path.join(folder_path, filename)
        filepaths.append(file_path)
        labels.append(folder)
# Create the TRAINING dataframe
file_path_series = pd.Series(filepaths, name='filepath')
Label_path_series = pd.Series(labels, name='label')
df_train = pd.concat([file_path_series, Label_path_series], axis=1)
# Define the base path for the test folder
directory = "jaguar_cheetah/test"
filepath =[]
label = []
folds = os.listdir(directory)
for fold in folds:
    f_path = os.path.join(directory, fold)
    imgs = os.listdir(f_path)
    for img in imgs:
        img_path = os.path.join(f_path, img)
        filepath.append(img_path)
        label.append(fold)
# Create the TEST dataframe
file_path_series = pd.Series(filepath, name='filepath')
Label_path_series = pd.Series(label, name='label')
df_test = pd.concat([file_path_series, Label_path_series], axis=1)
# Display the first rows of the dataframe for verification
#print(df_train)
# Folders with Training and Test files
data_dir = 'jaguar_cheetah/train'
test_dir = 'jaguar_cheetah/test'
# Image size 256x256
IMAGE_SIZE = (256,256)
Train | Test
#print('Training Images:')
# Create the TRAIN dataframe
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.1,
    subset='training',
    seed=123,
    image_size=IMAGE_SIZE,
    batch_size=32)
#Testing Data
#print('Validation Images:')
validation_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.1,
    subset='validation',
    seed=123,
    image_size=IMAGE_SIZE,
    batch_size=32)
print('Testing Images:')
test_ds = tf.keras.utils.image_dataset_from_directory(
    test_dir,
    seed=123,
    image_size=IMAGE_SIZE,
    batch_size=32)
# Extract labels
train_labels = train_ds.class_names
test_labels = test_ds.class_names
validation_labels = validation_ds.class_names
# Encode labels
# Defining the class labels (must match the training directory names)
class_labels = ['Cheetah', 'Jaguar']
# Instantiate (encoder) LabelEncoder
label_encoder = LabelEncoder()
# Fit the label encoder on the class labels
label_encoder.fit(class_labels)
# Transform the labels for the training dataset
train_labels_encoded = label_encoder.transform(train_labels)
# Transform the labels for the validation dataset
validation_labels_encoded = label_encoder.transform(validation_labels)
# Transform the labels for the testing dataset
test_labels_encoded = label_encoder.transform(test_labels)
# Normalize the pixel values
# Train files
train_ds = train_ds.map(lambda x, y: (x / 255.0, y))
# Validate files
validation_ds = validation_ds.map(lambda x, y: (x / 255.0, y))
# Test files
test_ds = test_ds.map(lambda x, y: (x / 255.0, y))
#TRAINING VISUALIZATION
#Count the occurrences of each category in the column
count = df_train['label'].value_counts()
# Create a figure with 2 subplots
fig, axs = plt.subplots(1, 2, figsize=(12, 6), facecolor='white')
# Plot a pie chart on the first subplot
palette = sns.color_palette("viridis")
sns.set_palette(palette)
axs[0].pie(count, labels=count.index, autopct='%1.1f%%', startangle=140)
axs[0].set_title('Distribution of Training Categories')
# Plot a bar chart on the second subplot
sns.barplot(x=count.index, y=count.values, ax=axs[1], palette="viridis")
axs[1].set_title('Count of Training Categories')
# Adjust the layout
plt.tight_layout()
# Visualize
plt.show()
# TEST VISUALIZATION
count = df_test['label'].value_counts()
# Create a figure with 2 subplots
fig, axs = plt.subplots(1, 2, figsize=(12, 6), facec...
This dataset was created by Felix Klein
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
https://creativecommons.org/publicdomain/zero/1.0/
The EV Charging Dataset, used in this study, is a publicly available dataset on Kaggle that records real-world electric vehicle (EV) charging behavior and patterns across various locations. The dataset contains 26 key features, each providing valuable insights into the operational and environmental factors that influence EV charging loads. The features include vehicle-specific data, charging station details, and environmental metrics, which collectively contribute to a comprehensive understanding of the factors affecting EV charging demands and route optimization.
- Vehicle ID: A unique identifier for each electric vehicle in the dataset, used for tracking individual vehicle charging behavior.
- Battery Capacity (kWh): The total energy storage capacity of the EV battery, typically measured in kilowatt-hours.
- State of Charge (SOC %): The current charge level of the vehicle's battery as a percentage of its total capacity.
- Energy Consumption Rate (kWh/km): The rate at which the vehicle consumes energy per kilometer traveled, modeled based on real-world driving conditions.
- Current and Destination Latitude/Longitude: Geographic coordinates providing the vehicle's current and intended location.
- Distance to Destination (km): The remaining distance to the vehicle's destination, which influences the decision-making process for when to charge.
- Traffic Data: A count of vehicles on the road, providing insight into real-time congestion levels affecting the travel duration and energy consumption.
- Road Conditions: A categorical feature (Good, Average, Poor) representing the state of the road, which can impact vehicle energy efficiency.
- Charging Station ID: A unique identifier for each charging station where the vehicle connects for recharging.
- Charging Rate (kW): The rate at which power is delivered to the vehicle's battery while charging, influencing the time required to fully charge.
- Queue Time (mins): The estimated waiting time before charging starts, influenced by the number of vehicles at the station.
- Station Capacity (EVs): The maximum number of vehicles a charging station can accommodate simultaneously.
- Time Spent Charging (mins): The duration for which a vehicle is connected to the charging station.
- Energy Drawn (kWh): The amount of energy transferred to the vehicle's battery during the charging session.
- Session Start Hour: The hour of the day when the charging session begins, represented as an integer from 0 to 23.
- Fleet Size: The total number of vehicles in the fleet, which provides insights into overall charging demand.
- Fleet Schedule: Indicates whether the fleet is on schedule or delayed (0 for on time, 1 for delayed).
- Temperature (°C), Wind Speed (m/s), and Precipitation (mm): Environmental variables that affect EV performance and energy usage during travel.
- Weekday: Coded as an integer from 0 to 6, representing the day of the week.
- Charging Preferences: A binary variable indicating whether a vehicle or user has any specific preferences for charging stations (0 for no preference, 1 for preference).
- Weather Conditions: The overall weather status (Clear, Cloudy, Rain, Storm), which influences travel and charging behavior.
- Charging Load (kW): The target label representing the load on the charging station, used for forecasting and demand prediction.
This dataset is essential for the development of machine learning models aimed at predicting EV charging demand and optimizing charging infrastructure usage. By analyzing the features provided, the dataset enables researchers to investigate patterns in EV charging behavior and explore route optimization strategies in the context of IoT-enabled electric vehicle networks.
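A minimal regression sketch for the charging-load forecasting use described above is shown here. The CSV file name is hypothetical and the column selection is inferred from the feature list, so both may need adjusting to the actual Kaggle schema.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical file and column names -- adjust to the actual dataset.
df = pd.read_csv("ev_charging.csv")
features = ["Battery Capacity (kWh)", "State of Charge (SOC %)",
            "Charging Rate (kW)", "Queue Time (mins)", "Temperature (°C)",
            "Session Start Hour", "Weekday"]
X = df[features]
y = df["Charging Load (kW)"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```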
Location Dataset Description: The Location Dataset is a synthetic dataset designed for route optimization tasks, especially useful for logistics, fleet management, and EV route planning applications. The dataset consists of 30 key locations, each represented by its geographical coordinates and categorized based on its function (e.g., city, port, warehouse). This dataset allows for the computation of the optimal routes between locations using various optimization algorithms.
Location: A unique identifier for each point in the dataset, typically named after a city or functional node (e.g., A, B, C). Type: The type of location, which indicates its role in the network. Types include: City: Represents urban areas where fleet operations typically begin or end. Port: Represents seaports or inland ports where goods are transferred between modes of transport. Warehouse: Represents storage facilities that act as distribution points. Power Plant: Represents energy generation sites, often used in energy logistics planning. Industrial Zone: Represents areas designated for manufacturing and other industrial operations. Mining Site: Represents remote locations where resources are extracted. Latitude: The geographic coor...
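For the route optimization use case, a toy sketch is a greedy nearest-neighbour tour over great-circle distances; the file and column names (Location, Latitude, Longitude) are assumptions from the description, and real applications would use a proper TSP or vehicle-routing solver.

```python
import numpy as np
import pandas as pd

def haversine(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Hypothetical file with columns: Location, Type, Latitude, Longitude.
locs = pd.read_csv("locations.csv")

# Greedy nearest-neighbour tour starting from the first location.
route, remaining = [0], set(range(1, len(locs)))
while remaining:
    last = locs.iloc[route[-1]]
    nxt = min(remaining, key=lambda i: haversine(last.Latitude, last.Longitude,
                                                 locs.iloc[i].Latitude, locs.iloc[i].Longitude))
    route.append(nxt)
    remaining.remove(nxt)
print([locs.iloc[i].Location for i in route])
```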
Dataset Card for Dataset Name
This is a FiftyOne dataset with 100 samples.
Installation
If you haven't already, install FiftyOne: pip install -U fiftyone
Usage
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub
dataset = load_from_hub("vickruto/cardd100-kaggle")
session = fo.launch_app(dataset)
Dataset Details… See the full description on the dataset page: https://huggingface.co/datasets/vickruto/cardd100-kaggle.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 3,234 records of reinforced concrete (RC) beam design parameters and their corresponding load-bearing capacities. The data is based on realistic construction standards and includes geometric, material, and reinforcement details such as beam dimensions, concrete grade, reinforcement ratios, and stirrup specifications.
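As a hedged illustration (the file and column names below are placeholders, not the dataset's actual schema), the load-bearing capacity could be modeled with a tree-based regressor:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical file and target column name based on the description above.
df = pd.read_csv("rc_beam_capacity.csv")
X = df.drop(columns=["load_bearing_capacity"])
y = df["load_bearing_capacity"]

# Assumes the remaining columns are numeric (dimensions, grades, ratios, stirrup specs).
model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("Mean cross-validated R^2:", scores.mean())
```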
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides data for analyzing and optimizing electricity load management in modern power systems. It captures key variables related to elastic load behavior, dynamic pricing, shared energy storage systems (SESS), power-to-gas (P2G) technology, and carbon emissions.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains EEG-derived features, biometric indicators, and session metadata collected from 124 college students during English reading comprehension tasks. Each of the 1310 rows represents a reading segment annotated with cognitive load levels (Low, Medium, High). Features include power spectral densities across EEG bands (Delta, Theta, Alpha, Beta, Gamma), mental effort scores, signal entropy, and biometric markers such as heart rate variability and pupil dilation. Additional metadata such as student age, gender, and English proficiency level are included to support demographic analysis.
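A hedged classification sketch for the cognitive load labels follows; the file name and column names are placeholders inferred from the description above, not the real schema.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical file and column names derived from the feature description.
df = pd.read_csv("eeg_cognitive_load.csv")
feature_cols = ["delta_power", "theta_power", "alpha_power", "beta_power",
                "gamma_power", "mental_effort", "signal_entropy",
                "heart_rate_variability", "pupil_dilation"]
X = df[feature_cols]
y = df["cognitive_load"]  # Low / Medium / High

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```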
This dataset was created by John, Kim