MIT License https://opensource.org/licenses/MIT
License information was derived automatically
We use the "Individual Household Electric Power Consumption" dataset from the UCI Machine Learning Repository. It captures minute-level measurements from a single household in France over nearly four years (December 2006 - November 2010).
This dataset is ideal due to:
Its high-resolution time-series nature (1-minute intervals)
Presence of seasonal patterns (daily, weekly cycles)
Availability of multiple correlated features
Real-world irregularities like missing values
Let's dive into understanding the data and its features.
Total Records: ~2,075,259
Time Range: December 2006 - November 2010
Sampling Frequency: 1 minute
Key Features:
Date, Time – Timestamps
Global_active_power (kW) – Target variable
Global_reactive_power, Voltage, Global_intensity
Sub_metering_1/2/3 – Appliance-level consumption
Forecasting Target and Granularity
For practical modeling and scalability:
Target variable: Global_active_power (total active household consumption in kilowatts)
Granularity: Resampled to hourly frequency to reduce noise and computational load
Prediction task: Forecast the next 24 hours of electricity demand
This setup mimics real-world energy forecasting use cases like load balancing or smart home optimization.
Data Challenges
Some important preprocessing considerations:
Missing values: Approximately 1.25% of values are missing; these will be handled via imputation
Time gaps: Some timestamps are irregular and need reindexing
Chronos format: Requires CSVs with predefined schemas and Amazon S3 upload
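The hourly resampling and gap handling described above could look roughly like the sketch below. It uses only the Python standard library, and both the function name `to_hourly` and the choice of forward fill are illustrative assumptions; the description does not say which imputation method is actually used.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def to_hourly(readings):
    """Average minute-level (timestamp, kW or None) readings into hourly means.

    Hours without any valid reading are filled with the previous hour's mean
    (forward fill). This is an assumed imputation choice, not the only one.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        if value is not None:  # skip missing measurements
            buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    if not buckets:
        return []

    # Reindex onto a complete hourly grid so that time gaps become explicit.
    hour, last_hour = min(buckets), max(buckets)
    hourly, previous = [], None
    while hour <= last_hour:
        if hour in buckets:
            previous = sum(buckets[hour]) / len(buckets[hour])
        hourly.append((hour, previous))
        hour += timedelta(hours=1)
    return hourly


# Tiny made-up sample: one missing value and one hour with no rows at all.
start = datetime(2006, 12, 16, 17, 0)
sample = [
    (start, 4.0),
    (start + timedelta(minutes=1), 6.0),
    (start + timedelta(minutes=2), None),
    (start + timedelta(hours=2), 2.0),
]
hourly = to_hourly(sample)
```

Forward fill keeps the example short; interpolation or a seasonal average may suit longer gaps better.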
Experimental Setup
Train/Test Split:
Train: First 80% of the data (chronologically)
Test: Last 20% for out-of-sample evaluation
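A chronological split differs from a random one in that the series is cut at a single point and never shuffled. A minimal sketch, assuming the hourly values are already sorted by time (the helper name `chronological_split` is hypothetical):

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into (train, test) without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Stand-in for hourly values; the real series would hold kW readings.
hourly_values = list(range(10))
train, test = chronological_split(hourly_values)
```

Every test observation then lies strictly after every training observation, which is what makes the evaluation out-of-sample.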
This dataset originates from the Amazon ML Challenge 2025, where the goal is to predict the price of products based on their textual content and image information. Each product listing contains a catalog_content field describing the product (similar to a combination of title, description, and specifications) and a corresponding product image.
The dataset is divided into a training set, along with sample test and output files to guide prediction formatting.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
1. Tropical forests are subject to diverse deforestation pressures while their conservation is essential to achieve global climate goals. Predicting the location of deforestation is challenging due to the complexity of the natural and human systems involved but accurate and timely forecasts could enable effective planning and on-the-ground enforcement practices to curb deforestation rates. New computer vision technologies based on deep learning can be applied to the increasing volume of Earth observation data to generate novel insights and make predictions with unprecedented accuracy.
2. Here, we demonstrate the ability of deep convolutional neural networks (CNNs) to learn spatiotemporal patterns of deforestation from a limited set of freely available global data layers, including multispectral satellite imagery, the Hansen maps of annual forest change (2001-2020) and the ALOS PALSAR digital surface model, to forecast deforestation (2021). We designed four model architectures, based on 2D CNNs, 3D CNNs, and Convolutional Long Short-Term Memory (ConvLSTM) Recurrent Neural Networks (RNNs), to produce spatial maps that indicate the risk to each forested pixel (~30 m) in the landscape of becoming deforested within the next year. They were trained and tested on data from two ~80,000 km² tropical forest regions in the Southern Peruvian Amazon.
3. The networks could predict the location of future forest loss to a high degree of accuracy (F1 = 0.58-0.71). Our best performing model (3D CNN) had the highest pixel-wise accuracy (F1 = 0.71) when validated on 2020 forest loss (2014-2019 training). Visual interpretation of the mapped forecasts indicated that the network could automatically discern the drivers of forest loss from the input data. For example, pixels around new access routes (e.g. roads) were assigned high risk whereas this was not the case for recent, concentrated natural loss events (e.g. remote landslides).
4. CNNs can harness limited time-series data to predict near-future deforestation patterns, an important step in harnessing the growing volume of satellite remote sensing data to curb global deforestation. The modelling framework can be readily applied to any tropical forest location and used by governments and conservation organisations to prevent deforestation and plan protected areas.
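For readers unfamiliar with the pixel-wise F1 score reported above, the sketch below computes it for two flattened binary masks. It assumes the risk map has already been thresholded into 0/1 predictions; how the authors thresholded their maps is not stated here, and the sample masks are made up.

```python
def f1_score(predicted, actual):
    """Pixel-wise F1 for two equal-length sequences of 0/1 labels (1 = forest loss)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # correctly flagged loss
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # false alarms
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # missed loss
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up six-pixel example: 2 true positives, 1 false positive, 1 false negative.
predicted = [1, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 0]
score = f1_score(predicted, actual)
```

Because F1 ignores true negatives, it stays informative when deforested pixels are a small share of the landscape.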
https://brightdata.com/license
Utilize our Amazon reviews dataset for diverse applications to enrich business strategies and market insights. Analyzing this dataset can aid in understanding customer behavior, product performance, and market trends, empowering organizations to refine their product and marketing strategies. Access the entire dataset or tailor a subset to fit your requirements.
Popular use cases include:
Product Performance Analysis: Analyze Amazon reviews to assess product performance, uncovering customer satisfaction levels, common issues, and highly praised features to inform product improvements and marketing messages.
Customer Behavior Insights: Gain insights into customer behavior, purchasing patterns, and preferences, enabling more personalized marketing and product recommendations.
Demand Forecasting: Leverage Amazon reviews to predict future product demand by analyzing historical review data and identifying trends, helping to optimize inventory management and sales strategies.
Accessing and analyzing the Amazon reviews dataset supports market strategy optimization by leveraging insights to analyze key market trends and customer preferences, enhancing overall business decision-making.
NOTE: The NCEP Global Forecast System was upgraded to v16.3.0, effective November 29, 2022.
The Global Forecast System (GFS) is a weather forecast model produced
by the National Centers for Environmental Prediction (NCEP). Dozens of
atmospheric and land-soil variables are available through this dataset,
from temperatures, winds, and precipitation to soil moisture and
atmospheric ozone concentration. The entire globe is covered by the GFS
at a base horizontal resolution of 18 miles (28 kilometers) between grid
points, which is used by the operational forecasters who predict weather
out to 16 days in the future. Horizontal resolution drops to 44 miles
(70 kilometers) between grid points for forecasts between one week and two
weeks.
The NOAA Global Forecast System (GFS) Warm Start Initial Conditions are
produced by the National Centers for Environmental Prediction (NCEP)
to run operational deterministic medium-range numerical weather predictions.
The GFS is built with the GFDL Finite-Volume Cubed-Sphere Dynamical Core (FV3)
and the Grid-Point Statistical Interpolation (GSI) data assimilation system.
Please visit the links below in the Documentation section to find more details
about the model and the data assimilation systems. The current operational
GFS is run at 64 layers in the vertical extending from the surface to the upper
stratosphere and on six cubic-sphere tiles at the C768 or 13-km horizontal
resolution. A new version of the GFS that has 127 layers extending to the
mesopause will be implemented for operation on February 3, 2021. These initial
conditions are made available four times per day for running forecasts at the
00Z, 06Z, 12Z and 18Z cycles, respectively. For each cycle, the dataset
contains the first guess of the atmosphere states found in the directory
./gdas.yyyymmdd/hh-6/RESTART, which are 6-hour GDAS forecasts from the last
cycle, and atmospheric analysis increments and surface analysis for the current
cycle found in the directory ./gfs.yyyymmdd/hh, which are produced by the data
assimilation systems.
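The directory pattern above implies that the first guess for a cycle lives under the previous cycle, six hours earlier. The sketch below builds both paths for a given cycle time. The helper name `warm_start_dirs` is hypothetical, and it assumes that "hh-6" rolls back to the previous day's 18Z directory for the 00Z cycle.

```python
from datetime import datetime, timedelta


def warm_start_dirs(cycle):
    """Return (first-guess directory, analysis directory) for a GFS cycle time in UTC."""
    if cycle.hour not in (0, 6, 12, 18):
        raise ValueError("GFS cycles run at 00Z, 06Z, 12Z and 18Z")
    # Assumption: the "hh-6" cycle may fall on the previous calendar day.
    previous = cycle - timedelta(hours=6)
    first_guess = f"./gdas.{previous:%Y%m%d}/{previous:%H}/RESTART"
    analysis = f"./gfs.{cycle:%Y%m%d}/{cycle:%H}"
    return first_guess, analysis


# Example for an arbitrary 00Z cycle.
first_guess, analysis = warm_start_dirs(datetime(2021, 3, 1, 0))
```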
This data set contains meteorological data collected around the confluence of the Tapajos River with the Amazon River in the Amazon Basin near Santarem, Brazil, in July and August 2001. Boundary layer and upper air measurements were collected with an acoustic sounder-sodar instrument, pilot balloons with optical theodolites, and radiosondes. Radiosondes also measured pressure, temperature, and relative humidity in addition to wind speed and direction. Measurements were made from five local stations at varying frequencies. There are 41 comma-delimited data files with this data set.
Supporting information provided with the data set as companion files includes:
Weather forecasts: Weather forecasts were used to determine the presence of favorable conditions for the balloon flights during the CIRSAN experiment, as well as to help decide the radiosonde launch frequency. The daily observed and forecast weather descriptions for the study period (Weather_forecasts_Santarem.txt) are included.
Satellite images: All the satellite images during the CIRSAN period are provided. This is a compilation of images from various instruments and satellite platforms. (See readme_sat.txt). There are 42 images in .gif format.
CPTEC Analysis files: The CIRSAN measurement data were used in the CPTEC Global Analysis modeling activity. Model output results for the Pacific and South American region are provided in GRIB format. (See readme_GPSA.txt)
The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The GFS data files stored here can be immediately used for OAR/ARL’s NOAA-EPA Atmosphere-Chemistry Coupler Cloud (NACC-Cloud) tool, and are in Network Common Data Form (netCDF), a format used widely across the scientific community. These particular GFS files contain a comprehensive set of global atmosphere/land variables at a relatively high spatiotemporal resolution (approximately 13x13 km horizontal, vertical resolution of 127 levels, and hourly). They are not only necessary for the NACC-Cloud tool to adequately drive community air quality applications (e.g., U.S. EPA’s Community Multiscale Air Quality model; https://www.epa.gov/cmaq), but can also be very useful for a myriad of other applications in the Earth system modeling communities (e.g., atmosphere, hydrosphere, pedosphere, etc.). While many other data file and record formats are available for Earth system and climate research (e.g., GRIB, HDF, GeoTIFF), the netCDF files here are advantageous to the larger community because of the comprehensive, high spatiotemporal information they contain, and because they are more scalable, appendable, shareable, self-describing, and community-friendly (i.e., many tools are available to the community of users). Out of the four operational GFS forecast cycles per day (at 00Z, 06Z, 12Z and 18Z), this particular netCDF dataset is updated daily (/inputs/yyyymmdd/) for the 12Z cycle and includes 24-hr output for both 2D (gfs.t12z.sfcf$0hh.nc) and 3D variables (gfs.t12z.atmf$0hh.nc).
Also available are netCDF formatted Global Land Surface Datasets (GLSDs) developed by Hung et al. (2024). The GLSDs are based on numerous satellite products, and have been gridded to match the GFS spatial resolution (~13x13 km). These GLSDs contain vegetation canopy data (e.g., land surface type, vegetation clumping index, leaf area index, vegetative canopy height, and green vegetation fraction) that are supplemental to and can be combined with the GFS meteorological netCDF data for various applications, including NOAA-ARL's canopy-app. The canopy data variables are climatological, based on satellite data from the year 2020, combined with GFS meteorology for the year 2022, and are created at a daily temporal resolution (/inputs/geo-files/gfs.canopy.t12z.2022mmdd.sfcf000.global.nc)
This data set contains forecast products from the NCEP/NCAR Reanalysis Project. The resolution of the Reanalysis Forecast Model is T62 (209 km) with 28 vertical sigma levels. Every five days, beginning with 00Z 01 January, an 8-day forecast run is made. The initial time and outputs for every 12 hours are saved from this run. Pressure level and special level data are archived on the 2.5 by 2.5 degree latitude-longitude grid and flux fields are archived on the 192 by 94 Gaussian grid. Details on variables included are available elsewhere. Some special periods will be analyzed more than once to provide data for special research studies. For example, a special run of 1979 was made excluding most satellite inputs. This run could be used for evaluating the impact of satellite data on the forecasts. See DS090.0 [https://rda.ucar.edu/datasets/ds090.0/] for analysis and first guess fields. All data files for January 1948 to October 2005 are available as of 2005DEC29.
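The run schedule described above (a run every five days, each saving the initial time plus output every 12 hours out to 8 days) can be enumerated as in the sketch below. It assumes the five-day cadence restarts at 00Z on 1 January of each year, which is one reading of the description rather than something the archive documentation is quoted as confirming; both function names are hypothetical.

```python
from datetime import datetime, timedelta


def forecast_runs(year):
    """Initial times of the forecast runs in one year: every 5 days from 00Z 1 January."""
    runs = []
    init = datetime(year, 1, 1, 0)
    while init.year == year:
        runs.append(init)
        init += timedelta(days=5)
    return runs


def output_times(init, days=8, step_hours=12):
    """Saved times for one run: the initial time plus every 12 hours out to 8 days."""
    return [init + timedelta(hours=h) for h in range(0, days * 24 + 1, step_hours)]


runs = forecast_runs(1979)
times = output_times(runs[0])
```

Under these assumptions a non-leap year has 73 runs, and each run saves 17 time steps.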
The HRRR is a NOAA real-time 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar assimilation. Radar data is assimilated in the HRRR every 15 min over a 1-h period adding further detail to that provided by the hourly data assimilation from the 13km radar-enhanced Rapid Refresh.
The HRRR ZARR formatted data was originally generated by the University of Utah under a grant provided by NOAA. They are continuing to publish ZARR versions of HRRR data. For information about data in s3://hrrrzarr/, please contact atmos-mesowest@lists.utah.edu.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Important Notice: Update of JRA-55 data will terminate at the end of January 2024. Please use Japanese Reanalysis for Three Quarters of a Century (JRA-3Q) [https://rda.ucar.edu/datasets/d640000/] at that time.
The Japan Meteorological Agency (JMA) conducted JRA-55, the second Japanese global atmospheric reanalysis project. It covers 55 years, extending back to 1958, coinciding with the establishment of the global radiosonde observing system. Compared to its predecessor, JRA-25, JRA-55 is based on a new data assimilation and prediction system (DA) that improves many deficiencies found in the first Japanese reanalysis. These improvements have come about by implementing higher spatial resolution (TL319L60), a new radiation scheme, four-dimensional variational data assimilation (4D-Var) with Variational Bias Correction (VarBC) for satellite radiances, and introduction of greenhouse gases with time varying concentrations. The entire JRA-55 production was completed in 2013, and thereafter will be continued on a real time basis.
Specific early results of quality assessment of JRA-55 indicate that a large temperature bias in the lower stratosphere has been significantly reduced compared to JRA-25 through a combination of the new radiation scheme and application of VarBC (which also reduces unrealistic temperature variations). In addition, a dry land surface anomaly in the Amazon basin has been mitigated, and overall forecast scores are much improved over JRA-25.
Most of the observational data employed in JRA-55 are those used in JRA-25. Additionally, newly reprocessed METEOSAT and GMS data were supplied by EUMETSAT and MSC/JMA respectively. Snow depth data over the United States, Russia and Mongolia were supplied by UCAR, RIHMI and IMH respectively.
The Data Support Section (DSS) at NCAR has processed the 1.25 degree version of JRA-55 with the RDA (Research Data Archive) archiving and metadata system. The model resolution data has also been acquired, archived and processed, including transformation of the TL319L60 grid to a regular latitude-longitude Gaussian grid (320 latitudes by 640 longitudes, nominally 0.5625 degree). All RDA JRA-55 data is available for internet download, including complete subsetting and data format conversion services.
The National Centers for Environmental Prediction (NCEP) Climate Forecast System (CFS) is initialized four times per day (0000, 0600, 1200, and 1800 UTC). NCEP upgraded their operational CFS to version 2 on March 30, 2011. This is the same model that was used to create the NCEP Climate Forecast System Reanalysis (CFSR), and the purpose of this dataset is to extend CFSR. The 6-hourly atmospheric, oceanic and land surface analyzed products and forecasts, available at 0.2, 0.5, 1.0, and 2.5 degree horizontal resolutions, are archived here beginning with January 1, 2011 as an extension of CFSR. The RDA is not archiving any of the CFS seasonal forecasts. For more information about CFS, please see http://cfs.ncep.noaa.gov/.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Predictive analysis of mortality rate models.
This dataset contains 4 radiative tendency variables on the original 60 hybrid model levels on the reduced N80 [https://rda.ucar.edu/datasets/common/ecmwf/docs/n80Rpnts.html] Gaussian grid. The data are accumulated over the 3 and 6 hour forecasts for the 00, 06, 12 and 18 UTC analyses. The ECMWF Re-Analysis (ERA40) is a global atmospheric analysis of many conventional observations and satellite data streams for the period September 1957 to August 2002. There are numerous data products that are separated into dataset series based on resolution, vertical coordinate reference, and likely research applications. Descriptions of the series organization and direct links to information about all ERA40 products are available [https://rda.ucar.edu/cgi-bin/joey/era40sum.pl?ds=ds118.0]. The ERA-Interim data from ECMWF is an update to the ERA-40 project. The ERA-Interim data starts in 1989 and has a higher horizontal resolution (T255, N128 nominally 0.703125 degrees) than the ERA-40 data (T159, N80 nominally 1.125 degrees). ERA-Interim is based on a more current model than ERA-40 and uses 4-D VAR (as opposed to 3-D VAR in ERA-40). ECMWF will continue to run the ERA-Interim model in near real time through at least 2010, and possibly longer. This data is available in ds627.0 [https://rda.ucar.edu/datasets/ds627.0/].
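The nominal resolutions quoted above follow from the Gaussian grid number: an N-numbered grid has 4N longitude points along the equator, so the nominal spacing is 360/(4N) = 90/N degrees. A small check (the function name is illustrative):

```python
def nominal_spacing_degrees(n):
    """Nominal spacing of an N-numbered Gaussian grid, with 4N points around the equator."""
    return 360.0 / (4 * n)


era40_spacing = nominal_spacing_degrees(80)         # N80, used by ERA-40
era_interim_spacing = nominal_spacing_degrees(128)  # N128, used by ERA-Interim
```

On a reduced Gaussian grid the number of longitude points shrinks toward the poles, so this figure describes the spacing near the equator only.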
The Effective Atmospheric Angular Momentum (EAAM) is a combined project from the Met Office and the European Centre for Medium-Range Weather Forecasts (ECMWF). The data consist of the 3 angular momentum components of the mass and wind terms at 12 or 24 hourly intervals. The ECMWF data are from 1979-93. The corresponding Met Office Unified Model data cover the period from 1983 to 1997. This dataset is public.
https://catalogue.ceda.ac.uk/uuid/bf626d5254cb9df807c3ffef170b2331
Since 2 July 1993, the AMRC has been archiving the National Centers for Environmental Prediction's (formerly the National Meteorological Center) Medium Range Forecast Model analyses. Since 14 April 1994, the forecasts have also been archived. This collection can be very useful, as it is a smoothed data set over the data-sparse Antarctic region. Data collection has stopped as of 31 August, 2015.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains the historical stock price data for Amazon.com, Inc. (AMZN), one of the largest and most influential technology companies in the world. The data has been sourced directly from Yahoo Finance, a widely trusted provider of financial market data. It spans a significant time range, enabling users to analyze Amazon’s market performance over the years, observe long-term trends, and identify key events in the company’s history.
The dataset is structured as a CSV file, with each row representing a single trading day. The following columns are included:
This dataset is suitable for a wide range of financial, academic, and data science projects, such as:
The National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR) was initially completed over the 31-year period from 1979 to 2009 and has been extended to March 2011. CFSR was initialized 4 times per day (0000, 0600, 1200, and 1800 UTC), and the 6-hourly atmospheric, oceanic and land surface analyzed products are available at 0.3, 0.5, 1.0, 1.9, and 2.5 degree horizontal resolutions, along with forecast hours 1 through 6. However, not all parameters are available at all resolutions and some parameters are not analyzed (e.g. 2 meter temperature, 10 meter winds), so please consult the detailed metadata for exact descriptions of what is available. For more information about CFSR, please see: https://rda.ucar.edu#!pub/cfsr.html. For data that extend CFSR beyond December 2010, please see the RDA datasets for NCEP's Climate Forecast System Version 2 (CFSv2) [https://rda.ucar.edu/index.html#!lfd?nb=y&b=proj&v=CFSv2+>+NCEP+Climate+Forecast+System+Version+2].
Aircraft and rawinsonde (300, 250, and 200mb) observations over the tropical Pacific were analyzed to produce monthly grids of wind at approximately 250mb for the periods January 1966 to September 1968 and August 1970 to December 1973. The grid has a resolution of 2.5 degrees and runs from 30S to 45N latitude and 75E to 70W longitude.
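As a rough size check, the grid dimensions implied by these bounds can be worked out as below. This assumes that both boundary latitudes and longitudes are grid points and that the longitude range runs eastward from 75E across the date line to 70W, which fits a tropical Pacific domain but is not stated explicitly.

```python
def axis_points(start, end, step):
    """Number of grid points from start to end inclusive at a fixed step."""
    return int(round((end - start) / step)) + 1


# Latitude 30S to 45N; longitude 75E eastward to 70W (290E on a 0-360 scale).
lat_points = axis_points(-30.0, 45.0, 2.5)
lon_points = axis_points(75.0, 360.0 - 70.0, 2.5)
```

With those assumptions the grid is 31 latitudes by 87 longitudes.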