60 datasets found
  1. S&P 500

    • fred.stlouisfed.org
    json
    Updated Jul 11, 2025
    Cite
    (2025). S&P 500 [Dataset]. https://fred.stlouisfed.org/series/SP500
    Explore at:
    Available download formats: json
    Dataset updated
    Jul 11, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-pre-approval

    Description

    View data of the S&P 500, an index of the stocks of 500 leading companies in the US economy, which provides a gauge of the U.S. equity market.

  2. China Shanghai Composite Stock Market Index Data

    • tradingeconomics.com
    • jp.tradingeconomics.com
    • +13 more
    csv, excel, json, xml
    Cite
    TRADING ECONOMICS (2025). China Shanghai Composite Stock Market Index Data [Dataset]. https://tradingeconomics.com/china/stock-market
    Explore at:
    Available download formats: xml, csv, excel, json
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 19, 1990 - Jul 14, 2025
    Area covered
    China
    Description

    China's main stock market index, the SHANGHAI, rose to 3520 points on July 14, 2025, gaining 0.27% from the previous session. Over the past month, the index has climbed 3.86% and is up 18.35% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from China. China Shanghai Composite Stock Market Index - values, historical data, forecasts and news - updated in July 2025.

  3. Daily Summary of Sri Lanka's Weather

    • kaggle.com
    Updated Jan 18, 2023
    Cite
    The Devastator (2023). Daily Summary of Sri Lanka's Weather [Dataset]. https://www.kaggle.com/datasets/thedevastator/daily-summary-of-sri-lanka-s-precipitation-indic
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 18, 2023
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Sri Lanka
    Description

    Daily Summary of Sri Lanka's Weather

    Five Years of Observations

    By Humanitarian Data Exchange [source]

    About this dataset

    This dataset contains five years of daily summaries of precipitation indicators in Sri Lanka. Data is compiled by the National Centers for Environmental Information (NCEI) in partnership with the United States government's National Oceanic and Atmospheric Administration (NOAA). Four indicators summarize data collected from several stations across the country: Total Precipitation (TPCP), Maximum Snow Depth (MXSD), Total Snow Fall (TSNW), and Extreme Maximum Daily Precipitation (EMXP). Although this dataset is comprehensive, the number of recent records may be underestimated because of late-arriving data. Whether you are a researcher or climatologist, this dataset provides valuable insight into trends in Sri Lanka's weather patterns over the last five years.

    How to use the dataset

    This dataset contains daily summaries from base stations across Sri Lanka for the past 5 years. It includes four indicators: TPCP (Total Precipitation), MXSD (Max Snow Depth), TSNW (Total Snow Fall) and EMXP (Extreme Maximum Daily Precipitation). In this guide, we will show you how to use this dataset for your own purposes.

    Research Ideas

    • Analyzing the trend of maximum snow depth over the years in Sri Lanka using monthly, quarterly and yearly aggregations.
    • Estimating extreme maximum daily precipitation in different regions of Sri Lanka to understand the changing patterns over time.
    • Visualizing average total snowfall fields across various base stations and comparing these outcomes with climate simulations to identify potential climate change impacts on extreme weather events in Sri Lanka.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: precipitation-lka-csv-1.csv

    | Column name | Description                                                         |
    |:------------|:--------------------------------------------------------------------|
    | date        | Date when data was collected. (Date)                                |
    | datatype    | Type of data that has been collected. (String)                      |
    | station     | Location where data was recorded. (String)                          |
    | value       | Measurement value for each indicator for each day. (Float)          |
    | fl_miss     | Tells if any observations are missing from that day. (Boolean)      |
    | fl_cmiss    | Tells whether all observations are complete. (Boolean)              |
    | country     | Country from where the observed values have been recorded. (String) |
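
    The monthly aggregation suggested under Research Ideas could be sketched roughly as follows. This is only an illustration using the column names listed above; the sample rows, the station name, the country code, and the ISO date format (YYYY-MM-DD) are assumptions, not values taken from the actual file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows that reuse the documented column names.
# The date format, station ID and country code are assumptions.
SAMPLE = """date,datatype,station,value,fl_miss,fl_cmiss,country
2015-01-01,TPCP,STATION_A,12.5,0,0,LKA
2015-01-02,TPCP,STATION_A,7.5,0,0,LKA
2015-02-01,TPCP,STATION_A,3.0,0,0,LKA
2015-01-01,EMXP,STATION_A,9.0,0,0,LKA
"""


def monthly_totals(rows, datatype):
    """Sum `value` per (station, "YYYY-MM") for a single indicator."""
    totals = defaultdict(float)
    for row in rows:
        if row["datatype"] != datatype:
            continue  # keep only the requested indicator
        month = row["date"][:7]  # "YYYY-MM" prefix of an ISO date
        totals[(row["station"], month)] += float(row["value"])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
tpcp = monthly_totals(rows, "TPCP")
for (station, month), total in sorted(tpcp.items()):
    print(station, month, total)
```

    To run this against the real file, replace the in-memory sample with an open file handle, and adjust the date slicing if the file uses a different date format.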

    Acknowledgements

    If you use this dataset in your research, please credit Humanitarian Data Exchange.

  4. Japan Stock Market Index (JP225) Data

    • tradingeconomics.com
    • ko.tradingeconomics.com
    • +12 more
    csv, excel, json, xml
    Cite
    Japan Stock Market Index (JP225) Data [Dataset]. https://tradingeconomics.com/japan/stock-market
    Explore at:
    Available download formats: excel, csv, xml, json
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 5, 1965 - Jul 14, 2025
    Area covered
    Japan
    Description

    Japan's main stock market index, the JP225, fell to 39432 points on July 14, 2025, losing 0.35% from the previous session. Over the past month, the index has climbed 2.93%, though it remains 4.47% lower than a year ago, according to trading on a contract for difference (CFD) that tracks this benchmark index from Japan. Japan Stock Market Index (JP225) - values, historical data, forecasts and news - updated in July 2025.

  5. TNC and Taxi Processed Daily Trips Data with Exogenous Variables

    • dataone.org
    Updated Nov 8, 2023
    Cite
    Shen, Qing (2023). TNC and Taxi Processed Daily Trips Data with Exogenous Variables [Dataset]. http://doi.org/10.7910/DVN/XPQF6O
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Shen, Qing
    Description

    The City of Chicago has published trip-level data for every TNC trip since November 1, 2018. To the best of our knowledge, this dataset is the only one that includes trip fare variables. When we wrote this paper in October 2022, the dataset included approximately 263 million trip records (rows) and 21 features (columns) for trips dated from November 1, 2018, through October 1, 2022. The features include Trip ID, Trip Start Timestamp (rounded to the nearest 15 minutes), Trip End Timestamp (rounded to the nearest 15 minutes), Trip Seconds, Trip Miles, Pickup Census Tract, Dropoff Census Tract, Pickup Community Area, Drop Off Community Area, Trip Fare, Tip, Additional Charges, Total Trip Fare, Shared Trip Authorized, Trips Pooled, Pickup Centroid Latitude, Pickup Centroid Longitude, Pickup Centroid Location, Dropoff Centroid Latitude, Dropoff Centroid Longitude, and Dropoff Centroid Location.

    As the dataset is too large to be processed without a supercomputer, we generated a random sample of 2 million trips from Nov 2018 to June 2022 with valid pickup and drop-off area information. To explore the data, we processed the features to extract date information from the timestamp. We created new variables, including each trip's average fare per mile (excluding tips and additional charges, mainly taxes). In dataset (1), the sampled TNC trips data was processed and summarized to include the average daily fare per mile (USD/mile), and exogenous variables that affect the price were added, including holidays (Christmas, Thanksgiving, Independence Day, Easter and New Year) and other variables such as gas prices and climate (snow, precipitation, and average daily temperature).

    The City of Chicago also publishes taxi trips from 2013 to the present. To protect privacy but allow for aggregate analyses, the Taxi ID is consistent for any given taxi medallion number but does not show the number, and times are rounded to the nearest 15 minutes. Because of the data reporting process, most but not all trips are reported. Taxicabs in Chicago, Illinois, are operated by private companies and licensed by the city. About seven thousand licensed cabs operate within the city limits.

    As the dataset is too large to be processed without a supercomputer, we generated a random sample of 2 million trips from Nov 2018 to June 2022 with valid pickup and drop-off area information. To explore the data, we processed the features to extract date information from the timestamp. We created new variables, including each trip's average fare per mile (excluding tips and additional charges, mainly taxes). In dataset (2), the taxi trips data was processed and summarized to include the average daily fare per mile (USD/mile), and exogenous variables that affect the price were added, including holidays (Christmas, Thanksgiving, Independence Day, Easter and New Year) and other variables such as gas prices and climate (snow, precipitation, and average daily temperature).
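
    The fare-per-mile summary described above could be computed along the following lines. This is a minimal sketch under stated assumptions, not the author's processing code: the record layout, the field names (`start`, `trip_miles`, `trip_fare`), the timestamp format, and the choice to average per-trip ratios (rather than dividing total fare by total miles) are all illustrative.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trip records; values are made up for illustration.
# `trip_fare` excludes tips and additional charges, as in the description.
trips = [
    {"start": "2019-03-01 08:15", "trip_miles": 2.0, "trip_fare": 5.0},
    {"start": "2019-03-01 17:30", "trip_miles": 4.0, "trip_fare": 12.0},
    {"start": "2019-03-02 09:00", "trip_miles": 5.0, "trip_fare": 10.0},
    {"start": "2019-03-02 09:45", "trip_miles": 0.0, "trip_fare": 3.0},
]


def daily_avg_fare_per_mile(records):
    """Average the per-trip fare per mile (USD/mile) for each calendar day."""
    per_day = defaultdict(list)
    for rec in records:
        if rec["trip_miles"] <= 0:
            continue  # a zero-distance trip has no defined fare per mile
        day = datetime.strptime(rec["start"], "%Y-%m-%d %H:%M").date()
        per_day[day].append(rec["trip_fare"] / rec["trip_miles"])
    return {day: sum(vals) / len(vals) for day, vals in per_day.items()}


daily = daily_avg_fare_per_mile(trips)
for day in sorted(daily):
    print(day, round(daily[day], 2))
```

    Exogenous variables such as holiday flags, gas prices, or daily temperature could then be joined onto the result by date.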

  6. Vehicle Miles Traveled

    • data.world
    csv, zip
    Updated Aug 30, 2023
    Cite
    The Associated Press (2023). Vehicle Miles Traveled [Dataset]. https://data.world/associatedpress/vehicle-miles-traveled
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Aug 30, 2023
    Authors
    The Associated Press
    Time period covered
    Mar 1, 2020 - Dec 31, 2020
    Description

    **This data set was last updated 3:30 PM ET Monday, January 4, 2021. The last date of data in this dataset is December 31, 2020.**

    Overview

    Data shows that mobility declined nationally since states and localities began shelter-in-place strategies to stem the spread of COVID-19. The numbers began climbing as more people ventured out and traveled further from their homes, but in parallel with the rise of COVID-19 cases in July, travel declined again.

    This distribution contains county level data for vehicle miles traveled (VMT) from StreetLight Data, Inc, updated three times a week. This data offers a detailed look at estimates of how much people are moving around in each county.

    Data available has a two day lag - the most recent data is from two days prior to the update date. Going forward, this dataset will be updated by AP at 3:30pm ET on Monday, Wednesday and Friday each week.

    This data has been made available to members of AP's Data Distribution Program. To inquire about access for your organization - publishers, researchers, corporations, etc. - please click Request Access in the upper right corner of the page or email kromano@ap.org. Be sure to include your contact information and use case.

    Findings

    • Nationally, data shows that vehicle travel in the US has doubled compared to the seven-day period ending April 13, which was the lowest VMT since the COVID-19 crisis began. In early December, travel reached a low not seen since May, with a small rise leading up to the Christmas holiday.
    • Average vehicle miles traveled continues to be below what would be expected without a pandemic - down 38% compared to January 2020. September 4 reported the largest single day estimate of vehicle miles traveled since March 14.
    • New Jersey, Michigan and New York are among the states with the largest relative uptick in travel at this point of the pandemic - they report almost two times the miles traveled compared to their lowest seven-day period. However, travel in New Jersey and New York is still much lower than expected without a pandemic. Other states such as New Mexico, Vermont and West Virginia have rebounded the least.

    About This Data

    The county level data is provided by StreetLight Data, Inc, a transportation analysis firm that measures travel patterns across the U.S. The data is from their Vehicle Miles Traveled (VMT) Monitor which uses anonymized and aggregated data from smartphones and other GPS-enabled devices to provide county-by-county VMT metrics for more than 3,100 counties. The VMT Monitor provides an estimate of total vehicle miles travelled by residents of each county, each day since the COVID-19 crisis began (March 1, 2020), as well as a change from the baseline average daily VMT calculated for January 2020. Additional columns are calculations by AP.

    Included Data

    01_vmt_nation.csv - Data summarized to provide a nationwide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

    02_vmt_state.csv - Data summarized to provide a statewide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.

    03_vmt_county.csv - Data providing a county level look at vehicle miles traveled. Includes VMT estimate, percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
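
    The two derived measures these files mention, a seven-day rolling average and the percent change against the January baseline, could be computed roughly as below. This is a sketch only: the function names and the sample numbers are hypothetical, and AP's exact windowing (trailing versus centered) is not stated in the description, so a trailing window is assumed.

```python
def rolling_mean(values, window=7):
    """Trailing mean; positions with fewer than `window` points yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def pct_change_from_baseline(value, baseline):
    """Percent change of `value` relative to a baseline such as January VMT."""
    return (value - baseline) / baseline * 100.0


# Made-up daily VMT values and baseline, for illustration only.
daily_vmt = [100.0, 90.0, 80.0, 70.0, 60.0, 50.0, 40.0, 30.0]
january_baseline = 100.0

smoothed = rolling_mean(daily_vmt)
changes = [pct_change_from_baseline(v, january_baseline) for v in daily_vmt]
print(smoothed)
print(changes)
```

    A trailing window means the first six days have no smoothed value; a centered window would instead leave gaps at both ends of the series.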

    Additional Data Queries

    * Filter for specific state - filters 02_vmt_state.csv daily data for specific state.

    * Filter counties by state - filters 03_vmt_county.csv daily data for counties in specific state.

    * Filter for specific county - filters 03_vmt_county.csv daily data for specific county.

    Interactive

    The AP has designed an interactive map to show percent change in vehicle miles traveled by county since each county's lowest point during the pandemic:

    https://interactives.ap.org/vmt-map/

    Using the Data

    This data can help put your county's mobility in context with your state and over time. The data set contains different measures of change - daily comparisons and seven day rolling averages. The rolling average allows for a smoother trend line for comparison across counties and states. To get the full picture, there are also two available baselines - vehicle miles traveled in January 2020 (pre-pandemic) and vehicle miles traveled at each geography's low point during the pandemic.

    Caveats

    • The data from StreetLight Data, Inc does not include data for some low-population counties with low VMT (<5,000 miles/day in their baseline month of January 2020). In our analyses, we only include the 2,779 counties that have daily data for the entire period (March 1, 2020 to current).
    • In some cases, a lack of decline in mobility from March to April can indicate that movement in the county is essential to keeping the larger economy going or that residents need to drive further to reach essential businesses like grocery stores compared to other counties.
    • The VMT includes both passenger and commercial miles, so truck traffic is included. However, the proxy is based on the "total number of trip starts and ends for all devices whose most frequent location is in this county". It does not count the VMT of trucks cutting through a county.
    • For those instances where travel begins in one county and ends in another, the county where the miles are recorded is always the vehicle's home county.

    Contact reporter Angeliki Kastanis at akastanis@ap.org.

  7. Dataset on the Human Body as a Signal Propagation Medium

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 11, 2024
    Cite
    A. Elsts (2024). Dataset on the Human Body as a Signal Propagation Medium [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8214496
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    V. Abolins
    A. Sevcenko
    J. Ormanis
    A. Elsts
    V. Medvedevs
    V. Aristovs
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview: This is a large-scale dataset with impedance and signal loss data recorded on volunteer test subjects using low-voltage alternating current sine-shaped signals. The signal frequencies are from 50 kHz to 20 MHz.

    Applications: The dataset is intended to support investigation of the human body as a signal propagation medium, and to capture how the properties of the human body (age, sex, composition etc.), the measurement locations, and the signal frequencies affect the signal loss over the human body.

    Overview statistics:

    Number of subjects: 30

    Number of transmitter locations: 6

    Number of receiver locations: 6

    Number of measurement frequencies: 19

    Input voltage: 1 V

    Load resistance: 50 ohm and 1 megaohm

    Measurement group statistics:

    Height: 174.10 (7.15)

    Weight: 72.85 (16.26)

    BMI: 23.94 (4.70)

    Body fat %: 21.53 (7.55)

    Age group: 29.00 (11.25)

    Male/female ratio: 50%

    Included files:

    experiment_protocol_description.docx - protocol used in the experiments

    electrode_placement_schematic.png - schematic of placement locations

    electrode_placement_photo.jpg - visualization of the experiment, on a volunteer subject

    RawData - the full measurement results and experiment info sheets

    all_measurements.csv - the most important results extracted to .csv

    all_measurements_filtered.csv - same, but after z-score filtering

    all_measurements_by_freq.csv - the most important results extracted to .csv, single frequency per row

    all_measurements_by_freq_filtered.csv - same, but after z-score filtering

    summary_of_subjects.csv - key statistics on the subjects from the experiment info sheets

    process_json_files.py - script that creates .csv from the raw data

    filter_results.py - outlier removal based on z-score

    plot_sample_curves.py - visualization of a randomly selected measurement result subset

    plot_measurement_group.py - visualization of the measurement group
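
    The z-score filtering mentioned for the `_filtered.csv` files could look roughly like the sketch below. This is not the dataset's filter_results.py; the threshold of 3 standard deviations, the use of the population standard deviation, and the sample values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore_filter(values, threshold=3.0):
    """Keep values whose absolute z-score does not exceed `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [v for v in values if abs(v - mu) / sigma <= threshold]


# Made-up measurements: twenty typical readings and one obvious outlier.
readings = [10.0] * 20 + [1000.0]
filtered = zscore_filter(readings)
print(len(readings), len(filtered))
```

    With very small samples a single outlier cannot reach a z-score of 3, so a lower threshold or a robust statistic such as the median absolute deviation may be more appropriate there.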

    CSV file columns:

    subject_id - participant's random unique ID

    experiment_id - measurement session's number for the participant

    height - participant's height, cm

    weight - participant's weight, kg

    BMI - body mass index, computed from the values above

    body_fat_% - body fat composition, as measured by bioimpedance scales

    age_group - age rounded to 10 years, e.g. 20, 30, 40 etc.

    male - 1 if male, 0 if female

    tx_point - transmitter point number

    rx_point - receiver point number

    distance - distance, in relative units, between the tx and rx points. Not scaled in terms of participant's height and limb lengths!

    tx_point_fat_level - transmitter point location's average fat content metric. Not scaled for each participant individually.

    rx_point_fat_level - receiver point location's average fat content metric. Not scaled for each participant individually.

    total_fat_level - sum of rx and tx fat levels

    bias - constant term to simplify data analytics, always equal to 1.0
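
    Two of the columns above are derived from others. A small sketch of how they could be reproduced is shown below; the standard BMI formula (weight in kg divided by height in metres squared) is assumed, since the description only says BMI is computed from height and weight, and the function names and sample numbers are hypothetical.

```python
def bmi(height_cm, weight_kg):
    """Body mass index from height in centimetres and weight in kilograms."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def total_fat_level(tx_point_fat_level, rx_point_fat_level):
    """Sum of the transmitter-side and receiver-side fat level metrics."""
    return tx_point_fat_level + rx_point_fat_level


# Illustrative values only, not taken from the dataset.
print(round(bmi(180.0, 81.0), 2))
print(total_fat_level(2, 3))
```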

    CSV file columns, frequency-specific:

    tx_abs_Z_... - transmitter-side impedance, as computed by the process_json_files.py script from the voltage drop

    rx_gain_50_f_... - experimentally measured gain on the receiver, in dB, using 50 ohm load impedance

    rx_gain_1M_f_... - experimentally measured gain on the receiver, in dB, using 1 megaohm load impedance
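
    For readers unfamiliar with gain expressed in dB, the sketch below shows the usual voltage-ratio convention. It is an assumption that the dataset's gain columns follow this convention; the description does not give the formula. The 1 V default matches the input voltage listed in the overview statistics, and the received voltages are made up.

```python
import math


def voltage_gain_db(v_received, v_input=1.0):
    """Gain in dB for a voltage ratio: 20 * log10(V_received / V_input)."""
    if v_received <= 0 or v_input <= 0:
        raise ValueError("voltages must be positive")
    return 20.0 * math.log10(v_received / v_input)


# Illustrative received amplitudes in volts.
for v in (0.1, 0.01, 0.001):
    print(v, round(voltage_gain_db(v), 1))
```

    Under this convention every tenfold drop in received amplitude corresponds to a further 20 dB of signal loss.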

    Acknowledgments: The dataset collection was funded by the Latvian Council of Science, project "Body-Coupled Communication for Body Area Networks", project No. lzp-2020/1-0358.

    References: For more detailed information, see this article: J. Ormanis, V. Medvedevs, A. Sevcenko, V. Aristovs, V. Abolins, and A. Elsts. Dataset on the Human Body as a Signal Propagation Medium for Body Coupled Communication. Submitted to Elsevier Data in Brief, 2023.

    Contact information: info@edi.lv

  8. Data from: ImageNet Dataset

    • paperswithcode.com
    Updated Feb 2, 2021
    Cite
    Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li (2021). ImageNet Dataset [Dataset]. https://paperswithcode.com/dataset/imagenet
    Explore at:
    Dataset updated
    Feb 2, 2021
    Authors
    Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li
    Description

    The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset has been used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., "there are cars in this image" but "there are no tigers," and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., "there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels". The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.
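
    The object-level annotation in the example above is given as a centre point plus width and height. A small sketch of converting that form into corner coordinates follows; the `CenterBox` class and its field names are hypothetical and do not reflect the actual ILSVRC annotation file format.

```python
from dataclasses import dataclass


@dataclass
class CenterBox:
    """Hypothetical bounding box stored as centre point plus size, in pixels."""
    label: str
    cx: float
    cy: float
    width: float
    height: float

    def corners(self):
        """Return (x_min, y_min, x_max, y_max)."""
        half_w = self.width / 2
        half_h = self.height / 2
        return (self.cx - half_w, self.cy - half_h,
                self.cx + half_w, self.cy + half_h)


# The screwdriver example from the description.
box = CenterBox("screwdriver", cx=20, cy=25, width=50, height=30)
print(box.corners())
```

    Taken literally, the numbers in the description place the left edge at x = -5, outside the image, so a real pipeline would usually clip corners to the image bounds.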

    Total number of non-empty WordNet synsets: 21,841
    Total number of images: 14,197,122
    Number of images with bounding box annotations: 1,034,908
    Number of synsets with SIFT features: 1,000
    Number of images with SIFT features: 1.2 million

  9. Canadian Large Ensembles Adjusted Dataset version 1 (CanLEADv1) - Catalogue...

    • data.urbandatacentre.ca
    • beta.data.urbandatacentre.ca
    Updated Oct 1, 2024
    + more versions
    Cite
    (2024). Canadian Large Ensembles Adjusted Dataset version 1 (CanLEADv1) - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-a97edbc1-7fda-4ebc-b135-691505d9a595
    Explore at:
    Dataset updated
    Oct 1, 2024
    License

    Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    The dataset contains large ensembles of bias adjusted daily climate model outputs of minimum temperature, maximum temperature, precipitation, relative humidity, surface pressure, wind speed, incoming shortwave radiation, and incoming longwave radiation on a 0.5-degree grid over North America. Intended uses include hydrological/land surface impact modelling and related event attribution studies. The CanLEADv1 dataset is based on archived climate model simulations in the Canadian Regional Climate Model Large Ensemble (CanRCM4 LE) https://open.canada.ca/data/en/dataset/83aa1b18-6616-405e-9bce-af7ef8c2031c and Canadian Earth System Model Large Ensembles (CanESM2 LE) https://open.canada.ca/data/en/dataset/aa7b6823-fd1e-49ff-a6fb-68076a4a477c datasets.

    Specifically, CanLEADv1 provides bias adjusted daily climate variables over North America derived from 50 member initial condition ensembles of CanESM2 (ALL and NAT radiative forcings) and CanESM2-driven CanRCM4 (ALL radiative forcings) simulations (Scinocca et al., 2016; Fyfe et al., 2017). Raw CanESM2 LE and CanRCM4 LE outputs are bias adjusted (Cannon, 2018; Cannon et al., 2015) so that they are statistically consistent with two observationally-constrained historical meteorological forcing datasets (S14FD, Iizumi et al., 2017; EWEMBI, Lange, 2018). File names, formats, and metadata headers follow the recommended Data Reference Syntax for bias-adjusted Coordinated Regional Downscaling Experiment (CORDEX) simulations (Nikulin and Legutke, 2016).

    Multiple initial condition simulations can be used to investigate the externally forced response, internal variability, and the relative role of external forcing and internal variability on the climate system (e.g., Fyfe et al., 2017). Large ensembles of ALL and NAT simulations can be compared in event attribution studies (e.g., Kirchmeier-Young et al., 2017). Bias adjusted outputs from the CanESM2-CanRCM4 modelling system can be used to investigate the added value of dynamical downscaling (Scinocca et al., 2016). Multiple observational datasets are used for bias adjustment to partly account for observational uncertainty (Iizumi et al., 2017).

    For CanESM2 LE, there are two sets of radiative forcing scenarios (ALL, which consists of historical and RCP8.5 forcings for the periods 1950-2005 and 2006-2100, respectively, and NAT, which consists of historicalNat forcings for the period 1950-2020), two observationally-constrained target datasets for bias adjustment (S14FD and EWEMBI), and 50 ensemble members, which gives a total of 2 × 2 × 50 = 200 sets of outputs. For CanRCM4 LE, historicalNat simulations were not run; hence, there are 2 × 50 = 100 sets of outputs. In both cases, CanLEADv1 provides variables on the CORDEX NAM-44i 0.5-degree grid. CanESM2 outputs (~2.8-degree grid) and CanRCM4 outputs (0.44-degree grid) are bilinearly interpolated onto the NAM-44i grid before bias adjustment. A multivariate version of quantile mapping (Cannon, 2018) is used to adjust the distribution of each simulated variable, as well as the statistical dependence between variables, so that these properties match those of the target observational dataset. Bias adjustment is performed on a grid cell by grid cell basis. Outside of the historical calibration period, the climate change signal simulated by the climate model is preserved (Cannon et al., 2015).

    References:

    Cannon, A. J. (2018). Multivariate quantile mapping bias correction: an N-dimensional probability density function transform for climate model simulations of multiple variables. Climate Dynamics, 50(1-2), 31-49.

    Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938-6959.

    Fyfe, J. C., Derksen, C., Mudryk, L., Flato, G. M., Santer, B. D., Swart, N. C., Molotch, N. P., Zhang, X., Wan, H., Arora, V. K., Scinocca, J., & Jiao, Y. (2017). Large near-term projected snowpack loss over the western United States. Nature Communications, 8, 14996.

    Iizumi, T., Takikawa, H., Hirabayashi, Y., Hanasaki, N., & Nishimori, M. (2017). Contributions of different bias-correction methods and reference meteorological forcing data sets to uncertainty in projected temperature and precipitation extremes. Journal of Geophysical Research: Atmospheres, 122(15), 7800-7819.

    Kirchmeier-Young, M. C., Zwiers, F. W., Gillett, N. P., & Cannon, A. J. (2017). Attributing extreme fire risk in Western Canada to human emissions. Climatic Change, 144(2), 365-379.

    Lange, S. (2018). Bias correction of surface downwelling longwave and shortwave radiation for the EWEMBI dataset. Earth System Dynamics, 9(2), 627-645.
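
    The counts of output sets quoted in the description (200 for CanESM2 LE, 100 for CanRCM4 LE) follow from enumerating every combination of forcing, bias-adjustment target, and ensemble member. A short sketch of that arithmetic is given below; the integer member labels 1 to 50 are an assumption, since the description does not say how members are named.

```python
from itertools import product

FORCINGS_CANESM2 = ["ALL", "NAT"]   # both forcing scenarios were run
FORCINGS_CANRCM4 = ["ALL"]          # historicalNat was not run
TARGETS = ["S14FD", "EWEMBI"]       # bias-adjustment target datasets
MEMBERS = range(1, 51)              # 50 members; numbering is assumed

canesm2_sets = list(product(FORCINGS_CANESM2, TARGETS, MEMBERS))
canrcm4_sets = list(product(FORCINGS_CANRCM4, TARGETS, MEMBERS))

print(len(canesm2_sets))  # 2 x 2 x 50
print(len(canrcm4_sets))  # 1 x 2 x 50
```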

  10. pii-masking-300k

    • huggingface.co
    Updated Apr 4, 2024
    Cite
    Ai4Privacy (2024). pii-masking-300k [Dataset]. http://doi.org/10.57967/hf/1995
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 4, 2024
    Dataset authored and provided by
    Ai4Privacy
    License

    https://choosealicense.com/licenses/other/

    Description

    Purpose and Features

    ๐ŸŒ World's largest open dataset for privacy masking ๐ŸŒŽ The dataset is useful to train and evaluate models to remove personally identifiable and sensitive information from text, especially in the context of AI assistants and LLMs. Key facts:

    OpenPII-220k text entries have 27 PII classes (types of sensitive data), targeting 749 discussion subjects / use cases split across education, health, and psychology. FinPII contains an additional ~20 types tailored toโ€ฆ See the full description on the dataset page: https://huggingface.co/datasets/ai4privacy/pii-masking-300k.

  11. Global Lagrangian dataset of Marine litter

    • zenodo.org
    • data.niaid.nih.gov
    csv, nc
    Updated Dec 11, 2023
    Cite
    Eric Chassignet; Xiaobiao Xu; Olmo Zavala-Romero (2023). Global Lagrangian dataset of Marine litter [Dataset]. http://doi.org/10.5281/zenodo.6310460
    Explore at:
    Available download formats: nc, csv
    Dataset updated
    Dec 11, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Eric Chassignet; Xiaobiao Xu; Olmo Zavala-Romero
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Global Lagrangian dataset of Marine litter

    This dataset regroups 12 yearly files (global-marine-litter-[2010–2021].nc) combining monthly releases of 32,300 particles initially distributed across the globe following global Mismanaged Plastic Waste (MPW) inputs. The particles are advected with OceanParcels (Delandmeter, P and E van Sebille, 2019) using ocean surface velocity, a wind drag coefficient of 1%, and a small random walk component with a uniform horizontal turbulent diffusion coefficient of Kh = 1 m² s⁻¹ representing unresolved turbulent motions in the ocean (see Chassignet et al. 2021 for more details).
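
    A random walk with a uniform diffusion coefficient is commonly modelled by drawing each horizontal displacement from a normal distribution with standard deviation sqrt(2 · Kh · dt). The sketch below illustrates that textbook relation only; it is not the OceanParcels kernel or the authors' configuration, and the one-hour time step and the function name are assumptions.

```python
import math
import random


def random_walk_step(kh, dt, rng):
    """Return one (dx, dy) displacement in metres for diffusion `kh` (m^2/s)
    over a time step `dt` (s), using sigma = sqrt(2 * kh * dt) per axis."""
    sigma = math.sqrt(2.0 * kh * dt)
    return rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)


KH = 1.0        # m^2/s, the value quoted in the description
DT = 3600.0     # s, a hypothetical one-hour step

rng = random.Random(0)  # fixed seed so the example is repeatable
dx, dy = random_walk_step(KH, DT, rng)
print(round(math.sqrt(2.0 * KH * DT), 1), "m standard deviation per step")
print(dx, dy)
```

    With Kh = 1 m² s⁻¹ and a one-hour step, a typical displacement is on the order of tens of metres, which is small compared with advection by surface currents.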

    Global oceanic current and atmospheric wind

    Ocean surface velocities are obtained from GOFS3.1, a global ocean reanalysis based on the HYbrid Coordinate Ocean Model (HYCOM) and the Navy Coupled Ocean Data Assimilation (NCODA; Chassignet et al., 2009; Metzger et al., 2014). NCODA uses a three-dimensional (3D) variational scheme and assimilates satellite and altimeter observations as well as in-situ temperature and salinity measurements from moored buoys, Expendable Bathythermographs (XBTs), and Argo floats (Cummings and Smedstad, 2013). Surface information is projected downward into the water column using Improved Synthetic Ocean Profiles (Helber et al., 2013). The horizontal resolution and the temporal frequency for the GOFS3.1 outputs are 1/12° (8 km at the equator, 6 km at mid-latitudes) and 3-hourly, respectively. Details on the validation of the ocean circulation model are available in Metzger et al. (2017).

    Wind velocities are obtained from JRA55, the Japanese 55-year atmospheric reanalysis. The JRA55, which spans from 1958 to the present, is the longest third-generation reanalysis that uses the full observing system and a 4D advanced data assimilation variational scheme. The horizontal resolution of JRA55 is about 55 km and the temporal frequency is 3-hourly (see Tsujino et al. (2018) for more details).

    Marine Litter Sources

    The marine litter sources are obtained by combining MPW direct inputs from coastal regions, which are defined as areas within 50 km of the coastline (Lebreton and Andrady 2019), and indirect inputs from inland regions via rivers (Lebreton et al. 2017).

    File Format

    The locations (lon, lat), the corresponding weight (tons), and the source (1: land, 0: river) associated with the 32,300 particles are described in the file initial-location-global.csv. The particle trajectories are regrouped into yearly files (marine-litter-[2010–2021].nc) which contain 12 monthly releases, resulting in a total of 387,600 trajectories per file. More precisely, in each of the yearly files, the first 32,300 lines contain the trajectories of particles released on January 1st, then lines 32,301–64,600 contain the trajectories of particles released on February 1st, and so on. The trajectories are recorded daily and are advected from their release until 2021-12-31, resulting in longer time series for earlier years of the dataset.
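
    The row layout described above can be sketched in Python as follows. This is only an illustration, assuming zero-based indexing along the trajectory dimension and releases stored strictly in calendar order; the name `release_rows` is hypothetical and not part of the dataset.

```python
PARTICLES_PER_RELEASE = 32_300  # particles seeded on the 1st of each month
RELEASES_PER_FILE = 12          # one release per calendar month


def release_rows(month: int) -> slice:
    """Return the zero-based row slice holding the release of `month` (1-12)."""
    if not 1 <= month <= RELEASES_PER_FILE:
        raise ValueError("month must be between 1 and 12")
    start = (month - 1) * PARTICLES_PER_RELEASE
    return slice(start, start + PARTICLES_PER_RELEASE)


# February occupies 1-based lines 32,301 to 64,600, i.e. zero-based rows 32300..64599.
feb = release_rows(2)
print(feb.start, feb.stop)
```

    With a yearly file opened through a NetCDF reader, such a slice would be applied along the trajectory dimension; the name of that dimension is not stated in the description and would need to be checked in the file itself.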

    References

    Chassignet, E. P., Hurlburt, H. E., Metzger, E. J., Smedstad, O. M., Cummings, J., Halliwell, G. R., et al. (2009). U.S. GODAE: global ocean prediction with the hybrid coordinate ocean model (HYCOM). Oceanography 22, 64–75. doi: 10.5670/oceanog.2009.39

    Chassignet, E. P., Xu, X., and Zavala-Romero, O. (2021). Tracking Marine Litter With a Global Ocean Model: Where Does It Go? Where Does It Come From?. Frontiers in Marine Science, 8, 414, doi: 10.3389/fmars.2021.667591

    Cummings, J. A., and Smedstad, O. M. (2013). “Chapter 13: variational data assimilation for the global ocean”, in Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications, Vol. II, eds S. Park and L. Xu (Berlin: Springer), 303–343. doi: 10.1007/978-3-642-35088-7_13

    Delandmeter, P., and van Sebille, E. (2019). The Parcels v2.0 Lagrangian framework: new field interpolation schemes. Geosci. Model Dev. 12, 3571–3584. doi: 10.5194/gmd-12-3571-2019

    Helber, R. W., Townsend, T. L., Barron, C. N., Dastugue, J. M., and Carnes, M. R. (2013). Validation Test Report for the Improved Synthetic Ocean Profile (ISOP) System, Part I: Synthetic Profile Methods and Algorithm. NRL Memo. Report, NRL/MR/7320--13-9364. Hancock, MS: Stennis Space Center.

    Metzger, E. J., Smedstad, O. M., Thoppil, P. G., Hurlburt, H. E., Cummings, J. A., Wallcraft, A. J., et al. (2014). US Navy operational global ocean and Arctic ice prediction systems. Oceanography 27, 32–43, doi: 10.5670/oceanog.2014.66.

    Metzger, E., Helber, R. W., Hogan, P. J., Posey, P. G., Thoppil, P. G., Townsend, T. L., et al. (2017). Global Ocean Forecast System 3.1 validation test. Technical Report. NRL/MR/7320--17-9722. Hancock, MS: Stennis Space Center, 61.

    Lebreton, L., and Andrady, A. (2019). Future scenarios of global plastic waste generation and disposal. Palgrave Commun. 5:6, doi: 10.1057/s41599-018-0212-7.

    Lebreton, L., van der Zwet, J., Damsteeg, J. W., Slat, B., Andrady, A., and Reisser, J. (2017). River plastic emissions to the world's oceans. Nat. Commun. 8:15611, doi: 10.1038/ncomms15611.

    Tsujino H., S. Urakawa, H. Nakano, R.J. Small, W.M. Kim, S.G. Yeager, G. Danabasoglu, T. Suzuki, J.L. Bamber, M. Bentsen, C. Böning, A. Bozec, E.P. Chassignet, E. Curchitser, F. Boeira Dias, P.J. Durack, S.M. Griffies, Y. Harada, M. Ilicak, S.A. Josey, C. Kobayashi, S. Kobayashi, Y. Komuro, W.G. Large, J. Le Sommer, S.J. Marsland, S. Masina, M. Scheinert, H. Tomita, M. Valdivieso, and D. Yamazaki, 2018. JRA-55 based surface dataset for driving ocean-sea-ice models (JRA55-do). Ocean Modelling, 130, 79-139, doi: 10.1016/j.ocemod.2018.07.002.

  12. Data from: Above-ground plant properties are not leading indicators of...

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Apr 21, 2025
    Agricultural Research Service (2025). Data from: Above-ground plant properties are not leading indicators of grazing-induced soil carbon accrual in the Northern Great Plains [Dataset]. https://catalog.data.gov/dataset/data-from-above-ground-plant-properties-are-not-leading-indicators-of-grazing-induced-soil
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    This is digital research data corresponding to a manuscript, Above-ground plant properties are not leading indicators of grazing-induced soil carbon accrual in the Northern Great Plains, published in Ecological Indicators. Little is known about how grazing-induced shifts in plant properties correspond with shifts in soil organic carbon (SOC) stocks. To help fill this gap, we used data from a field experiment to test whether above-ground plant properties (i.e. biomass, species richness) act as leading indicators of grazing-induced SOC accrual in the Northern Great Plains, USA.

    Our 5-yr bovine grazing experiment had a randomized complete block design and pre-treatment data. Moderate summer grazing (control) is widely used in the Northern Great Plains, and treatments that may alter grassland vegetation and SOC included: severe summer grazing, moderate fall grazing, and severe fall grazing. The four grazing treatments were applied to 20 paddocks (60 × 30 m) arranged in a randomized complete block design with 5 replications. Grazing intensities approximated recommended (i.e. moderate; 1 animal unit month [AUM] × ha⁻¹ × year⁻¹) and severe (1.5 AUM × ha⁻¹ × year⁻¹) stocking rates. Summer grazing occurred during the third week of June and fall grazing was after killing frosts at the end of October.

    This study's dataset is a subset of the data for this grazing experiment. Given the study's aims, the dataset included a single measure of SOC stock (0-60 cm depth increment) and three plant properties (current-year above-ground biomass, older dead above-ground biomass, and plant species richness). SOC data were for 2013 and 2018 while plant data were for 2014 and 2017. Additional details can be found in the readme file, open access manuscript, and manuscript's supplement.

  13. Adjusted daily rainfall and snowfall dataset for Canada

    • open.canada.ca
    zip
    Updated Apr 22, 2023
    + more versions
    Environment and Climate Change Canada (2023). Adjusted daily rainfall and snowfall dataset for Canada [Dataset]. https://open.canada.ca/data/en/dataset/d8616c52-a812-44ad-8754-7bcc0d8de305
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 22, 2023
    Dataset provided by
    Environment and Climate Change Canada (https://www.canada.ca/en/environment-climate-change.html)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    The AdjDlyRS dataset contains adjusted daily rainfall (R) and snowfall (S) data from all Canadian stations reporting rainfall and snowfall for which we have metadata to do the adjustments (Wang et al. 2017). The processing includes inspection and adjustments using quality control procedures customized for producing gridded datasets (Wang et al. 2017), including: (1) conversion of snowfall ruler measurements to their water equivalents; (2) corrections for gauge undercatch and evaporation due to wind effect, for gauge-specific wetting loss, and for trace precipitation amount; and (3) treatment of flags (e.g. accumulation flags). Version 2020 and later versions of this dataset also include identification and correction of random erroneous values, including false zeros, which usually arose from missing values being misrecorded as 0 precipitation in the climate Archive (Cheng et al. 2022). All the identified erroneous daily values are set to missing. A total of 3346 stations were processed, but the data series are not homogenized. Most of the stations are located in southern Canada and have short and/or seasonal data records. The number of stations changes over time: there are 512-958 stations in the period 1948-1964, 1012-2038 stations in the period 1965-2008, and only around 300 stations in recent years. Note that the unadjusted/raw total precipitation data in Environment and Climate Change Canada's digital Archive underestimate total precipitation by more than 25% in northern Canada, and by about 10-15% in most of southern Canada (Wang et al. 2017). References: (1) Wang, X. L., Xu, B. Qian, Y. Feng, E. Mekis, 2017: Adjusted daily rainfall and snowfall data for Canada, Atmosphere-Ocean, 55:3, 155-168, DOI:10.1080/07055900.2017.1342163. (2) Cheng, V. Y.S., X. L. Wang, Y. Feng, 2022: A quality control system for historical in situ precipitation data. Atmosphere-Ocean (submitted)

  14. Individuals and Households Program - Valid Registrations

    • catalog.data.gov
    • s.cnmilf.com
    Updated Jun 7, 2025
    FEMA/Response and Recovery/Recovery Directorate (2025). Individuals and Households Program - Valid Registrations [Dataset]. https://catalog.data.gov/dataset/individuals-and-households-program-valid-registrations-nemis
    Explore at:
    Dataset updated
    Jun 7, 2025
    Dataset provided by
    Federal Emergency Management Agency (http://www.fema.gov/)
    Description

    This dataset contains FEMA applicant-level data for the Individuals and Households Program (IHP). All PII information has been removed. The location is represented by county, city, and zip code. This dataset contains Individual Assistance (IA) applications from DR1439 (declared in 2002) to those declared over 30 days ago. The full data set is refreshed on an annual basis and refreshed weekly to update disasters declared in the last 18 months. This dataset includes all major disasters and includes only valid registrants (applied in a declared county, within the registration period, having damage due to the incident and damage within the incident period). Information about individual data elements and descriptions are listed in the metadata information within the dataset.

    Valid registrants may be eligible for IA assistance, which is intended to meet basic needs and supplement disaster recovery efforts. IA assistance is not intended to return disaster-damaged property to its pre-disaster condition. Disaster damage to secondary or vacation homes does not qualify for IHP assistance.

    Data comes from FEMA's National Emergency Management Information System (NEMIS) with raw, unedited, self-reported content and is subject to a small percentage of human error.

    Any financial information is derived from NEMIS and not FEMA's official financial systems. Due to differences in reporting periods, status of obligations and application of business rules, this financial information may differ slightly from official publication on public websites such as usaspending.gov. This dataset is not intended to be used for any official federal reporting.

    Citation: The Agency's preferred citation for datasets (API usage or file downloads) can be found on the OpenFEMA Terms and Conditions page, Citing Data section: https://www.fema.gov/about/openfema/terms-conditions.

    Due to the size of this file, tools other than a spreadsheet may be required to analyze, visualize, and manipulate the data. MS Excel will not be able to process files this large without data loss. It is recommended that a database (e.g., MS Access, MySQL, PostgreSQL, etc.) be used to store and manipulate data. Other programming tools such as R, Apache Spark, and Python can also be used to analyze and visualize data. Further, basic Linux/Unix tools can be used to manipulate, search, and modify large files.

    If you have media inquiries about this dataset, please email the FEMA News Desk at FEMA-News-Desk@fema.dhs.gov or call (202) 646-3272. For inquiries about FEMA's data and Open Government program, please email the OpenFEMA team at OpenFEMA@fema.dhs.gov.

    This dataset is scheduled to be superseded by Valid Registrations Version 2 by early CY 2024.

  15. Muse EEG Subconscious Decisions Dataset

    • data.niaid.nih.gov
    • produccioncientifica.ugr.es
    • +1more
    Updated Nov 10, 2023
    Francisco M. Garcia-Moreno (2023). Muse EEG Subconscious Decisions Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8429739
    Explore at:
    Dataset updated
    Nov 10, 2023
    Dataset provided by
    Francisco M. Garcia-Moreno
    Ana Álvarez-Muelas
    Daniel Fernández Mérida
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The first Muse EEG Dataset for Subconscious Decision Making Study.

    Dataset Description:

    • 20 subject data

    • Different trials per each subject

    The data provided at MUSE folder have the following variables:

    Timestamp: date and time with millisecond precision of the captured data. It is stored in the format YYYY-MM-DD HH:mm:SS.fff, where YYYY corresponds to the year, MM to the month, DD to the day, HH to the hour, mm to the minute, SS to the second and fff to the millisecond.

    Delta: brain waves with the largest wave amplitude, mainly active with deep sleep phases, so they are related to processes that do not depend on a state of consciousness. These waves have a frequency of between 1 and 4 Hz.

    Theta: the brain waves with the largest wave amplitude after delta waves, present in deep calm, relaxation and immersion stages in memories, so they are associated with a present consciousness but disconnected from reality and focused on imaginary experiences. These waves have a frequency of between 4 and 8 Hz.

    Alpha: the waves with the largest wave amplitude after theta waves, present in stages of relaxation such as a walk or watching TV, and therefore related to processes of deep calm with present awareness. These waves have a frequency of between 7.5 and 13 Hz.

    Beta: these are the lowest amplitude waves after gamma waves, present in states that require a certain level of attention or alertness, in which one has to be aware of changes. These waves have a frequency of between 13 and 30 Hz.

    Gamma: these are the lowest amplitude waves, present in states of wakefulness, which are associated with a broadening of focus and memory management. These waves have a frequency between 30 and 44 Hz.

    Raw: these are the representation of the raw electrical signals captured by Muse.

    AUX_RIGHT: raw waveforms captured by an auxiliary USB sensor.

    Mellow: User relaxation.

    Concentration: User concentration.

    Accelerometer (X, Y, Z): detects device movements and tilts upwards, downwards and sideways.

    Gyro (X, Y, Z): gyroscope movement over time.

    HeadBandOn: indicates if the band is on the head.

    HSI: sensor quality, the closer to 1 the better the quality.

    Battery: remaining battery of the device.

    Elements: different actions that the subject can perform, such as blinking or jaw clenching.
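
    The Timestamp field listed above could be parsed in Python roughly as follows. This is a minimal sketch, assuming a four-digit year, a 24-hour clock and three millisecond digits; the function name `parse_muse_timestamp` and the sample value are hypothetical and not taken from the dataset.

```python
from datetime import datetime

# Assumed layout: four-digit year, 24-hour clock, fractional seconds after the dot.
MUSE_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_muse_timestamp(text: str) -> datetime:
    """Parse one Timestamp value into a datetime object."""
    return datetime.strptime(text, MUSE_TIMESTAMP_FORMAT)


# Hypothetical sample value, used only to illustrate the format.
ts = parse_muse_timestamp("2023-01-15 10:30:45.123")
milliseconds = ts.microsecond // 1000  # datetime stores microseconds
print(ts.isoformat(), milliseconds)
```

    Parsing into `datetime` objects makes it straightforward to compute intervals between samples by subtraction.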

    The data provided at LOCAL folder include the decision timing measures:

    ID: identifier of the participant in the experiment. The identifier is an integer, which starts at 0 and has consecutive values. In our case, it will go up to 19.

    Trial: session of the experiment in which the results have been recorded. A session runs from the time the "Start Experiment" button is clicked until it appears again. The number of the session is an integer starting at 0 and has consecutive values. In our specific case, it will reach up to 9.

    Response: identifier of each response of each session. A response is from the moment a blank screen is displayed until the letter displayed on the screen is chosen at the moment the impulse to press a key is felt. It is an integer starting at 0 and has consecutive values.

    Start time: the time at which the response starts, i.e. when a blank screen appears and the decision-making process is initiated.

    Letter appearance time: the time at which random letters start to appear on the screen. From this point onwards, the participant can press the right or left key at any time he/she wishes.

    Time of the keystroke: the time at which the participant presses the P (right) or Q (left) key.

    Chosen key: choice made by the participant. It shall have as possible values p and q.

    Time of appearance of the observed letter: time at which the letter appears that the participant has been asked to remember at the moment he/she makes the free-will decision. It is interpreted as the time of the decision.

    Observed letter: response to the recall of the letter that appeared on the screen at the moment of feeling the free will impulse. It will have as possible values: S, R, N, D, L, C, T, M or #.

  16. ERA5 hourly data on pressure levels from 1940 to present

    • cds.climate.copernicus.eu
    • cds-test-cci2.copernicus-climate.eu
    grib
    Updated Jul 12, 2025
    + more versions
    ECMWF (2025). ERA5 hourly data on pressure levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.bd0915c6
    Explore at:
    Available download formats: grib
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
    Authors
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf

    Time period covered
    Jan 1, 1940 - Jul 6, 2025
    Description

    ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. In case that serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. 
In case that this occurs users are notified. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main sub sets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 hourly data on pressure levels from 1940 to present".

  17. open-pii-masking-500k-ai4privacy

    • dataverse.harvard.edu
    Updated Mar 17, 2025
    Michael Anthony (2025). open-pii-masking-500k-ai4privacy [Dataset]. http://doi.org/10.7910/DVN/4H11OA
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 17, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Michael Anthony
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    ๐ŸŒ World's largest open dataset for privacy masking ๐ŸŒŽ The dataset is useful to train and evaluate models to remove personally identifiable and sensitive information from text, especially in the context of AI assistants and LLMs. Task Showcase of Privacy Masking # Dataset Analytics ๐Ÿ“Š - ai4privacy/open-pii-masking-500k-ai4privacy ## p5y Data Analytics - Total Entries: 580,227 - Total Tokens: 19,199,982 - Average Source Text Length: 17.37 words - Total PII Labels: 5,705,973 - Number of Unique PII Classes: 20 (Open PII Labelset) - Unique Identity Values: 704,215 --- ## Language Distribution Analytics Number of Unique Languages: 8 | Language | Count | Percentage | |--------------------|----------|------------| | English (en) ๐Ÿ‡บ๐Ÿ‡ธ๐Ÿ‡ฌ๐Ÿ‡ง๐Ÿ‡จ๐Ÿ‡ฆ๐Ÿ‡ฎ๐Ÿ‡ณ | 150,693 | 25.97% | | French (fr) ๐Ÿ‡ซ๐Ÿ‡ท๐Ÿ‡จ๐Ÿ‡ญ๐Ÿ‡จ๐Ÿ‡ฆ | 112,136 | 19.33% | | German (de) ๐Ÿ‡ฉ๐Ÿ‡ช๐Ÿ‡จ๐Ÿ‡ญ | 82,384 | 14.20% | | Spanish (es) ๐Ÿ‡ช๐Ÿ‡ธ ๐Ÿ‡ฒ๐Ÿ‡ฝ | 78,013 | 13.45% | | Italian (it) ๐Ÿ‡ฎ๐Ÿ‡น๐Ÿ‡จ๐Ÿ‡ญ | 68,824 | 11.86% | | Dutch (nl) ๐Ÿ‡ณ๐Ÿ‡ฑ | 26,628 | 4.59% | | Hindi (hi)* ๐Ÿ‡ฎ๐Ÿ‡ณ | 33,963 | 5.85% | | Telugu (te)* ๐Ÿ‡ฎ๐Ÿ‡ณ | 27,586 | 4.75% | *these languages are in experimental stages --- ## Region Distribution Analytics Number of Unique Regions: 11 | Region | Count | Percentage | |-----------------------|----------|------------| | Switzerland (CH) ๐Ÿ‡จ๐Ÿ‡ญ | 112,531 | 19.39% | | India (IN) ๐Ÿ‡ฎ๐Ÿ‡ณ | 99,724 | 17.19% | | Canada (CA) ๐Ÿ‡จ๐Ÿ‡ฆ | 74,733 | 12.88% | | Germany (DE) ๐Ÿ‡ฉ๐Ÿ‡ช | 41,604 | 7.17% | | Spain (ES) ๐Ÿ‡ช๐Ÿ‡ธ | 39,557 | 6.82% | | Mexico (MX) ๐Ÿ‡ฒ๐Ÿ‡ฝ | 38,456 | 6.63% | | France (FR) ๐Ÿ‡ซ๐Ÿ‡ท | 37,886 | 6.53% | | Great Britain (GB) ๐Ÿ‡ฌ๐Ÿ‡ง | 37,092 | 6.39% | | United States (US) ๐Ÿ‡บ๐Ÿ‡ธ | 37,008 | 6.38% | | Italy (IT) ๐Ÿ‡ฎ๐Ÿ‡น | 35,008 | 6.03% | | Netherlands (NL) ๐Ÿ‡ณ๐Ÿ‡ฑ | 26,628 | 4.59% | --- ## Machine Learning Task Analytics | Split | Count | Percentage | |-------------|----------|------------| | Train | 464,150 | 79.99% | | Validate| 116,077 | 20.01% | --- # Usage 
Option 1: Python terminal pip install datasets python from datasets import load_dataset dataset = load_dataset("ai4privacy/open-pii-masking-500k-ai4privacy") # Compatible Machine Learning Tasks: - Tokenclassification. Check out a HuggingFace's guide on token classification. - ALBERT, BERT, BigBird, BioGpt, BLOOM, BROS, CamemBERT, CANINE, ConvBERT, Data2VecText, DeBERTa, DeBERTa-v2, DistilBERT, ELECTRA, ERNIE, ErnieM, ESM, Falcon, FlauBERT, FNet, Funnel Transformer, GPT-Sw3, OpenAI GPT-2, GPTBigCode, GPT Neo, GPT NeoX, I-BERT, LayoutLM, LayoutLMv2, LayoutLMv3, LiLT, Longformer, LUKE, MarkupLM, MEGA, Megatron-BERT, MobileBERT,...

  18. Canada Stock Market Index (TSX) Data

    • tradingeconomics.com
    • de.tradingeconomics.com
    • +13more
    csv, excel, json, xml
    TRADING ECONOMICS, Canada Stock Market Index (TSX) Data [Dataset]. https://tradingeconomics.com/canada/stock-market
    Explore at:
    Available download formats: csv, xml, excel, json
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 29, 1979 - Jul 11, 2025
    Area covered
    Canada
    Description

    Canada's main stock market index, the TSX, fell to 27023 points on July 11, 2025, losing 0.22% from the previous session. Over the past month, the index has climbed 1.53% and is up 19.18% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index from Canada. Canada Stock Market Index (TSX) - values, historical data, forecasts and news - updated on July of 2025.

  19. Data from: USBombus, contemporary survey data of North American bumble bees...

    • gbif.org
    Updated Mar 24, 2025
    + more versions
    Jonathan B. Koch (2025). USBombus, contemporary survey data of North American bumble bees (Hymenoptera, Apidae, Bombus) distributed in the United States [Dataset]. http://doi.org/10.15468/g8cnke
    Explore at:
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    Global Biodiversity Information Facility (https://www.gbif.org/)
    ZooKeys
    Authors
    Jonathan B. Koch
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Time period covered
    Jul 13, 2007 - Aug 1, 2010
    Area covered
    Description

    This paper describes USBombus, a large dataset that represents the outcomes of the largest standardized survey of bee pollinators (Hymenoptera, Apidae, Bombus) on the planet. The motivation to collect live bumble bees across the US was to document the decline and conservation status of Bombus affinis, B. occidentalis, B. pensylvanicus, and B. terricola. The results of the study have been published in Proceedings of the National Academy of Sciences as “Patterns of widespread decline in North American bumble bees” by Cameron et al. (2011). In this dataset we have documented a total of 17,796 adult occurrence records across 391 locations and 38 species of Bombus. The geospatial coverage of the dataset extends across 41 of the 50 US states and from 0 to 3500 m a.s.l. The temporal scale of the dataset represents systematic surveys that took place from 2007 to 2010. The dataset was developed using SQL Server 2008 R2. For each specimen, the following information is generally provided: species name, sex, caste, temporal and geospatial details, Cartesian coordinates, data collector(s), and when available, host plants. This database has already proven useful for a variety of studies on bumble bee ecology and conservation. Considering the value of pollinators in agriculture and wild ecosystems, this large systematic collection of bumble bee occurrence records will likely prove useful in investigations into the effects of anthropogenic activities on pollinator community composition and conservation status.

  20. Average Maximum Afternoon Temperature (F)

    • data.seattle.gov
    • s.cnmilf.com
    • +1more
    application/rdfxml +5
    Updated Feb 3, 2025
    + more versions
    (2025). Average Maximum Afternoon Temperature (F) [Dataset]. https://data.seattle.gov/dataset/Average-Maximum-Afternoon-Temperature-F-/ev6g-yenv
    Explore at:
    Available download formats: xml, csv, tsv, json, application/rdfxml, application/rssxml
    Dataset updated
    Feb 3, 2025
    Description
    This data layer references data from a high-resolution tree canopy change-detection layer for Seattle, Washington. Tree canopy change was mapped by using remotely sensed data from two time periods (2016 and 2021). Tree canopy was assigned to three classes: 1) no change, 2) gain, and 3) loss. No change represents tree canopy that remained the same from one time period to the next. Gain represents tree canopy that increased or was newly added, from one time period to the next. Loss represents the tree canopy that was removed from one time period to the next. Mapping was carried out using an approach that integrated automated feature extraction with manual edits. Care was taken to ensure that changes to the tree canopy were due to actual change in the land cover as opposed to differences in the remotely sensed data stemming from lighting conditions or image parallax. Direct comparison was possible because land-cover maps from both time periods were created using object-based image analysis (OBIA) and included similar source datasets (LiDAR-derived surface models, multispectral imagery, and thematic GIS inputs). OBIA systems work by grouping pixels into meaningful objects based on their spectral and spatial properties, while taking into account boundaries imposed by existing vector datasets. Within the OBIA environment a rule-based expert system was designed to effectively mimic the process of manual image analysis by incorporating the elements of image interpretation (color/tone, texture, pattern, location, size, and shape) into the classification process. A series of morphological procedures were employed to ensure that the end product is both accurate and cartographically pleasing. No accuracy assessment was conducted, but the dataset was subjected to manual review and correction.

    University of Vermont Spatial Analysis Laboratory

    This dataset consists of hexagons 50-acres in area, or several city blocks. The dataset covers the following tree canopy categories:
    • Existing tree canopy percent
    • Possible tree canopy - vegetation percent
    • Relative percent change
    • Absolute percent change
    • Average maximum afternoon temperature (F)
    • Tree canopy percentage & average afternoon temperature (F)
    For more information, please see the 2021 Tree Canopy Assessment.
(2025). S&P 500 [Dataset]. https://fred.stlouisfed.org/series/SP500

S&P 500

SP500

Explore at:
82 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: json
Dataset updated
Jul 11, 2025
License

https://fred.stlouisfed.org/legal/#copyright-pre-approval

Description

View data of the S&P 500, an index of the stocks of 500 leading companies in the US economy, which provides a gauge of the U.S. equity market.
