16 datasets found
  1. Amazon Reviews Dataset

    • kaggle.com
    Updated Sep 20, 2024
    Cite
    Dongre Laxman (2024). Amazon Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/dongrelaxman/amazon-reviews-dataset
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Dongre Laxman
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset comprises customer reviews for Amazon, an online retail giant, featuring insights into customer experiences, including ratings, review titles, texts, and metadata. It is valuable for analyzing customer satisfaction, sentiment, and trends.

    Column Descriptions:

    • Reviewer Name: Identifies the reviewer.
    • Profile Link: Links to the reviewer's profile for additional insights.
    • Country: Indicates the reviewer's location.
    • Review Count: Number of reviews by the same user, showing engagement level.
    • Review Date: When the review was posted, useful for time analysis.
    • Rating: Numerical satisfaction measure.
    • Review Title: Summarizes the review sentiment.
    • Review Text: Detailed customer feedback.
    • Date of Experience: When the service/product was experienced.

    Prospective applications:

    • Sentiment Analysis: Analyze review texts and titles to assess overall customer sentiment toward products, enabling the identification of strengths and weaknesses.
    • Customer Satisfaction Tracking: Track and visualize rating trends over time to understand fluctuations in customer satisfaction.
    • Product Improvement: Identify common themes in reviews to highlight areas for product enhancement or development.
    • Market Segmentation: Use country and demographic information to customize marketing strategies and gain insights into regional preferences.
    • Competitor Analysis: Evaluate customer feedback on Amazon products in comparison to competitors to determine market positioning.
    • Recommendation Systems: Leverage review data to enhance recommendation algorithms, improving personalized shopping experiences.
    • Trend Analysis: Investigate temporal patterns in reviews to link sentiment changes with marketing efforts or product launches.

    This extensive dataset serves as a valuable asset for various analyses focused on enhancing customer engagement and refining business strategies.
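
    The "Customer Satisfaction Tracking" use listed above can be sketched as a monthly mean of the Rating column. This is only an illustration under assumptions: the header names "Review Date" and "Rating", the ISO date format, and the purely numeric ratings are guesses based on the column descriptions, and the three sample rows are invented. The real file may need extra parsing.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; header names and formats are assumptions.
SAMPLE_CSV = """Review Date,Rating
2024-01-05,5
2024-01-20,3
2024-02-11,1
"""

def mean_rating_by_month(csv_text):
    """Return {"YYYY-MM": mean rating} for the rows in csv_text."""
    totals = defaultdict(lambda: [0.0, 0])  # month -> [sum of ratings, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["Review Date"][:7]  # keep the "YYYY-MM" prefix
        totals[month][0] += float(row["Rating"])
        totals[month][1] += 1
    return {month: s / n for month, (s, n) in sorted(totals.items())}

monthly = mean_rating_by_month(SAMPLE_CSV)
```

    Grouping on the "YYYY-MM" prefix keeps the sketch dependency-free. If the dates turn out to be in another format, a proper date parser would be needed.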

  2. Crash Summary for Intersections (Last 5 years) - Datasets - data.wa.gov.au

    • catalogue.data.wa.gov.au
    Updated Aug 15, 2018
    Cite
    (2018). Crash Summary for Intersections (Last 5 years) - Datasets - data.wa.gov.au [Dataset]. https://catalogue.data.wa.gov.au/dataset/mrwa-crash-summary-for-intersections-last-5-years-
    Dataset updated
    Aug 15, 2018
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Western Australia
    Description

    The total number of intersection crashes in Western Australia. Each intersection record contains the aggregated number of crashes recorded in the last 5 calendar years.

    Note: The 2024 records have been temporarily removed from the dataset. The crash data now covers the five-year period from 2019 to 2023. We apologise for any inconvenience.

    Crashes are recorded in the Integrated Road Information System (IRIS). This layer shows the total number of crashes at each intersection and is provided for information only.

    Note that you are accessing this data pursuant to a Creative Commons (Attribution) Licence which has a disclaimer of warranties and limitation of liability. You accept that the data provided pursuant to the Licence is subject to changes.

    Pursuant to section 3 of the Licence you are provided with the following notice to be included when you Share the Licenced Material: “The Commissioner of Main Roads is the creator and owner of the data and Licenced Material, which is accessed pursuant to a Creative Commons (Attribution) Licence, which has a disclaimer of warranties and limitation of liability.”

    Crash Data Dictionary

    Creative Commons CC BY 4.0

  3. Data from: Enriching time series datasets using Nonparametric kernel...

    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Mohamad Ivan Fanany (2023). Enriching time series datasets using Nonparametric kernel regression to improve forecasting accuracy [Dataset]. http://doi.org/10.6084/m9.figshare.1609661.v1
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Mohamad Ivan Fanany
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Improving the accuracy of predictions of future values based on past and current observations has been pursued by enhancing the prediction methods, combining those methods, or performing data pre-processing. In this paper, another approach is taken, namely increasing the number of inputs in the dataset. This approach is especially useful for shorter time series. By filling in the in-between values in the time series, the size of the training set can be increased, thus increasing the generalization capability of the predictor. The algorithm used to make predictions is a Neural Network, as it is widely used in the literature for time series tasks. For comparison, Support Vector Regression is also employed. The datasets used in the experiment are the frequencies of USPTO patents and PubMed scientific publications in the field of health, namely on Apnea, Arrhythmia, and Sleep Stages. Another time series designated for the NN3 Competition in the field of transportation is also used for benchmarking. The experimental results show that prediction performance can be significantly increased by filling in-between data in the time series. Furthermore, the use of detrending and deseasonalization, which separates the data into trend, seasonal and stationary time series, also improves prediction performance on both the original and filled datasets. The optimal increase in dataset size in this experiment is about five times the length of the original dataset.
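
    The idea of filling in-between values with nonparametric kernel regression can be sketched with a Nadaraya-Watson estimator. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the bandwidth of 0.5 time steps, and the function names are assumptions, and the sample series is invented.

```python
import math

def gaussian_kernel(u):
    """Unnormalised Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)

def nadaraya_watson(t_query, times, values, bandwidth):
    """Kernel-weighted average of `values` observed at `times`."""
    weights = [gaussian_kernel((t_query - t) / bandwidth) for t in times]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def fill_in_between(values, factor, bandwidth=0.5):
    """Resample a series so each original step is split into `factor` steps.

    Observations are assumed to sit at integer times 0, 1, 2, ...
    """
    times = list(range(len(values)))
    n_out = (len(values) - 1) * factor + 1
    query_times = [i / factor for i in range(n_out)]
    return [nadaraya_watson(t, times, values, bandwidth) for t in query_times]

# Invented example: 4 observations become 16 points with factor=5.
filled = fill_in_between([1.0, 2.0, 3.0, 4.0], factor=5)
```

    Because the estimator is a weighted average, it also smooths the original points slightly rather than reproducing them exactly. Whether that is acceptable depends on the forecasting setup.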

  4. Comparison of the total accuracy for the new automated activation detection...

    • plos.figshare.com
    xls
    Updated Jun 5, 2023
    Cite
    Samineh Mesbah; Claudia A. Angeli; Robert S. Keynton; Ayman El-baz; Susan J. Harkema (2023). Comparison of the total accuracy for the new automated activation detection method with the TKEO and SODM methods based on five-number-summary. [Dataset]. http://doi.org/10.1371/journal.pone.0185582.t001
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Samineh Mesbah; Claudia A. Angeli; Robert S. Keynton; Ayman El-baz; Susan J. Harkema
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Comparison of the total accuracy for the new automated activation detection method with the TKEO and SODM methods based on five-number-summary.
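
    For readers unfamiliar with the term, a five-number summary is the minimum, first quartile, median, third quartile and maximum of a sample. The sketch below uses Python's standard library. The "inclusive" quartile method is an assumption, since the table does not state which quartile convention was used, and the sample values are invented.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least two data points."""
    ordered = sorted(data)
    # n=4 yields the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

summary = five_number_summary([1, 2, 3, 4, 5])
```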

  5. Data from: Summary for Policymakers of the Working Group I Contribution to...

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Aug 9, 2021
    Cite
    John Fyfe; Baylor Fox-Kemper; Robert Kopp; Gregory Garner (2021). Summary for Policymakers of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure SPM.8 (v20210809) [Dataset]. https://catalogue.ceda.ac.uk/uuid/98af2184e13e4b91893ab72f301790db
    Dataset updated
    Aug 9, 2021
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    John Fyfe; Baylor Fox-Kemper; Robert Kopp; Gregory Garner
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1950 - Dec 31, 2300
    Area covered
    Earth
    Description

    Data for Figure SPM.8 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).

    Figure SPM.8 shows selected indicators of global climate change under the five core scenarios used in this report.

    How to cite this dataset

    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:

    IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 3−32, doi:10.1017/9781009157896.001.

    Figure subpanels

    The figure has five panels, with data provided for all panels in subdirectories named panel_a, panel_b, panel_c, panel_d and panel_e.

    List of data provided

    This dataset contains:

    • Historical, SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0 and SSP5-8.5 Global Surface Air Temperature (GSAT) anomalies relative to 1850-1900 (20 year means)
    • Historical, SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0 and SSP5-8.5 September sea-ice area
    • Historical, SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0 and SSP5-8.5 Global ocean surface pH
    • Historical sea level relative to 1900 from gauges (to 1992) and altimeters (1993 on) (offset 0.158 m vs. 1995-2014)
    • AR6 sea level projections relative to 1900 (offset 0.158 m vs. 1995-2014)
    • AR6 assessed global mean sea level at 2300 relative to 1900 (offset 0.158 m vs. 1995-2014)

    The five illustrative SSP (Shared Socio-economic Pathway) scenarios are described in Box SPM.1 of the Summary for Policymakers and Section 1.6.1.1 of Chapter 1.

    Data provided in relation to figure

    Panel a: Near-Surface Air Temperature

    • Data file: panel_a/tas_global_Historical.csv (black line and grey shading)
    • Data file: panel_a/tas_global_SSP1_1_9.csv (cyan line)
    • Data file: panel_a/tas_global_SSP1_2_6.csv (blue line and blue shading)
    • Data file: panel_a/tas_global_SSP2_4_5.csv (orange line)
    • Data file: panel_a/tas_global_SSP3_7_0.csv (red line and red shading)
    • Data file: panel_a/tas_global_SSP5_8_5.csv (brown line)

    Panel b: Sea-Ice Area

    • Data file: panel_b/sia_arctic_september_Historical.csv (black line and grey shading)
    • Data file: panel_b/sia_arctic_september_SSP1_1_9.csv (cyan line)
    • Data file: panel_b/sia_arctic_september_SSP1_2_6.csv (blue line and blue shading)
    • Data file: panel_b/sia_arctic_september_SSP2_4_5.csv (orange line)
    • Data file: panel_b/sia_arctic_september_SSP3_7_0.csv (red line and red shading)
    • Data file: panel_b/sia_arctic_september_SSP5_8_5.csv (brown line)

    Panel c: Ocean Surface pH

    • Data file: panel_c/phos_global_Historical.csv (black line and grey shading)
    • Data file: panel_c/phos_global_SSP1_1_9.csv (cyan line)
    • Data file: panel_c/phos_global_SSP1_2_6.csv (blue line and blue shading)
    • Data file: panel_c/phos_global_SSP2_4_5.csv (orange line)
    • Data file: panel_c/phos_global_SSP3_7_0.csv (red line and red shading)
    • Data file: panel_c/phos_global_SSP5_8_5.csv (brown line)

    Panel d: Sea Level

    • Data file: panel_d/global_sea_level_observed.csv (black line)
    • Data file: panel_d/global_sea_level_projected.csv (cyan, blue, orange, red and brown lines, red and blue shading)

    Panel e: Sea Level

    • Data file: panel_e/global_sea_level_2300_assessed.csv (columns 2 and 3, SSP1-2.6 scenario; columns 4 to 6, SSP5-8.5 scenario)
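
    The file names for panels a to c follow a regular pattern (panel directory, variable prefix, scenario suffix), so the expected paths can be generated rather than typed out. This sketch assumes that every file for a panel sits in that panel's own subdirectory, as the "Figure subpanels" note describes. The constant and function names are illustrative only.

```python
# Scenario suffixes as they appear in the listed file names.
SCENARIOS = ["Historical", "SSP1_1_9", "SSP1_2_6", "SSP2_4_5", "SSP3_7_0", "SSP5_8_5"]

# Panel subdirectory -> file-name prefix, taken from the listing above.
PANEL_PREFIXES = {
    "panel_a": "tas_global",
    "panel_b": "sia_arctic_september",
    "panel_c": "phos_global",
}

def scenario_files():
    """Return the relative paths of the per-scenario CSV files for panels a-c."""
    return [
        f"{panel}/{prefix}_{scenario}.csv"
        for panel, prefix in PANEL_PREFIXES.items()
        for scenario in SCENARIOS
    ]

paths = scenario_files()
```

    Panels d and e use differently named files and are left out of the pattern.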

    Sources of additional information

    The following weblinks are provided in the Related Documents section of this catalogue record:

    • Link to the report component containing the figure (Summary for Policymakers)

  6. Five-number summary of satisfaction scores overall and per device group,...

    • plos.figshare.com
    xls
    Updated Jul 18, 2023
    Cite
    Theresa Pfurtscheller; Kevin Baker; Tedila Habte; Kévin Lasmi; Lena Matata; Akasiima Mucunguzi; Jill Nicholson; Anthony Nuwa; Max Petzold; Mónica Posada González; Anteneh Sebsibe; Tobias Alfvén; Karin Källander (2023). Five-number summary of satisfaction scores overall and per device group, country, and cadre. [Dataset]. http://doi.org/10.1371/journal.pgph.0001800.t004
    Available download formats: xls
    Dataset updated
    Jul 18, 2023
    Dataset provided by
    PLOS Global Public Health
    Authors
    Theresa Pfurtscheller; Kevin Baker; Tedila Habte; Kévin Lasmi; Lena Matata; Akasiima Mucunguzi; Jill Nicholson; Anthony Nuwa; Max Petzold; Mónica Posada González; Anteneh Sebsibe; Tobias Alfvén; Karin Källander
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Five-number summary of satisfaction scores overall and per device group, country, and cadre.

  7. Summary for Policymakers of the Working Group I Contribution to the IPCC...

    • catalogue.ceda.ac.uk
    Updated Mar 9, 2024
    Cite
    Nathan P. Gillett; Elizaveta Malinina; Darrell Kaufman; Raphael Neukom (2024). Summary for Policymakers of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure SPM.1 (v20221116) [Dataset]. https://catalogue.ceda.ac.uk/uuid/0b2759059ad6474098e40dad73e0a8ec
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Nathan P. Gillett; Elizaveta Malinina; Darrell Kaufman; Raphael Neukom
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1 - Dec 31, 2019
    Area covered
    Earth
    Variables measured
    surface_temperature_anomaly
    Description

    Data for Figure SPM.1 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).

    Figure SPM.1 shows global temperature history and causes of recent warming.

    How to cite this dataset

    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:

    IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S. L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M. I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T. K. Maycock, T. Waterfield, O. Yelekçi, R. Yu and B. Zhou (eds.)]. Cambridge University Press. In Press.

    Figure subpanels

    The figure has two panels, with data provided for all panels in subdirectories named panel_a and panel_b.

    List of data provided

    Panel a

    The dataset contains:

    • Estimated temperature during the warmest multi-century period in at least the last 100,000 years, which occurred around 6500 years ago (4500 BCE), multi-centennial average, from AR6 WGI Chapter 2
    • Global surface temperature change time series relative to 1850-1900 for 1-2020 from:
      • 1-2000 CE reconstruction from paleoclimate archives, decadal smoothed, from PAGES2k Consortium (2019, DOI: 10.1038/s41561-019-0400-0)
      • 1850-2020 CE, observations, decadal smoothed, from AR6 WGI Chapter 2 assessed mean

    Panel b:

    The dataset contains global surface temperature change time series relative to 1850-1900 for 1850-2020 from simulations from the sixth phase of the Coupled Model Intercomparison Project (CMIP6) and observations:

    • CMIP6 historical+ssp245 simulations (simulations with human and natural forcing, 1850-2019)
    • CMIP6 hist-nat simulations (simulations with natural forcing, 1850-2019)
    • Global Surface Temperature Anomalies (GSTA) relative to 1850-1900 from observations assessed in IPCC AR6 WG1 Chapter 2 (1850-2020)
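
    The series above are expressed as changes "relative to 1850-1900", meaning each value has the mean over that baseline period subtracted. A minimal sketch of that arithmetic follows. The function name is illustrative, and the three temperatures are invented numbers chosen only to show the calculation, not values from the dataset.

```python
def anomalies_relative_to_baseline(series, baseline_start, baseline_end):
    """Subtract the baseline-period mean from every value.

    `series` maps year -> absolute value; the result maps year -> anomaly.
    """
    baseline = [v for year, v in series.items()
                if baseline_start <= year <= baseline_end]
    if not baseline:
        raise ValueError("no data inside the baseline period")
    baseline_mean = sum(baseline) / len(baseline)
    return {year: v - baseline_mean for year, v in series.items()}

# Invented absolute temperatures (degrees C), for illustration only.
made_up = {1850: 14.0, 1900: 14.2, 2020: 15.3}
anoms = anomalies_relative_to_baseline(made_up, 1850, 1900)
```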

    Data provided in relation to figure

    Panel a:

    • panel_a/SPM1_1-2000_recon.csv, 1-2000 time series, decadal smoothed, for years centred on 5-1996 CE [column 1 grey line, columns 2 and 3 grey shading]
    • panel_a/SPM1_1850-2020_obs.csv, 1850-2020 time series, decadal smoothed, for years centered on 1855-2016 CE [black line]
    • panel_a/SPM1_6500_recon.csv, bar for the warmest multi-century period in more than 100,000 years (around 6500 years ago: 4500 BCE) [grey bar]

    Panel b:

    • panel_b/gmst_changes_model_and_obs.csv. Global surface temperature change time series relative to 1850-1900 for 1850-2020 from:
      • CMIP6 historical+ssp245 simulations (1850-2019) [mean, brown line]
      • CMIP6 historical+ssp245 simulations (1850-2019) [5% range, brown shading, bottom]
      • CMIP6 historical+ssp245 simulations (1850-2019) [95% range, brown shading, top]
      • CMIP6 hist-nat simulations (1850-2019) [mean, green line]
      • CMIP6 hist-nat simulations (1850-2019) [5% range, green shading, bottom]
      • CMIP6 hist-nat simulations (1850-2019) [95% range, green shading, top]
      • Global Surface Temperature Anomalies (GSTA) relative to 1850-1900 from observations assessed in IPCC AR6 WG1 Chapter 2 (1850-2020) [black line]

    Sources of additional information

    The following weblinks are provided in the Related Documents section of this catalogue record:

    • Link to the figure on the IPCC AR6 website
    • Link to the report webpage, which includes the report component containing the figure (Summary for Policymakers), the Technical Summary (Cross-Section Box TS.1, Figure 1a) and the Supplementary Material for Chapters 2 and 3, which contains details on the input data used in Tables 2.SM.1 (Figure 2.11a) and 3.SM.1 (Figure 3.2c; FAQ 3.1, Figure 1).
    • Link to related publication for input data
    • Link to the webpage of the WGI report

  8. Trusted Research Environments: Analysis of Characteristics and Data...

    • b2find.eudat.eu
    Updated Mar 28, 2024
    Cite
    (2024). Trusted Research Environments: Analysis of Characteristics and Data Availability - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/9cc5d70e-5668-54dc-ad59-0d7d3f6ceadb
    Dataset updated
    Mar 28, 2024
    Description

    Trusted Research Environments (TREs) enable analysis of sensitive data under strict security assertions that protect the data with technical, organizational and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, the descriptions of their building blocks and their slight technical variations. To shed light on these problems, we give an overview of existing, publicly described TREs and a bibliography linking to the system descriptions. We further analyze their technical characteristics, especially their commonalities and variations, and provide insight into their data type characteristics and availability. Our literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data themselves, predominantly via secure remote access. Statistical offices make available a majority of the sensitive data records included in this study.

    Methodology

    We performed a literature study covering 47 TREs worldwide using scholarly databases (Scopus, Web of Science, IEEE Xplore, Science Direct), a computer science library (dblp.org), Google and grey literature, focusing on retrieving the following source material: peer-reviewed articles where available, TRE websites, and TRE metadata catalogs. The goal of this literature study is to discover existing TREs and analyze their characteristics and data availability, giving an overview of available infrastructure for sensitive data research, as many European initiatives have been emerging in recent months.

    Technical details

    This dataset consists of five comma-separated values (.csv) files describing our inventory:

    • countries.csv: Table of countries with columns id (number), name (text) and code (text, in ISO 3166-A3 encoding, optional)
    • tres.csv: Table of TREs with columns id (number), name (text), countryid (number, referring to column id of table countries), structureddata (bool, optional), datalevel (one of [1=de-identified, 2=pseudonymized, 3=anonymized], optional), outputcontrol (bool, optional), inceptionyear (date, optional), records (number, optional), datatype (one of [1=claims, 2=linked records], optional), statistics_office (bool), size (number, optional), source (text, optional), comment (text, optional)
    • access.csv: Table of access modes of TREs with columns id (number), suf (bool, optional), physical_visit (bool, optional), external_physical_visit (bool, optional), remote_visit (bool, optional)
    • inclusion.csv: Table of included TREs into the literature study with columns id (number), included (bool), exclusion reason (one of [peer review, environment, duplicate], optional), comment (text, optional)
    • major_fields.csv: Table of data categorization into the major research fields with columns id (number), life_sciences (bool, optional), physical_sciences (bool, optional), arts_and_humanities (bool, optional), social_sciences (bool, optional)

    Additionally, a MariaDB (10.5 or higher) schema definition .sql file is needed, properly modelling the schema for databases:

    • schema.sql: Schema definition file to create the tables and views used in the analysis.

    The analysis was done through Jupyter Notebook which can be found in our source code repository: https://gitlab.tuwien.ac.at/martin.weise/tres/-/blob/master/analysis.ipynb
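
    Since tres.csv refers to countries.csv through its countryid column, the two tables can be joined with the standard library alone. The sketch below counts TREs per country. It assumes the files carry a header row with the column names listed in the description, and it runs on invented in-memory rows rather than the real files. The TRE names and the helper names are hypothetical.

```python
import csv
import io

# Invented rows that follow the documented column names.
COUNTRIES_CSV = "id,name,code\n1,Austria,AUT\n2,Germany,DEU\n"
TRES_CSV = (
    "id,name,countryid\n"
    "10,Example TRE A,1\n"
    "11,Example TRE B,2\n"
    "12,Example TRE C,1\n"
)

def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def tres_per_country(countries, tres):
    """Join tres.countryid to countries.id and count TREs per country name."""
    names = {row["id"]: row["name"] for row in countries}
    counts = {}
    for tre in tres:
        country = names.get(tre["countryid"], "unknown")
        counts[country] = counts.get(country, 0) + 1
    return counts

counts = tres_per_country(read_rows(COUNTRIES_CSV), read_rows(TRES_CSV))
```

    For the full analysis the authors point to their Jupyter Notebook and the MariaDB schema. This join only illustrates how the id columns relate.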

  9. Data file 5 - Sobek analysis summary

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jul 21, 2021
    Cite
    BAURAIN, Denis; Van Vlierberghe, Mick (2021). Data file 5 - Sobek analysis summary [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000821105
    Dataset updated
    Jul 21, 2021
    Authors
    BAURAIN, Denis; Van Vlierberghe, Mick
    Description

    Sobek summary table.

    • total.contigs: total number of contigs in transcriptome
    • never.suspected: number of transcripts that were never suspected of being a cross-contamination
    • nb.suspects: number of transcripts that were suspected of being a cross-contamination
    • nb.clean: number of transcripts whose origin is from the focal sample
    • nb.lowcov: number of transcripts whose expression levels are too low in all samples
    • nb.overexp: number of transcripts whose expression levels are very high in at least 3 samples (often reflect highly conserved genes such as ribosomal gene, or external contamination shared by several samples)
    • nb.dubious: number of transcripts whose expression levels are too close between focal and alien samples to determine the true origin of the transcript
    • nb.contam: number of transcripts whose origin is from an alien sample of the same experiment

  10. Technical Summary of the Working Group I Contribution to the IPCC Sixth...

    • catalogue.ceda.ac.uk
    Updated Mar 9, 2024
    Cite
    Christopher D. Jones; Charles Koven; Zebedee Nicholls; Spencer Liddicoat; Malte Meinshausen; Jared Lewis (2024). Technical Summary of the Working Group I Contribution to the IPCC Sixth Assessment Report - Input data for Box TS.5, Figure 1 (v20220817) [Dataset]. https://catalogue.ceda.ac.uk/uuid/d6a301f3429b44e7924296f840f68fe6
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Christopher D. Jones; Charles Koven; Zebedee Nicholls; Spencer Liddicoat; Malte Meinshausen; Jared Lewis
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1990 - Dec 31, 2100
    Area covered
    Earth
    Description

    Input Data for Box TS.5, Figure 1 from the Technical Summary of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).

    Box TS.5, Figure 1 shows the carbon cycle sensitivity to forcings, future CO2 emissions pathways and the associated sinks and sink fractions.

    How to cite this dataset

    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates: Arias, P.A., N. Bellouin, E. Coppola, R.G. Jones, G. Krinner, J. Marotzke, V. Naik, M.D. Palmer, G.-K. Plattner, J. Rogelj, M. Rojas, J. Sillmann, T. Storelvmo, P.W. Thorne, B. Trewin, K. Achuta Rao, B. Adhikary, R.P. Allan, K. Armour, G. Bala, R. Barimalala, S. Berger, J.G. Canadell, C. Cassou, A. Cherchi, W. Collins, W.D. Collins, S.L. Connors, S. Corti, F. Cruz, F.J. Dentener, C. Dereczynski, A. Di Luca, A. Diongue Niang, F.J. Doblas-Reyes, A. Dosio, H. Douville, F. Engelbrecht, V. Eyring, E. Fischer, P. Forster, B. Fox-Kemper, J.S. Fuglestvedt, J.C. Fyfe, N.P. Gillett, L. Goldfarb, I. Gorodetskaya, J.M. Gutierrez, R. Hamdi, E. Hawkins, H.T. Hewitt, P. Hope, A.S. Islam, C. Jones, D.S. Kaufman, R.E. Kopp, Y. Kosaka, J. Kossin, S. Krakovska, J.-Y. Lee, J. Li, T. Mauritsen, T.K. Maycock, M. Meinshausen, S.-K. Min, P.M.S. Monteiro, T. Ngo-Duc, F. Otto, I. Pinto, A. Pirani, K. Raghavan, R. Ranasinghe, A.C. Ruane, L. Ruiz, J.-B. Sallée, B.H. Samset, S. Sathyendranath, S.I. Seneviratne, A.A. Sörensson, S. Szopa, I. Takayabu, A.-M. Tréguier, B. van den Hurk, R. Vautard, K. von Schuckmann, S. Zaehle, X. Zhang, and K. Zickfeld, 2021: Technical Summary. In Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 33−144, doi:10.1017/9781009157896.002.

    Figure subpanels

    The figure has 7 panels, with data provided for all of them. A single Python script plots the entire figure.

    List of data provided

    This dataset contains:

    • Carbon cycle feedback sensitivities from the CMIP6 models
    • CO2 concentration data used as input to the concentration driven historical and ssp simulations
    • CO2 concentrations from CMIP6 models from the emissions-driven SSP5-8.5 simulation
    • CO2 concentrations from MAGICC 7.5.1 model for the other scenarios
    • Carbon fluxes and derived emissions from the CMIP6 models up to 2100
    • Carbon fluxes from the CMIP6 models up to 2300

    Data provided in relation to figure

    Data provided in relation to Box TS.5, Figure 1:

    Carbon cycle feedback sensitivities from the CMIP6 models (courtesy Charlie Koven):

    • carbon_feedback_parameters.nc

    Sub-directory CMIP_data:

    CO2 concentration data used as input to the concentration driven historical and ssp simulations:

    • CMIP6_HIST_CO2.dat
    • CMIP6_SSP2300_CO2.dat
    • CMIP6_SSP_CO2.dat

    CO2 concentrations from CMIP6 models from the emissions-driven SSP5-8.5 simulation:

    • CMIP6_e-CO2.dat

    Carbon fluxes from the CMIP6 models up to 2300:

    • CESM2-WACCM_fgco2.dat
    • CESM2-WACCM_nbp.dat
    • CanESM5_fgco2.dat
    • CanESM5_nbp.dat
    • IPSL-CM6A-LR_fgco2.dat
    • IPSL-CM6A-LR_nbp.dat
    • UKESM1-0-LL_fgco2.dat
    • UKESM1-0-LL_nbp.dat

    Sub-directory MAGIC_data:

    CO2 concentrations from MAGICC 7.5.1 model for the other scenarios (courtesy Zeb Nicholls and MAGICC team):

    • MAGICCv7.5.1_atmospheric-co2_esm-ssp119.nc
    • MAGICCv7.5.1_atmospheric-co2_esm-ssp126.nc
    • MAGICCv7.5.1_atmospheric-co2_esm-ssp245.nc
    • MAGICCv7.5.1_atmospheric-co2_esm-ssp370.nc
    • MAGICCv7.5.1_atmospheric-co2_esm-ssp534-over.nc
    • MAGICCv7.5.1_atmospheric-co2_esm-ssp585.nc

    MAGICC is maintained and developed by Malte Meinshausen, Jared Lewis and Zebedee Nicholls. If you have any questions about MAGICC's output or would like to use it in a publication, please contact Malte Meinshausen, Jared Lewis and Zebedee Nicholls. The setup used to generate this data is described extensively in Cross-chapter Box 7.1 and is based on Meinshausen et al. 2009, [2011](doi:... For full abstract see: https://catalogue.ceda.ac.uk/uuid/d6a301f3429b44e7924296f840f68fe6.

  11. Summary for Policymakers of the Working Group I Contribution to the IPCC...

    • catalogue.ceda.ac.uk
    • data-search.nerc.ac.uk
    Updated Mar 9, 2024
    Cite
    Josep Canadell; Peter Cox; Christopher Jones; Charles Koven; Pedro Monteiro; Joeri Rogelj; Sönke Zaehle; Kirsten Zickfeld (2024). Summary for Policymakers of the Working Group I Contribution to the IPCC Sixth Assessment Report - data for Figure SPM.7 (v20210809) [Dataset]. https://catalogue.ceda.ac.uk/uuid/b1ad4c02319b438884a72fea34cb5a18
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    Centre for Environmental Data Analysis (http://www.ceda.ac.uk/)
    Authors
    Josep Canadell; Peter Cox; Christopher Jones; Charles Koven; Pedro Monteiro; Joeri Rogelj; Sönke Zaehle; Kirsten Zickfeld
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 1850 - Dec 31, 2100
    Area covered
    Earth
    Description

    Data for Figure SPM.7 from the Summary for Policymakers (SPM) of the Working Group I (WGI) Contribution to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6).

    Figure SPM.7 shows the cumulative anthropogenic CO2 emissions taken up by land and ocean sinks by 2100 under the five core scenarios.

    How to cite this dataset

    When citing this dataset, please include both the data citation below (under 'Citable as') and the following citation for the report component from which the figure originates:

    IPCC, 2021: Summary for Policymakers. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S.L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M.I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J.B.R. Matthews, T.K. Maycock, T. Waterfield, O. Yelekçi, R. Yu, and B. Zhou (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, pp. 3−32, doi:10.1017/9781009157896.001.

    List of data provided

    This dataset contains cumulative anthropogenic (human-caused) carbon dioxide (CO2) emissions taken up by the land and ocean sinks under the five core scenarios (SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0 and SSP5-8.5), simulated from 1850 to 2100 by Earth System Models that contributed to the sixth phase of the Coupled Model Intercomparison Project (CMIP6).

    The five illustrative SSP (Shared Socio-economic Pathway) scenarios are described in Box SPM.1 of the Summary for Policymakers and Section 1.6.1.1 of Chapter 1.

    Data provided in relation to figure

    Data file: SPM7_data.csv: each column corresponds to a single scenario, in which rows 2-7 are the bar values, rows 8-10 are the pie chart values and row 11 is the central value in the pie chart.
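
    The row layout described above can be illustrated with a minimal sketch. It assumes that rows are counted from 1 and that row 1 is a header row; neither is stated in the record. The function name `split_spm7_rows` and the placeholder cell values are hypothetical and are not taken from SPM7_data.csv.

```python
import csv
import io


def split_spm7_rows(rows):
    """Group file rows (rows[0] is file row 1) as described in the record."""
    if len(rows) < 11:
        raise ValueError("expected at least 11 rows")
    return {
        "header": rows[0],       # row 1 (assumed to hold the scenario names)
        "bars": rows[1:7],       # rows 2-7: bar values
        "pie": rows[7:10],       # rows 8-10: pie chart values
        "pie_centre": rows[10],  # row 11: central value in the pie chart
    }


# Placeholder content only: 11 rows x 5 columns of dummy labels, not real data.
placeholder = "\n".join(
    ",".join(f"r{r}c{c}" for c in range(1, 6)) for r in range(1, 12)
)
rows = list(csv.reader(io.StringIO(placeholder)))
parts = split_spm7_rows(rows)
print(len(parts["bars"]), len(parts["pie"]), parts["pie_centre"][0])
```

    With the real file, the same function could be applied to the rows returned by `csv.reader` after checking the actual header and column layout.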

    Sources of additional information

    The following weblink is provided in the Related Documents section of this catalogue record: - Link to the report webpage, which includes the report component containing the figure (Summary for Policymakers).

  12. LFM2b Lyrics Descriptor Analyses

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 15, 2024
    Cite
    Marcin Skowron (2024). LFM2b Lyrics Descriptor Analyses [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7740044
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    Maximilian Mayerl
    Stefan Brandl
    Eva Zangerle
    Markus Schedl
    Emilia Parada-Cabaleiro
    Marcin Skowron
    Elisabeth Lex
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LFM2b Lyrics Descriptor Analyses

    This dataset provides lyrics descriptors for 580,000 songs, including lexical, structural, diversity-related, readability, rhyme, and emotional descriptors. It was the basis of an analysis of the evolution of song lyrics over the course of five decades and five genres (pop, rock, rap, country, and R&B).

    Dataset Generation

    As a basis for the dataset, we relied on the LFM-2b dataset (http://www.cp.jku.at/datasets/LFM-2b) of listening events by Last.fm. It contains more than two billion listening records and more than fifty million songs by more than five million artists. We enrich the dataset with songs' release year, genre, lyrics, and popularity information. For quantifying the popularity of tracks and lyrics, we distinguish between the listening count, i.e., the number of listening events in the LFM-2b dataset, and the lyrics view count, i.e., the number of views of lyrics on the Genius platform (https://genius.com). Release years, genre information, and lyrics are obtained from the Genius platform. Genres are expressed by one primary genre. We used https://polyglot.readthedocs.io/ to automatically infer the language of the lyrics and considered only English lyrics. Adopting this procedure, we ultimately obtain complete information for 582,759 songs.

    Data and Features

    We provide the full dataset, containing features for 582,759 songs (full_dataset.json.gz). For each song, the dataset contains track title and artist information, genre, popularity, and release date information, and a wide variety of lexical, structural, diversity-related, readability, rhyme, and emotional descriptors.

    A short overview of the semantics of the features follows. For further details, please check the implementation of the feature extractor at https://github.com/MaximilianMayerl/CorrelatesOfSongLyrics/.

    • Track and artist
    • Genre
    • Popularity descriptors:
      • Lyrics view count
      • Last.fm playcount
    • Lexical descriptors:
      • Line counts: Total number of lines, blank lines, unique lines, ratio of blank and repeated lines
      • Token counts: Number of tokens, characters, repeated token ratio, unique tokens per line, and avg. tokens per line
      • Character counts: Number of [!?.,:;"-()] (total amount of these characters and individual counts per character) and digits, ratio of punctuation and digits
      • Token length: Average length of tokens
      • n-gram ratios: Ratio of unique bigrams and trigrams
      • Legomenon ratios: Ratio of hapax legomena, dis legomena and tris legomena
      • Parts of speech: Frequency of adjectives, adverbs, nouns, pronouns, verbs
      • Past tense: Ratio of verbs in past tense to other verbs
      • Stop words: Number and ratio of stop words, stop words per line
      • Uncommon words: Number of uncommon words (i.e., words not contained in WordNet)
    • Diversity descriptors
      • Compression ratio: Ratio of the size of zlib compressed lyrics vs the original lyrics
      • Diversity measures: Measure of Textual Lexical Diversity (MTLD), Herdan's C, Summer's S, Dugast's U^2, and Maas' a^2
    • Readability Descriptors
      • Readability formulas: Flesch Reading Ease, Flesch Kincaid Grade, SMOG (Simple Measure of Gobbledygook), Automated Readability Index, Coleman Liau Index, Dale Chall Readability Score, Linsear Write Formula, Gunning Fog, Fernandez Huerta, Szigriszt Pazos and Gutierrez Polini
      • Difficult words: Number of difficult words (three or more syllables)
    • Rhyme Descriptors
      • Rhyme structures: Numbers of couplets, clerihews, alternating rhymes and nested rhymes
      • Rhyme words: Number of unique rhyming words, percentage of rhyming lines in the lyrics
      • Alliterations: Number of alliterations of length two, three, and four or more
    • Structural Descriptors
      • Element counts: Number of sections and verses
      • Distribution: Relation between the number of verses vs. sections and number of choruses vs sections
      • Title occurrences: Number of times the song's title appears
      • Pattern: Verse and chorus alternating, two verses and at least one chorus, two choruses and at least one verse
      • Start: Starts with chorus (binary attribute)
      • Ending: Ends with two chorus repetitions (binary attribute)
    • Emotional/Affective Descriptors
      • Sentiment scores: Positivity and negativity scores via AFINN, the sentiment lexicon by Bing Liu et al., the MPQA opinion corpus, the sentiment140 dataset, and the SentiWordNet lexicon
      • NRC: Emotion scores according to the NRC affect intensity lexicon
      • LIWC: Descriptors provided by LIWC
      • Happiness: Happiness score according to labMT
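
    Two of the descriptors listed above, the compression ratio and the legomenon ratios, can be sketched in a few lines. This is an illustration only and not the authors' feature extractor: whitespace tokenisation, lower-casing, UTF-8 encoding, and the choice of the total token count as the denominator are all assumptions, and the function names are hypothetical.

```python
import zlib
from collections import Counter


def compression_ratio(lyrics: str) -> float:
    """Size of the zlib-compressed lyrics divided by the original size in bytes."""
    raw = lyrics.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def legomenon_ratios(lyrics: str) -> dict:
    """Share of words occurring exactly once, twice and three times.

    Assumption: the denominator is the total number of tokens.
    """
    tokens = lyrics.lower().split()
    if not tokens:
        return {"hapax": 0.0, "dis": 0.0, "tris": 0.0}
    word_counts = Counter(tokens)
    # How many distinct words occur exactly k times.
    words_per_frequency = Counter(word_counts.values())
    n = len(tokens)
    return {
        "hapax": words_per_frequency[1] / n,
        "dis": words_per_frequency[2] / n,
        "tris": words_per_frequency[3] / n,
    }


sample = "la la la hey hey you"
print(legomenon_ratios(sample))
print(compression_ratio("la la la " * 50) < 1.0)
```

    Highly repetitive lyrics compress well, so a lower compression ratio indicates less diverse text.
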
  13. Football Players Data

    • kaggle.com
    Updated Nov 13, 2023
    Cite
    Masood Ahmed (2023). Football Players Data [Dataset]. http://doi.org/10.34740/kaggle/dsv/6960429
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Masood Ahmed
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Description:

    This comprehensive dataset offers detailed information on approximately 17,000 FIFA football players, meticulously scraped from SoFIFA.com.

    It encompasses a wide array of player-specific data points, including but not limited to player names, nationalities, clubs, player ratings, potential, positions, ages, and various skill attributes. This dataset is ideal for football enthusiasts, data analysts, and researchers seeking to conduct in-depth analysis, statistical studies, or machine learning projects related to football players' performance, characteristics, and career progressions.

    Features:

    • name: Name of the player.
    • full_name: Full name of the player.
    • birth_date: Date of birth of the player.
    • age: Age of the player.
    • height_cm: Player's height in centimeters.
    • weight_kgs: Player's weight in kilograms.
    • positions: Positions the player can play.
    • nationality: Player's nationality.
    • overall_rating: Overall rating of the player in FIFA.
    • potential: Potential rating of the player in FIFA.
    • value_euro: Market value of the player in euros.
    • wage_euro: Weekly wage of the player in euros.
    • preferred_foot: Player's preferred foot.
    • international_reputation(1-5): International reputation rating from 1 to 5.
    • weak_foot(1-5): Rating of the player's weaker foot from 1 to 5.
    • skill_moves(1-5): Skill moves rating from 1 to 5.
    • body_type: Player's body type.
    • release_clause_euro: Release clause of the player in euros.
    • national_team: National team of the player.
    • national_rating: Rating in the national team.
    • national_team_position: Position in the national team.
    • national_jersey_number: Jersey number in the national team.
    • crossing: Rating for crossing ability.
    • finishing: Rating for finishing ability.
    • heading_accuracy: Rating for heading accuracy.
    • short_passing: Rating for short passing ability.
    • volleys: Rating for volleys.
    • dribbling: Rating for dribbling.
    • curve: Rating for curve shots.
    • freekick_accuracy: Rating for free kick accuracy.
    • long_passing: Rating for long passing.
    • ball_control: Rating for ball control.
    • acceleration: Rating for acceleration.
    • sprint_speed: Rating for sprint speed.
    • agility: Rating for agility.
    • reactions: Rating for reactions.
    • balance: Rating for balance.
    • shot_power: Rating for shot power.
    • jumping: Rating for jumping.
    • stamina: Rating for stamina.
    • strength: Rating for strength.
    • long_shots: Rating for long shots.
    • aggression: Rating for aggression.
    • interceptions: Rating for interceptions.
    • positioning: Rating for positioning.
    • vision: Rating for vision.
    • penalties: Rating for penalties.
    • composure: Rating for composure.
    • marking: Rating for marking.
    • standing_tackle: Rating for standing tackle.
    • sliding_tackle: Rating for sliding tackle.

    Use Case:

    This dataset is ideal for data analysis, predictive modeling, and machine learning projects. It can be used for:

    • Player performance analysis and comparison.
    • Market value assessment and wage prediction.
    • Team composition and strategy planning.
    • Machine learning models to predict future player potential and career trajectories.

    Note:

    Please adhere to the terms of service of SoFIFA.com and relevant data protection laws when using this dataset. The dataset is intended for educational and research purposes only and should not be used for commercial gain without proper authorization.

  14. Gym Members Exercise Dataset

    • kaggle.com
    Updated Oct 6, 2024
    Cite
    vala khorasani (2024). Gym Members Exercise Dataset [Dataset]. https://www.kaggle.com/datasets/valakhorasani/gym-members-exercise-dataset
    Dataset updated
    Oct 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    vala khorasani
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a detailed overview of gym members' exercise routines, physical attributes, and fitness metrics. It contains 973 samples of gym data, including key performance indicators such as heart rate, calories burned, and workout duration. Each entry also includes demographic data and experience levels, allowing for comprehensive analysis of fitness patterns, athlete progression, and health trends.

    Key Features:

    • Age: Age of the gym member.
    • Gender: Gender of the gym member (Male or Female).
    • Weight (kg): Member’s weight in kilograms.
    • Height (m): Member’s height in meters.
    • Max_BPM: Maximum heart rate (beats per minute) during workout sessions.
    • Avg_BPM: Average heart rate during workout sessions.
    • Resting_BPM: Heart rate at rest before workout.
    • Session_Duration (hours): Duration of each workout session in hours.
    • Calories_Burned: Total calories burned during each session.
    • Workout_Type: Type of workout performed (e.g., Cardio, Strength, Yoga, HIIT).
    • Fat_Percentage: Body fat percentage of the member.
    • Water_Intake (liters): Daily water intake during workouts.
    • Workout_Frequency (days/week): Number of workout sessions per week.
    • Experience_Level: Level of experience, from beginner (1) to expert (3).
    • BMI: Body Mass Index, calculated from height and weight.
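
    The BMI column can be reproduced from the weight and height columns with the standard formula, weight in kilograms divided by the square of height in metres. The sketch below uses a hypothetical function name and made-up input values; whether the dataset rounds its BMI values is not stated.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Standard BMI: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Made-up example values, not taken from the dataset.
print(round(body_mass_index(70.0, 1.75), 2))
```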

    This dataset is ideal for data scientists, health researchers, and fitness enthusiasts interested in studying exercise habits, modeling fitness progression, or analyzing the relationship between demographic and physiological data. With a wide range of variables, it offers insights into how different factors affect workout intensity, endurance, and overall health.

  15. Constitution Dataset

    • kaggle.com
    Updated Apr 20, 2023
    Cite
    Prasad Patil (2023). Constitution Dataset [Dataset]. https://www.kaggle.com/prasad22/constitution-dataset/discussion
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Data is sourced from the Comparative Constitutions Project (CCP). This dataset is useful for exploratory data analysis and NLP practice.

    Content

    Scope: This is drawn from Elkins, Ginsburg and Melton, The Endurance of National Constitutions (Cambridge University Press, 2009). It measures the percentage of 701 major topics from the CCP survey that are included in any given constitution.

    Length (in Words): This is simply a report of the total number of words in the Constitution as measured by Microsoft Word.

    Executive Power: This is an additive index drawn from a working paper, Constitutional Constraints on Executive Lawmaking. The index ranges from 0-7 and captures the presence or absence of seven important aspects of executive lawmaking: (1) the power to initiate legislation; (2) the power to issue decrees; (3) the power to initiate constitutional amendments; (4) the power to declare states of emergency; (5) veto power; (6) the power to challenge the constitutionality of legislation; and (7) the power to dissolve the legislature.

    The index score indicates the total number of these powers given to any national executive (president, prime minister, or assigned to the government) as a whole.

    Legislative Power: This captures the formal degree of power assigned to the legislature by the Constitution. The indicator is drawn from Elkins, Ginsburg and Melton, The Endurance of National Constitutions (Cambridge University Press, 2009), in which we created a set of binary CCP variables to match the 32-item survey developed by M. Steven Fish and Mathew Kroenig in The Handbook of National Legislatures: A Global Survey (Cambridge University Press, 2009). The index score is simply the mean of the 32 binary elements, with higher numbers indicating more legislative power and lower numbers indicating less legislative power.

    Judicial Independence: This index is drawn from a paper by Ginsburg and Melton, Does De Jure Judicial Independence Really Matter? A Reevaluation of Explanations for Judicial Independence. It is an additive index ranging from 0-6 that captures the constitutional presence or absence of six features thought to enhance judicial independence. The six features are: (1) whether the constitution contains an explicit statement of judicial independence; (2) whether the constitution provides that judges have lifetime appointments; (3) whether appointments to the highest court involve either a judicial council or two (or more) actors; (4) whether removal is prohibited or limited so that it requires the proposal of a supermajority vote in the legislature, or if only the public or judicial council can propose removal and another political actor is required to approve such a proposal; (5) whether removal is explicitly limited to crimes and other issues of misconduct, treason, or violations of the constitution; and (6) whether judicial salaries are protected from reduction.
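
    The two index types described above, an additive index (Executive Power, Judicial Independence) and a mean of binary items (Legislative Power), can be sketched as follows. The function names and the example flag values are hypothetical; they illustrate the arithmetic only and are not CCP data or CCP code.

```python
def additive_index(flags):
    """Count how many binary features are present (e.g. 0-7 or 0-6)."""
    return sum(1 for present in flags if present)


def mean_index(flags):
    """Mean of binary items, between 0 and 1 (e.g. the 32-item survey)."""
    if not flags:
        raise ValueError("at least one item is required")
    return additive_index(flags) / len(flags)


# Made-up example: 4 of the 7 executive powers are present.
executive_flags = [True, False, True, True, False, False, True]
# Made-up example: 8 of the 32 legislative items are present.
legislative_flags = [1] * 8 + [0] * 24

print(additive_index(executive_flags))
print(mean_index(legislative_flags))
```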

    Number of Rights: In our ongoing book project on human rights, we analyze a set of 1172 different rights found in national constitutions. The rights index indicates the number of these rights found in any particular constitution.

    Preamble: I extracted this from the platform itself. It contains the text of the preamble of every nation's constitution.

  16. Telemarketing JYB Dataset - UCI

    • kaggle.com
    Updated Dec 2, 2022
    Cite
    Víctor Aguado (2022). Telemarketing JYB Dataset - UCI [Dataset]. https://www.kaggle.com/datasets/aguado/telemarketing-jyb-dataset
    Dataset updated
    Dec 2, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Víctor Aguado
    Description

    Description

    The data relates to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (bank term deposit) would be subscribed ('yes') or not ('no').

    Input variables:

    Bank client data

    1 - age (numeric)
    2 - job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
    3 - marital: marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
    4 - education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
    5 - default: has credit in default? (categorical: 'no','yes','unknown')
    6 - housing: has housing loan? (categorical: 'no','yes','unknown')
    7 - loan: has personal loan? (categorical: 'no','yes','unknown')

    Related to the last contact of the current campaign:

    8 - contact: contact communication type (categorical: 'cellular','telephone')
    9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
    10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
    11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

    Other attributes

    12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
    13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
    14 - previous: number of contacts performed before this campaign and for this client (numeric)
    15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')

    Social and economic context attributes

    16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
    17 - cons.price.idx: consumer price index - monthly indicator (numeric)
    18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
    19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
    20 - nr.employed: number of employees - quarterly indicator (numeric)

    Output variable

    21 - y: has the client subscribed a term deposit? (binary: 'yes','no')
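
    Two notes in the variable list matter when preparing this data for a realistic predictive model: `duration` should be discarded, and the value 999 in `pdays` is a sentinel for "not previously contacted" rather than a real day count. A minimal sketch of that preparation, assuming each row is held as a plain dictionary keyed by the variable names above; the function name, the extra `previously_contacted` flag, and the sample record are hypothetical.

```python
def prepare_record(record):
    """Return (features, label) for one row of the telemarketing data."""
    # Drop the target and the leaky 'duration' attribute from the features.
    features = {k: v for k, v in record.items() if k not in ("duration", "y")}

    # 999 is a sentinel meaning the client was not previously contacted.
    never_contacted = features.get("pdays") == 999
    features["previously_contacted"] = not never_contacted
    if never_contacted:
        features["pdays"] = None

    label = 1 if record["y"] == "yes" else 0
    return features, label


# Made-up record for illustration only; most attributes are omitted.
sample = {"age": 41, "duration": 210, "campaign": 2, "pdays": 999,
          "previous": 0, "poutcome": "nonexistent", "y": "no"}
features, label = prepare_record(sample)
print(features["pdays"], features["previously_contacted"], label)
```

    Keeping an explicit flag alongside the cleared `pdays` value avoids treating 999 as a very long gap between contacts.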


Cite
Dongre Laxman (2024). Amazon Reviews Dataset [Dataset]. https://www.kaggle.com/datasets/dongrelaxman/amazon-reviews-dataset

Amazon Reviews Dataset

A Comprehensive Review Dataset for E-Commerce Analysis

2 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Sep 20, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dongre Laxman
License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

This dataset comprises customer reviews for Amazon, an online retail giant, featuring insights into customer experiences, including ratings, review titles, texts, and metadata. It is valuable for analyzing customer satisfaction, sentiment, and trends.

Column Descriptions:

• Reviewer Name: Identifies the reviewer.
• Profile Link: Links to the reviewer's profile for additional insights.
• Country: Indicates the reviewer's location.
• Review Count: Number of reviews by the same user, showing engagement level.
• Review Date: When the review was posted, useful for time analysis.
• Rating: Numerical satisfaction measure.
• Review Title: Summarizes the review sentiment.
• Review Text: Detailed customer feedback.
• Date of Experience: When the service/product was experienced.

Prospective applications:

• Sentiment Analysis: Analyze review texts and titles to assess overall customer sentiment toward products, enabling the identification of strengths and weaknesses.
• Customer Satisfaction Tracking: Track and visualize rating trends over time to understand fluctuations in customer satisfaction.
• Product Improvement: Identify common themes in reviews to highlight areas for product enhancement or development.
• Market Segmentation: Use country and demographic information to customize marketing strategies and gain insights into regional preferences.
• Competitor Analysis: Evaluate customer feedback on Amazon products in comparison to competitors to determine market positioning.
• Recommendation Systems: Leverage review data to enhance recommendation algorithms, improving personalized shopping experiences.
• Trend Analysis: Investigate temporal patterns in reviews to link sentiment changes with marketing efforts or product launches.

This extensive dataset serves as a valuable asset for various analyses focused on enhancing customer engagement and refining business strategies.
