Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Card for "amazon-product-data-filter"
Dataset Summary
The Amazon Product Dataset contains product listing data from the Amazon US website. It can be used for various NLP and classification tasks, such as text generation, product type classification, attribute extraction, image recognition and more.
Languages
The text in the dataset is in English.
Dataset Structure
Data Instances
Each data point provides product information, such… See the full description on the dataset page: https://huggingface.co/datasets/iarbel/amazon-product-data-filter.
grudgie/amazon-appliances-data-subset dataset hosted on Hugging Face and contributed by the HF Datasets community
https://www.marketresearchstore.com/privacy-statement
[Keywords] Market include TouchIT Technologies, UCView, Mvix, AVI Systems, Samsung Electronics
recmeapp/Amazon-beauty dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Study-1-data and Study-2-data comprise responses to the Mini-K (Figueredo et al. 2006). The Study 1 participants were Amazon Mechanical Turk workers. The Study 2 participants were undergraduates at Oklahoma State University. Study-3-data comprises responses to the K-SF-42 (Figueredo et al. 2017). Participants were Amazon Mechanical Turk workers. See the paper for additional information.
R-code contains the code used to run the network analyses described in the paper. "Datafile" represents the file name of the data set being analyzed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The time series comes from two different types of automatic weather station (AWS). The first is a standard modular AWS described extensively in https://doi.org/10.1080/15230430.2017.1420954.
The second type (operational from Aug 2016 at AWS5, Aug 2015 at AWS6, Aug 2014 at AWS9, Aug 2014 at AWS10) is a very compact IMAU design AWS consisting of one integrated module containing the datalogger, energy system and multiple sensors. In addition to the datalogger unit there are two independent sensors dedicated to wind speed/direction and radiation (a Young prop/vane and a CNR4 radiation sensor). All three units are mounted 3 to 4 m above the surface on one mast boom. […]
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sediment and pore water samples were collected during the M147 cruise of Research Vessel Meteor in April and May 2018. Additional sediment samples (GeoB 4417-5 and GeoB 4409-2) were collected during the M38-2 cruise in March 1997. Total element concentrations (Fe, Al, K) of the solid phase were measured after acid digestion (HF, HNO3 and HClO4) by inductively coupled plasma optical emission spectrometry (Varian ICP 720-ES). Solid phase iron speciation data were measured following single step sodium dithionite extraction (FeD) or sequential Fe extraction (FeAc, FeDith, FeOxal) by inductively coupled plasma optical emission spectrometry (Varian ICP 720-ES). Solid phase pyrite concentrations (FePy) were calculated stoichiometrically from photometrically measured S2- released via chromium(II) chloride reduction. Total organic carbon (TOC) of the sediment samples was measured in an Elemental Analyzer (Euro EA). Prior to analysis, carbon bound to carbonate minerals was removed by leaching the sediment with 0.25 N HCl. Pore water nitrate concentrations were measured on board with a SEAL QuAAtro continuous flow auto analyzer. Pore water samples for dissolved element analysis were acidified with HCl to pH < 2 after sampling. Depending on the concentration range, pore water K and Fe were measured by inductively coupled plasma optical emission spectrometry (Varian 720 ES) or inductively coupled plasma mass spectrometry (Agilent 7500).
Amazon Review is a dataset to tackle the task of identifying whether the sentiment of a product review is positive or negative. This dataset includes reviews from four different merchandise categories: Books (B) (2834 samples), DVDs (D) (1199 samples), Electronics (E) (1883 samples), and Kitchen and housewares (K) (1755 samples).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Creation and Processing Overview
This dataset underwent a comprehensive process of loading, cleaning, processing, and preparing, incorporating a range of data manipulation and NLP techniques to optimize its utility for machine learning models, particularly in natural language processing.
Data Loading and Initial Cleaning
Source: Loaded from the Hugging Face dataset repository bprateek/amazon_product_description. Conversion to Pandas DataFrame: For ease of data… See the full description on the dataset page: https://huggingface.co/datasets/ckandemir/amazon-products.
https://choosealicense.com/licenses/other/
Amazon Review Description Dataset
This dataset contains Amazon reviews from January 1, 2018, to June 30, 2018. It includes 2,245 sequences with 127,054 events across 18 category types. The original data is available at Amazon Review Data with citation information provided on the page. The detailed data preprocessing steps used to create this dataset can be found in the TPP-LLM paper and TPP-LLM-Embedding paper. If you find this dataset useful, we kindly invite you to cite the… See the full description on the dataset page: https://huggingface.co/datasets/tppllm/amazon-review-description.
The data consist of litterfall production in a fertilised old-growth forest in Central Amazon. Data were collected in a full factorial nutrient addition experiment (nitrogen, phosphorus and cation treatments). Within each plot we installed five litter traps of 50 cm x 50 cm, 1 m above ground, covering a total area of 1.25 m2 per plot and ensuring that litter reaching the traps was produced within the experimental plot area. The study was funded by NERC, BDFFP (logistical support) and the Brazilian government (student scholarships).
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains a collection of Amazon headphone reviews, processed for sentiment analysis. It is a small subset intended to assist in understanding customer opinions and evaluating product perceptions. The data supports analysis of review usefulness, factors influencing helpfulness, and the detection of atypical or potentially misleading reviews.
This dataset is typically provided in a CSV file format. It comprises approximately 1,500 individual reviews. The structure includes 6 distinct columns, making it readily available for analytical tasks.
This dataset is ideally suited for:
* Conducting sentiment analysis on product reviews.
* Exploring factors that influence the perceived helpfulness of a review.
* Identifying unusual review patterns or potential outliers.
* Applications in Natural Language Processing (NLP), text mining, and exploratory data analysis.
The data spans a time range from 28 May 2021 to 13 June 2022. It covers various customer names, including "Amazon Customer" and "Rahul", alongside a large proportion of "Other" customers. Product colours predominantly include "White" and "Black". The ratings are distributed across several ranges, from 1.00-1.40 up to 4.60-5.00. The geographical scope of the data is global.
CC0
This dataset is beneficial for data scientists, machine learning engineers, business analysts, and researchers interested in:
* Developing sentiment analysis models.
* Understanding consumer feedback and product performance.
* Performing text-based data analysis.
* Exploring e-commerce review patterns.
Original Data Source: HEADPHONE DATASET REVIEW ANALYSIS
The total number of Amazon Alexa skills continues to grow at a steady pace in selected countries. As of ************, the skill count for Amazon Alexa has grown to ****** in the United States. The most noticeable jump in the number of skills was in Spain, which reached ****** after standing at just ***** the previous year.
Dataset Card for Amazon QA
This dataset is a collection of question-answer pairs collected from Amazon QA. See Amazon QA for additional information. This dataset can be used directly with Sentence Transformers to train embedding models.
Dataset Subsets
pair subset
Columns: "query", "answer"
Column types: str, str
Examples: { 'query': 'What size are the tiles and how thick and what material?', 'answer': 'Tiles are 12" x 12", about 1/2 inch thick and made of… See the full description on the dataset page: https://huggingface.co/datasets/sentence-transformers/amazon-qa.
This data set provides measurements of carbon dioxide flux rates (FCO2), gas transfer velocity (k), and partial pressures (pCO2) at 75 sites on rivers and streams of the Amazon River system in South America for the period beginning July 1, 2004, and ending January 23, 2007. Several fieldwork campaigns occurred between June 2004 and January 2007 in the Amazon River basin, with discharge conditions ranging from low to high flow. The sampled areas span the spectrum of chemical characteristics observed across the entire basin, including, for example, both low and high pH values and suspended sediment loads. There is one comma-delimited data file in this data set.
Cold waves crossing the Amazon rainforest are a rare phenomenon predicted to increase in intensity under climate change. We describe an extensive cold wave that occurred in June 2023 in Amazonian-Andean forests, compare environmental temperatures to experimentally tested thermal tolerances, and assess its impact on lowland animal communities (insects and wild mammals). While we found strong reductions in abundance of all animal groups under the cold wave, tropical lowland animals showed thermal tolerance limits below the lowest environmental temperatures measured during the cold wave, and abundances of most studied taxa recovered over the next season; nevertheless, small thermal safety margins suggest that an increased intensity of cold waves in the future could imperil animal communities in the Amazon.
Temperature data
Air temperature was measured at each plot along the elevation gradient with iButton sensors (Analog Devices, Inc, Wilmington, USA) at 1.5 m height every four hours, hanging from a horizontal branch. The sensors were shielded with white plastic dishes (diameter ca. 18 cm) to protect them from rain and direct sunlight [13]. In addition, each plot was equipped with a TOMST4 temperature and soil humidity logger (TOMST s.r.o., Prague, Czech Republic), continuously measuring temperature and soil humidity at 6 cm depth, as well as temperature at the surface and at 15 cm height.
Insect data
(1) Malaise: At each plot in each field season, one malaise trap was operated for seven days. Malaise traps were based on the Townes Malaise trap model, albeit with a black roof and a slightly smaller size (dimensions of the capture area: height front: 0.90 m; height rear: 0.60 m; length: 1.60 m); ethanol (96 %) was used as the capture fluid to ensure the preservation of specimens. For each ...
Cold waves in the Amazon rainforest and their ecological impact
https://doi.org/10.5061/dryad.ns1rn8q31
Climate and biodiversity were monitored at three study locations ("plots") in the Peruvian rainforest. We used thermal sensors, pitfall traps, malaise traps, manual netting and camera traps. Thermal tolerance experiments were conducted with a programmable thermoblock.
Description: Malaise traps for community biomass of mainly flying insects
Description: temperature measured by iButton loggers
Tmean: daily mean temperature
Tmin: daily minimum temperature
Tmax:...
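As a rough illustration of how daily variables such as Tmean and Tmin could be derived from logger readings, the sketch below groups timestamped temperatures by calendar day. The source does not state how these values were actually computed; the function name `daily_summary`, the reading format, the interpretation of Tmax as a daily maximum, and the sample temperatures are all assumptions made for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Tuple


def daily_summary(
    readings: Iterable[Tuple[datetime, float]]
) -> Dict[str, Dict[str, float]]:
    """Group (timestamp, temperature) readings by day and summarise each day."""
    by_day = defaultdict(list)
    for timestamp, temp_c in readings:
        by_day[timestamp.date().isoformat()].append(temp_c)
    # Tmax is assumed here to mean the daily maximum, by analogy with Tmin.
    return {
        day: {"Tmean": mean(vals), "Tmin": min(vals), "Tmax": max(vals)}
        for day, vals in sorted(by_day.items())
    }


# Made-up readings at four-hour intervals (NOT values from the dataset).
example = [
    (datetime(2023, 6, 14, hour), temp)
    for hour, temp in zip(range(0, 24, 4), [18.0, 17.0, 19.0, 24.0, 26.0, 22.0])
]
print(daily_summary(example))
```

With one reading every four hours, each day is summarised from at most six values, so the daily minimum and maximum recorded this way may miss short-lived extremes.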
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The distribution of tribe by tooth type in the data set.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This collection contains 21 hour-long soundscape recordings, which have been annotated with 14,798 bounding box labels for 132 different bird species from the Southwestern Amazon Basin. The data were recorded in 2019 in the Inkaterra Reserva Amazonica, Madre de Dios, Peru. This collection has partially been featured as test data in the 2020 BirdCLEF competition and can primarily be used for training and evaluation of machine learning algorithms.
Data collection
This acoustic data was collected at the Inkaterra Reserva Amazonica (ITRA) between January 14th and February 2nd, 2019, during the rainy season. ITRA is a 2 km2 lowland rainforest reserve on the banks of the Madre de Dios river, approximately 20 km east of the frontier town of Puerto Maldonado. The region's extraordinary biodiversity is threatened by accelerating rates of deforestation, degradation, and fragmentation, which are driven primarily by expanding road networks, mining, agriculture, and an increasing population. The acoustic data from this site were collected as part of a study designed to assess spatio-temporal variation in avian species richness and vocal activity levels across intact, degraded, and edge forest, and between different days at the same point locations.
Ten SWIFT recording units, provided by the K. Lisa Yang Center for Conservation Bioacoustics at the Cornell Lab of Ornithology, were placed at separate sites spanning edge habitat, degraded forest, and intact forest within the reserve. These omnidirectional recorders were set to record uncompressed WAVE files continuously for the duration of their deployment, with a sampling rate of 48 kHz. The sensitivity of the used microphones was -44 (+/-3) dB re 1 V/Pa. The microphone's frequency response was not measured but is assumed to be flat (+/- 3 dB) in the frequency range 100 Hz to 7.5 kHz. The analog signal was amplified by 35 dB and digitized (16-bit resolution) using an analog-to-digital converter (ADC) with a clipping level of -/+ 0.9 V. For this collection, recordings were resampled at 32 kHz and converted to FLAC. Recorders were placed at a consistent height of approximately 1.5 m above the ground. To minimize background noise, all sites used for data analysis were located at a minimum distance of 450 m from the river.
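The recording-chain figures above (microphone sensitivity, gain, ADC clipping level, bit depth) can in principle be combined to convert a digital sample value into an approximate sound pressure. The following is only a sketch under several assumptions that the source does not confirm: that the nominal sensitivity applies exactly, that the frequency response is flat, and that the resampled FLAC files preserve the original 16-bit amplitude scaling. The function names are hypothetical.

```python
import math

MIC_SENSITIVITY_DB = -44.0  # dB re 1 V/Pa, nominal value (+/- 3 dB per the text)
GAIN_DB = 35.0              # analog amplification before the ADC
ADC_CLIP_VOLTS = 0.9        # ADC clipping level in volts
FULL_SCALE = 2 ** 15        # magnitude of a full-scale 16-bit signed sample
P_REF = 20e-6               # Pa, conventional reference pressure for SPL in air


def sample_to_pascal(sample: int) -> float:
    """Convert a 16-bit sample value to an approximate pressure in pascals."""
    volts_at_adc = sample / FULL_SCALE * ADC_CLIP_VOLTS
    volts_at_mic = volts_at_adc / 10 ** (GAIN_DB / 20)   # undo the 35 dB gain
    volts_per_pascal = 10 ** (MIC_SENSITIVITY_DB / 20)   # sensitivity in V/Pa
    return volts_at_mic / volts_per_pascal


def peak_spl_db(sample: int) -> float:
    """Instantaneous level in dB re 20 uPa; undefined for a zero sample."""
    return 20 * math.log10(abs(sample_to_pascal(sample)) / P_REF)


# Level corresponding to a sample at the clipping point.
print(round(peak_spl_db(FULL_SCALE), 1))
```

Given the stated +/- 3 dB tolerance on sensitivity and the unmeasured frequency response, any level derived this way should be treated as approximate.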
Sampling and annotation protocol
A total of 21 dawn-hours, from 05:00-06:00 PET (10:00-11:00 UTC), representing 7 of the 10 sites on three randomly-selected dates, were manually annotated. Many neotropical bird species sing almost exclusively during the dawn hour, so this time window was selected to maximize the number of species present in the recordings. A single annotator boxed every bird call he could identify and ignored those that were too faint. Raven Pro software was used to annotate the data. Provided labels contain full bird calls that are boxed in time and frequency. The annotator was allowed to combine multiple consecutive calls of one species into one bounding box label if pauses between calls were shorter than five seconds. In this collection, we use eBird species codes as labels, following the 2021 eBird taxonomy (Clements list). Parts of this dataset have previously been featured in the 2020 BirdCLEF competition.
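The rule that consecutive calls of one species may share a bounding box when pauses are shorter than five seconds can be sketched as an interval merge. This is an illustration only: the annotator applied the rule at his discretion, so the published labels need not match this procedure. The function name `merge_calls`, the tuple layout, and the choice to take the union of the frequency ranges are assumptions, and the input is assumed to hold calls of a single species from a single recording.

```python
from typing import List, Tuple

# (start_s, end_s, low_hz, high_hz) for one call; layout is hypothetical.
Call = Tuple[float, float, float, float]


def merge_calls(calls: List[Call], max_pause_s: float = 5.0) -> List[Call]:
    """Merge calls whose separating pause is shorter than max_pause_s."""
    merged: List[Call] = []
    for start, end, low, high in sorted(calls):
        if merged and start - merged[-1][1] < max_pause_s:
            p_start, p_end, p_low, p_high = merged[-1]
            # Extend the previous box in time and frequency to cover this call.
            merged[-1] = (p_start, max(p_end, end), min(p_low, low), max(p_high, high))
        else:
            merged.append((start, end, low, high))
    return merged


calls = [
    (0.0, 1.5, 2000.0, 4000.0),
    (3.0, 4.0, 1800.0, 3900.0),    # 1.5 s after the first call: merged
    (12.0, 13.0, 2100.0, 4100.0),  # 8 s after the merged box: kept separate
]
print(merge_calls(calls))
```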
Files in this collection
Audio recordings can be accessed by downloading and extracting the “soundscape_data.zip” file. Soundscape recording filenames contain a sequential file ID, recording site, date, and timestamp in UTC. As an example, the file “PER_001_S01_20190116_100007Z.flac” has sequential ID 001 and was recorded at site S01 on Jan 16th, 2019 at 10:00:07 UTC. Ground truth annotations are listed in “annotations.csv” where each line specifies the corresponding filename, start and end time in seconds, low and high frequency in Hertz, and an eBird species code. These species codes can be assigned to scientific and common name of a species with the “species.csv” file. Unidentifiable calls have been marked with “????” and are included in the ground truth annotations. The approximate recording location and a short habitat description for all sites can be found in the “recording_location.txt” file.
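The filename convention described above can be parsed with a short helper. This is a minimal sketch: the pattern is inferred from the single example filename, so the "PER" prefix handling and the field widths are assumptions, and `parse_recording_filename` and `RecordingInfo` are hypothetical names.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class RecordingInfo(NamedTuple):
    file_id: str
    site: str
    recorded_at: datetime


# Layout inferred from "PER_001_S01_20190116_100007Z.flac" (assumed general).
_FILENAME_RE = re.compile(
    r"^(?P<prefix>[A-Z]+)_(?P<file_id>\d+)_(?P<site>S\d+)_"
    r"(?P<date>\d{8})_(?P<time>\d{6})Z\.flac$"
)


def parse_recording_filename(name: str) -> RecordingInfo:
    """Split a soundscape filename into file ID, site, and UTC timestamp."""
    match = _FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename layout: {name!r}")
    recorded_at = datetime.strptime(
        match["date"] + match["time"], "%Y%m%d%H%M%S"
    ).replace(tzinfo=timezone.utc)  # the trailing "Z" marks UTC
    return RecordingInfo(match["file_id"], match["site"], recorded_at)


info = parse_recording_filename("PER_001_S01_20190116_100007Z.flac")
print(info.file_id, info.site, info.recorded_at.isoformat())
```

The file ID is kept as a string so that leading zeros survive, which makes it easier to match entries against the filenames listed in the annotations file.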
Acknowledgements
We would like to thank the Inkaterra Association (ITA) staff for providing logistical support and excellent field station facilities, particularly Noe Huaraca, Dennis Osorio, and Kevin Jiménez Gonzales, who helped set up recorders. Noe Huaraca, John Fitzpatrick, Fernando Angulo, Will Sweet, Ken Rosenburg, and Alex Wiebe helped identify unknown vocalizations. Funding for equipment was provided by the K. Lisa Yang Center for Conservation Bioacoustics at the Cornell Lab of Ornithology, with support from Innóvate Perú, CORBIDI, and the Inkaterra Association. Travel expenses were funded by the Cornell Lab of Ornithology.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a collection of randomly selected customer reviews and ratings for various Amazon products. It comprises nearly 1.6 thousand individual reviews, making it a valuable resource for understanding consumer feedback. The primary aim of using this dataset is to identify the main topics within these reviews, enabling better classification for improved search functionality. It is particularly suited for developing algorithms that can differentiate topics based on a body of review text.
The dataset includes the following fields:
* id: A unique identifier for each entry.
* asins: The product identification number.
* brand: The manufacturer or brand of the product.
* categories: The categorisation of the product.
* colors: The colour of the product.
* dateAdded: The date when the product was first listed or added to the dataset.
* dateUpdated: The date when the product's information was last updated.
* dimension: The physical dimensions of the product.
* ean: The European Article Number (EAN) for the product.
* keys: A special assigned key associated with the product.
The dataset contains approximately 1.6 thousand reviews. The data is structured in a tabular format, suitable for analysis. Key distributions observed within the dataset include:
* Brands: A significant majority of products (99%) are from Amazon, with a smaller portion (1%) from Moshi.
* Categories: A notable 34% of products fall under categories such as Amazon Devices, Smart Home, and Voice Assistants, with another 12% simply categorised as Amazon Devices. Other categories account for 54% of the data.
* Colours: About 52% of entries have null values for colour, while 42% are recorded as Black. Other colours make up the remaining 6%.
* Dates: The date range for products added or updated spans from 17 January 2015 to 13 August 2017, with varying counts of entries across different periods.
* Dimensions: 65% of the entries have null dimensions, while 34% specify a dimension of 4.8 inches by 6.6 inches by 3.2 inches.
This dataset is ideal for a range of applications, including:
* Developing and evaluating Topic Modelling Algorithms to categorise customer reviews.
* Performing Natural Language Processing (NLP) tasks such as sentiment analysis or keyword extraction from product reviews.
* Gaining insights into consumer behaviour and product feedback in the e-commerce sector.
* Supporting data clean-up and exploratory data analysis for textual datasets.
The dataset's coverage is global, encompassing reviews from various customers. The time range of the data spans from 17 January 2015 to 13 August 2017. No specific demographic details about the customers are provided.
CC0
This dataset is suitable for:
* Data Scientists and Machine Learning Engineers focused on NLP and topic modelling.
* Researchers in fields such as e-commerce, consumer studies, and computational linguistics.
* Students and beginners in data science looking for a practical dataset for learning and experimentation.
* Businesses aiming to understand customer feedback and improve product categorisation.
Original Data Source: Amazon Product Reviews Dataset
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT This study aimed to define management zones (MZs) for fertirrigated açaí cultivation, based on spatial variability of the foliar nutrients and productivity data. The work was carried out in an area of 5.75 ha of a 7-year crop, with 80 georeferenced sample points. Fresh fruit productivity and nutrient (N, P, K, Ca, Mg, S, B, Cu, Fe, Mn, and Zn) contents were determined. The average contents of macronutrients were considered adequate for adult açaí plants, and their spatial dependence associated with fruit productivity allowed the representation of their distributions through maps of variability. Through multivariate analysis, three main components were highlighted. These components explained 51.5 % of the total variability of the data, where PC1 showed a higher correlation with Ca, Mg, K, and P. In addition, three MZs were obtained, out of which one with the highest productivity showed the best Ca, Mg, S, B, and Fe leaf contents. Principal component analysis and determination of MZs emphasized Ca and Mg nutrition as being more related to spatial variability and açaí fruit productivity.