77 datasets found
  1. Amazon dataset for ERS-REFMMF

    • figshare.com
    txt
    Updated Feb 1, 2024
    Cite
    Teng Chang (2024). Amazon dataset for ERS-REFMMF [Dataset]. http://doi.org/10.6084/m9.figshare.25126313.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 1, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Teng Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recommender systems based on matrix factorization act as black-box models and are unable to explain the recommended items. After adding the neighborhood algorithm, the explainability is measured by the user's neighborhood recommendation, but the subjective explicit preference of the target user is ignored. To better combine the latent factors from matrix factorization and the target user's explicit preferences, an explainable recommender system based on reconstructed explanatory factors and multi-modal matrix factorization (ERS-REFMMF) is proposed. ERS-REFMMF is a two-layer model. The underlying model decomposes the multi-modal scoring matrix using the Funk-SVD method to obtain rich latent features of the user and the item; the multi-modal scoring matrix consists of the original matrix and the preference features and sentiment scores exhibited by users in the reviews corresponding to the ratings. The set of candidate items is obtained based on the latent features, and the explainability is reconstructed based on the subjective preference of the target user and the real recognition level of the neighbors. The upper layer is the multi-objective high-performance recommendation stage, in which the candidate set is optimized by a multi-objective evolutionary algorithm to give the user a final recommendation list that balances accuracy, recall, diversity, and interpretability, with accuracy and recall jointly represented by the F1-measure. Experimental results on three real datasets from Amazon show that the proposed model is competitive with existing recommendation methods in both stages.
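
    The description above names Funk-SVD as the factorization method. The following is a minimal sketch of plain Funk-SVD (stochastic gradient descent over observed ratings only), not the authors' implementation: the multi-modal matrix with preference features and sentiment scores is not reproduced, and the toy ratings, function names, and hyperparameters (`n_factors`, `lr`, `reg`, `epochs`) are all hypothetical.

```python
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed (user, item, rating) triples."""
    squared = [(r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings]
    return (sum(squared) / len(squared)) ** 0.5


def funk_svd(ratings, n_users, n_items, n_factors=2,
             lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q by SGD on observed ratings."""
    rng = random.Random(seed)
    # Small random initial values break the symmetry between factors.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# Hypothetical toy data: (user index, item index, rating on a 1-5 scale).
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5),
    (2, 0, 1), (2, 2, 5),
    (3, 1, 2), (3, 2, 4),
]

if __name__ == "__main__":
    P, Q = funk_svd(RATINGS, n_users=4, n_items=3)
    print("training RMSE:", round(rmse(RATINGS, P, Q), 3))
    # Score an unobserved (user, item) pair as a candidate recommendation.
    print("predicted rating for user 1, item 2:", round(predict(P, Q, 1, 2), 2))
```

    Only observed ratings contribute to the updates, which is what distinguishes this approach from a classical SVD of a fully filled matrix; candidate items for a user would then be ranked by `predict` over that user's unrated items.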

  2. Land use and land cover map from the Crepori National Forest, northern...

    • data.mendeley.com
    Updated Mar 8, 2021
    Cite
    Jackson Simionato (2021). Land use and land cover map from the Crepori National Forest, northern Brazil [Dataset]. http://doi.org/10.17632/zp3gpw8mhn.1
    Explore at:
    Dataset updated
    Mar 8, 2021
    Authors
    Jackson Simionato
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    North Region, Brazil
    Description

    This dataset includes an ESRI Shapefile archive containing the land use and land cover classification for the year 2017 of a Brazilian Conservation Unit called Crepori National Forest (NFC). This is the most important result obtained in a study that sought to automatically identify artisanal mining areas in the Amazon Rainforest, using the GEOBIA approach together with data mining techniques.

  3. Amazon Product Listing Dataset

    • kaggle.com
    zip
    Updated Oct 12, 2021
    + more versions
    Cite
    PromptCloud (2021). Amazon Product Listing Dataset [Dataset]. https://www.kaggle.com/promptcloud/amazon-product-listing-dataset
    Explore at:
    Available download formats: zip (4267812 bytes)
    Dataset updated
    Oct 12, 2021
    Authors
    PromptCloud
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset was created by our in-house Web Scraping and Data Mining teams at PromptCloud and DataStock. This sample contains 30K records. You can download the full dataset here.

    Content

    Total Records Count: 715945
    Domain Name: amazon.com
    Date Range: 1st Nov 2020 - 31st Dec 2020
    File Extension: csv

    Available Fields: Uniq Id, Crawl Timestamp, Pageurl, Website, Title, Num Of Reviews, Average Rating, Number Of Ratings, Model Num, Sku, Upc, Manufacturer, Model Name, Price, Monthly Price, Stock, Carrier, Color Category, Internal Memory, Screen Size, Specifications, Five Star, Four Star, Three Star, Two Star, One Star, Broken Link, Discontinued

    Acknowledgements

    We wouldn't be here without the help of our in-house web scraping and data mining teams at PromptCloud and DataStock.

    Inspiration

    This dataset was created keeping in mind our data scientists and researchers across the world.

  4. Feature description of the Amazon dataset.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Feb 14, 2024
    + more versions
    Cite
    Noura A. Semary; Wesam Ahmed; Khalid Amin; Paweł Pławiak; Mohamed Hammad (2024). Feature description of the Amazon dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0294968.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Noura A. Semary; Wesam Ahmed; Khalid Amin; Paweł Pławiak; Mohamed Hammad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A crucial part of sentiment classification is feature extraction, because it involves extracting valuable information from text data, which affects the model’s performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. In order to provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. There are several methods under consideration, including Bag-of-words (BOW), Word2Vector, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global Vectors for Word Representation (GloVe). To prove the ability of each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. Finally, we trained a random forest classifier using 70% of the data for training and 30% for testing, enabling us to evaluate and compare the performance using different metrics. Based on our results, we find that the TF-IDF technique demonstrates superior performance, with an accuracy of 99% on the Amazon reviews dataset and 96% on the Twitter US airlines dataset. This study underscores the significance of feature extraction in sentiment analysis, offering practical insights to improve model performance and guide future research.
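
    Since the description singles out TF-IDF, here is a minimal sketch of how TF-IDF weights can be computed. It assumes the simplest textbook variant (relative term frequency multiplied by log(N / document frequency)) and whitespace tokenization; the paper's exact variant, preprocessing, and random forest setup are not given here and are not reproduced. The function name and the sample reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Relative term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    # Hypothetical review snippets, for illustration only.
    reviews = ["great guitar great sound", "poor sound", "great strings"]
    for review, vector in zip(reviews, tf_idf(reviews)):
        print(review, "->", {t: round(w, 3) for t, w in vector.items()})
```

    A term that occurs in every document receives weight zero under this variant, which is why library implementations often smooth the IDF term; the resulting vectors would then be fed to a classifier after a train/test split such as the 70/30 split mentioned above.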

  5. Amazon Data | Amazon Product Data | Amazon Reviews Data | Seller...

    • apiscrapy.mydatastorefront.com
    Updated Nov 21, 2024
    Cite
    APISCRAPY (2024). Amazon Data | Amazon Product Data | Amazon Reviews Data | Seller Performance, Product Rankings for Competitive Analysis [Dataset]. https://apiscrapy.mydatastorefront.com/products/apiscrapy-amazon-data-amazon-database-amazon-datasets-amazon-apiscrapy
    Explore at:
    Dataset updated
    Nov 21, 2024
    Dataset authored and provided by
    APISCRAPY
    Area covered
    North Macedonia, Iceland, Slovenia, United Kingdom, Moldova, Albania, Belarus, Switzerland, Åland Islands, Greenland
    Description

    APISCRAPY's Amazon Data extraction is a sophisticated solution that leverages AI and web scraping to supply organizations with critical data from the Amazon platform. By scraping Amazon you get a product-related Amazon database, including product names, descriptions, pricing, ratings, and reviews.

  6. amazon-product-data-filter

    • huggingface.co
    Updated Nov 14, 2023
    + more versions
    Cite
    Iftach Arbel (2023). amazon-product-data-filter [Dataset]. https://huggingface.co/datasets/iarbel/amazon-product-data-filter
    Explore at:
    Dataset updated
    Nov 14, 2023
    Authors
    Iftach Arbel
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Card for "amazon-product-data-filter"

    Dataset Summary

    The Amazon Product Dataset contains product listing data from the Amazon US website. It can be used for various NLP and classification tasks, such as text generation, product type classification, attribute extraction, image recognition and more.

    Languages

    The text in the dataset is in English.

    Dataset Structure

    Data Instances

    Each data point provides product information, such… See the full description on the dataset page: https://huggingface.co/datasets/iarbel/amazon-product-data-filter.

  7. Ecommerce Market data -Amazon Data , Walmart product data, Ecommerce data |...

    • apiscrapy.mydatastorefront.com
    Updated Nov 19, 2024
    Cite
    APISCRAPY (2024). Ecommerce Market data -Amazon Data , Walmart product data, Ecommerce data | Ecommerce data extraction | 50% Cost Saving |Free Sample [Dataset]. https://apiscrapy.mydatastorefront.com/products/apiscrapy-amazon-data-amazon-seller-data-amazon-datasets-50-m-apiscrapy
    Explore at:
    Dataset updated
    Nov 19, 2024
    Dataset authored and provided by
    APISCRAPY
    Area covered
    Belarus, Hungary, Singapore, British Indian Ocean Territory, Romania, Luxembourg, Faroe Islands, Germany, Spain, Estonia
    Description

    Unlock the potential of Ecommerce data scraping and extraction with APISCRAPY. Dive into Amazon data and tap into the vast Ecommerce market's secrets. Stay ahead of the competition by leveraging our powerful tool for comprehensive Ecommerce data insights.

  8. Amazon France Product Details

    • kaggle.com
    zip
    Updated Jul 14, 2021
    + more versions
    Cite
    PromptCloud (2021). Amazon France Product Details [Dataset]. https://www.kaggle.com/promptcloud/amazon-france-product-details
    Explore at:
    Available download formats: zip (24187833 bytes)
    Dataset updated
    Jul 14, 2021
    Authors
    PromptCloud
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset was created by our in-house Web Scraping and Data Mining teams at PromptCloud and DataStock. This sample contains 30K records. You can download the full dataset here.

    Content

    Total Records Count: 375952
    Domain Name: amazon.fr
    Date Range: 1st Jul 2020 - 30th Sep 2020
    File Extension: tsv

    Available Fields: Uniq Id, Crawl Timestamp, Dataset Origin, Product Id, Product Barcode, Product Company Type Source, Product Brand Source, Product Brand Normalised Source, Product Name Source, Match Rank, Match Score, Match Type, Retailer, Product Category, Product Brand, Product Name, Product Price, Sku, Upc, Product Url, Market, Product Description, Product Currency, Product Available Inventory, Product Image Url, Product Model Number, Product Tags, Product Contents, Product Rating, Product Reviews Count, Bsr, Joining Key

    Acknowledgements

    We wouldn't be here without the help of our in-house web scraping and data mining teams at PromptCloud and DataStock.

    Inspiration

    This dataset was created keeping in mind our data scientists and researchers across the world.

  9. Datahut Amazon Product Data Feeds for North America, South America & Asia...

    • datarade.ai
    Updated Feb 26, 2021
    Cite
    Datahut (2021). Datahut Amazon Product Data Feeds for North America, South America & Asia (data as a service) [Dataset]. https://datarade.ai/data-products/amazon-product-data-feeds-data-as-a-service-datahut
    Explore at:
    Available download formats: .json, .xml, .csv, .xls, .sql
    Dataset updated
    Feb 26, 2021
    Dataset authored and provided by
    Datahut
    Area covered
    Chile, Kuwait, Oman, Bahamas, Taiwan, French Guiana, Uruguay, Cambodia, Colombia, Thailand, North America, South America
    Description

    Thinking about extracting information on 25 million products from Amazon every day?

    Data extraction from Amazon on a large scale is a pain. However, our data as a service platform is capable of performing data extraction at scale without getting blocked by Amazon's anti-scraping technology.

    We are giving our customers access to ready-to-use product data feeds from Amazon on a huge scale. A scale that most web scraping service providers can't even dream about.

  10. Data from: Amazon forests capture high levels of atmospheric mercury...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Feb 7, 2022
    Cite
    Jacqueline Gerson; Natalie Szponar; Arianna Agostini; Rand Alotaibi; Bridget Bergquist; Arabella Chen; Luis Fernandez; Kelsey Lansdale; Anne Lee; Maria Machicao; Melissa Marchese; Simon Topp; Claudia Vega; Emily Bernhardt (2022). Amazon forests capture high levels of atmospheric mercury pollution from artisanal gold mining [Dataset]. http://doi.org/10.6078/D1DH6F
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 7, 2022
    Dataset provided by
    Dryad
    Authors
    Jacqueline Gerson; Natalie Szponar; Arianna Agostini; Rand Alotaibi; Bridget Bergquist; Arabella Chen; Luis Fernandez; Kelsey Lansdale; Anne Lee; Maria Machicao; Melissa Marchese; Simon Topp; Claudia Vega; Emily Bernhardt
    Time period covered
    Jan 25, 2022
    Area covered
    Amazon Rainforest
    Description

    Mercury emissions from artisanal and small-scale gold mining throughout the Global South exceed coal combustion as the largest global source of mercury. We examined mercury deposition and storage in an area of the Peruvian Amazon heavily impacted by artisanal gold mining. Intact forests in the Peruvian Amazon near gold mining receive extremely high inputs of mercury and experience elevated total mercury and methylmercury in the atmosphere, canopy foliage, and soils. Here we show for the first time that an intact forest canopy near artisanal gold mining intercepts large amounts of particulate and gaseous mercury, at a rate proportional to total leaf area. We document substantial mercury accumulation in soils, biomass, and resident songbirds in some of the Amazon’s most protected and biodiverse areas, raising important questions about how mercury pollution may constrain modern and future conservation efforts in these tropical ecosystems.

  11. Data from: Diversity of Arbuscular Mycorrhizal Fungi in an Amazon...

    • datasetcatalog.nlm.nih.gov
    • scielo.figshare.com
    Updated Jul 18, 2018
    Cite
    Berbara, Ricardo Luiz Louro; Fornaciari, Ademir Junior; Mendonça, Leticia Pastore; Nobre, Camila Pinheiro; de Oliveira Granha, Jose Rodolfo Dantas; Caproni, Ana Lucy (2018). Diversity of Arbuscular Mycorrhizal Fungi in an Amazon Environment after Mining [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000706725
    Explore at:
    Dataset updated
    Jul 18, 2018
    Authors
    Berbara, Ricardo Luiz Louro; Fornaciari, Ademir Junior; Mendonça, Leticia Pastore; Nobre, Camila Pinheiro; de Oliveira Granha, Jose Rodolfo Dantas; Caproni, Ana Lucy
    Description

    ABSTRACT In the Brazilian Amazon forest, studies were carried out to estimate the community of arbuscular mycorrhizal fungi (AMF) using a bioassay of dilutions of samples collected from preserved areas and from areas regenerated after bauxite extraction. To regenerate areas, tree species were introduced, and samples were taken after 2, 6, 12, and 16 years. The spores obtained were compared to those obtained by direct extraction, and the number of species recovered from the bioassay was significantly higher. The species found after different periods of regeneration were similar to those from the native forest. From the early years of revegetation, the number of rare species was high, with strong dominance of G. macrocarpum. Among older communities this high dominance decreased while, at the same time, the number of individuals from other AMF species increased, leading to the conclusion that the number of species did not change with the age of the revegetation.

  12. Marketing Bias data

    • berd-platform.de
    txt
    Updated Jul 31, 2025
    + more versions
    Cite
    Mengting Wan; Jianmo Ni; Rishabh Misra; Julian McAuley (2025). Marketing Bias data [Dataset]. http://doi.org/10.82939/jp1cd-gne79
    Explore at:
    Available download formats: txt
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    ACM
    Authors
    Mengting Wan; Jianmo Ni; Rishabh Misra; Julian McAuley
    License

    https://github.com/MengtingWan/marketBias/blob/master/LICENSE

    Description

    These datasets contain attributes about products sold on ModCloth and of the Electronics category on Amazon which may be sources of bias in recommendations (in particular, attributes about how the products are marketed). Data also includes user/item interactions for recommendation. The dataset includes 99,893 reviews for ModCloth and 1,292,954 reviews for the Electronics category of Amazon.

  13. Tracking gold mining derived mercury pollution into human diets in the Madre...

    • search.dataone.org
    • datadryad.org
    Updated Jul 29, 2025
    Cite
    Melissa Marchese; Jacqueline Gerson; Axel Berky; Arabella Chen; Charles Driscoll; Luis Fernandez; Heileen Hsu-Kim; Kelsey Lansdale; Anne Lee; Eliza Letourneau; Maria Machicao; Mario Montesdeoca; William Pan; Emily Robie; Claudia Vega; Emily Bernhardt (2025). Tracking gold mining derived mercury pollution into human diets in the Madre de Dios region of Peru [Dataset]. http://doi.org/10.5061/dryad.cnp5hqcbh
    Explore at:
    Dataset updated
    Jul 29, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Melissa Marchese; Jacqueline Gerson; Axel Berky; Arabella Chen; Charles Driscoll; Luis Fernandez; Heileen Hsu-Kim; Kelsey Lansdale; Anne Lee; Eliza Letourneau; Maria Machicao; Mario Montesdeoca; William Pan; Emily Robie; Claudia Vega; Emily Bernhardt
    Area covered
    Madre de Dios, Peru
    Description

    Artisanal and small-scale gold mining (ASGM) is the largest global anthropogenic mercury (Hg) source and is widespread in the Peruvian Amazon. While numerous studies have examined fish Hg content near ASGM, Hg accumulation in other commonly consumed animal- and plant-based foods from terrestrial environments is often overlooked. These data were collected in 2018 and 2019 to understand Hg exposure from food staples in Peru's Madre de Dios region. This dataset contains measurements of total Hg and methyl Hg content in locally sourced crops, fish, chicken meat, chicken feathers, and eggs from ASGM-impacted and upstream reference communities. Stable carbon and nitrogen isotope signatures from fish and chicken were also measured to characterize trophic position and magnification.

    All data were collected from the Madre de Dios region of the Peruvian Amazon. Sampling sites are described in Marchese et al. (2024), and coordinates can be found in the data file entitled "Site.coordinates.csv". Sampling, laboratory analysis, and quality control methodology are presented in Marchese et al. (2024). In brief, we collected crops, fish muscle, chicken meat, chicken eggs, and chicken feathers from areas heavily and minimally impacted by mining. The data were used to determine how geogenic and anthropogenic Hg accumulates in terrestrial and aquatic organisms that serve as regional food staples. Samples were collected during June through August in 2018 and 2019. These data are predominantly comprised of total Hg measurements for crops, chicken, and fish plus methyl Hg for crops and chicken. Stable carbon and nitrogen isotope data are presented for chicken feathers and fish tissue. Survey data from individuals who contributed chicken samples to the study are included. Coordina...

    # Tracking gold mining derived mercury pollution into human diets in the Madre de Dios region of Peru

    These files contain data from samples of crops, fish muscle, chicken meat, chicken eggs, and chicken feathers from areas both heavily and minimally impacted by artisanal and small-scale gold mining in the Madre de Dios region of the Peruvian Amazon. The data were used to determine how geogenic and anthropogenic Hg accumulates in terrestrial and aquatic organisms serving as local food staples. Samples were collected during June through August in 2018 and 2019. These data are predominantly comprised of total Hg measurements for crops, chicken, and fish plus methyl Hg for crops and chicken. Stable carbon and nitrogen isotope data are presented for chicken feathers and fish tissue. Survey data from individuals who contributed chicken samples to the study are included. Coordinates for each community from which samples were obtained can be found in the .csv files.

    Description of the da...

  14. Data from: Multiple facets of biodiversity are threatened by mining-induced...

    • data.niaid.nih.gov
    • dataone.org
    • +2 more
    zip
    Updated Jul 6, 2023
    Cite
    Thomas Lloyd; Ubirajara Oliveira (2023). Multiple facets of biodiversity are threatened by mining-induced land-use change in the Brazilian Amazon [Dataset]. http://doi.org/10.5061/dryad.s1rn8pkcm
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 6, 2023
    Dataset provided by
    Universidade Federal de Minas Gerais
    The University of Queensland
    Authors
    Thomas Lloyd; Ubirajara Oliveira
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    Amazon Rainforest
    Description

    Aim

    Mining is increasingly pressuring areas of critical importance for biodiversity conservation, such as the Brazilian Amazon. Biodiversity data are limited in the tropics, restricting the scope for risks to be appropriately estimated before mineral licencing decisions are made. As the distributions and range sizes of other taxa differ markedly from those of vertebrates – the common proxy for analysis of risk to biodiversity from mining – whether mining threatens lesser-studied taxonomic groups differentially at a regional scale is unclear.

    Location

    Brazilian Amazon

    Methods

    We assess risks to several facets of biodiversity from industrial mining by comparing mining areas (within 70 km of an active mining lease) and areas unaffected by mining, employing species richness, species endemism, phylogenetic diversity, and phylogenetic endemism metrics calculated for angiosperms, arthropods, and vertebrates.

    Results

    Mining areas contained higher densities of species occurrence records than the unaffected landscape, and we accounted for this sampling bias in our analyses. None of the four biodiversity metrics differed between mining and non-mining areas for vertebrates. For arthropods, species endemism was greater in mined areas. Mined areas also had greater angiosperm species richness, phylogenetic diversity, and phylogenetic endemism, although lower species endemism than unmined areas.

    Main Conclusions

    Unlike for vertebrates, facets of angiosperm and arthropod diversity are relatively higher in areas of mining activity, underscoring the need to consider multiple taxonomic groups and biodiversity facets when assessing risk and evaluating management options for mining threats. Particularly concerning is the proximity of mining to areas supporting deep evolutionary history, which may be impossible to recover or replace. As pressures to expand mining in the Amazon grow, impact assessments with broader taxonomic reach and metric focus will be vital to conserving biodiversity in mining regions.

    Methods

    Database Assembly

    Mapping Mining Areas

    We obtained spatial information on mineral prospecting and mineral mining leases within the Brazilian Amazon from SIGMINE (Sistema de Informações Geográficas da Mineração; DNPM, 2012). This database catalogues all registered legal mining activities within Brazil, detailing the extent of each activity, dates of operation, and mined commodities. To map ‘mining leases’ of industrial-scale mineral mines, we selected records greater than 100 hectares in area and classified as mining concessions (Concessão de Lavra) and omitted leases extracting water or those classified as small-scale artisanal operations (Lavra Garimpeira). This resulted in 411 polygons (including active leases and adjacent extensions of such leases) of 15,750 km² in total area, with mining start dates ranging from 1944 to 2017 (mean = 1978, sd = 11.9; Fig. 1).

    To map ‘mining areas,’ which include the direct (i.e., immediate land-use change resulting from mineral extraction) and indirect (i.e., extensive land-use change associated with mineral extraction, processing, and transportation) impacts of mining on forests (Sonter et al., 2017), we created a 70 km buffer surrounding each mining lease. ‘Non-mining areas’ (i.e., areas unaffected by industrial mining) were mapped by extracting our mapped ‘mining areas’ and an additional layer representing all other legal mining leases excluded from our analyses (i.e., inactive leases, those targeting water, or operations smaller than 100 hectares in area; shown in white in Fig. 1) from the Brazilian Amazon (Fig. 1).

    For interpolation analyses, hexagons are the most logical sampling unit shape as their centroids are equidistant, the distance of points from the edges to the centroid is the closest, and sampling biases are reduced due to their lower perimeter-area ratio compared to squares or triangles (Birch et al., 2007). Hexagons of approximately 0.5° with equal area were assigned to one of two study areas – mining areas or non-mining areas – based on where their centroid was located (Fig. 1). Hexagons were omitted from our analyses if they contained fewer than 20 occurrence records per taxonomic group or their centroid was located outside the Brazilian Amazon. We used 0.5° hexagon sampling units as sensitivity analyses conducted in previous studies utilising the same dataset indicated reduced variation in results for hexagon areas of 0.5° and above (Oliveira et al., 2017a; Strand et al., 2018) and so any fine-scale georeferencing inaccuracies remaining in the dataset after filtering are minimised (Oliveira et al., 2017b). This sampling unit area also ensured sufficient sample sizes would be assigned within and among mining-induced deforestation-affected areas to enable robust comparisons across the study area for all taxonomic groups, particularly arthropods, while reducing the amount of area hexagon interpolations may sample from outside their respective study area polygons.

    Assembling Biodiversity Data

    Data on species occurrences were obtained from (Oliveira et al., 2017a) and (Oliveira et al., 2019a) and represent the most comprehensive dataset of species occurrences in Brazil to date. These data were assembled from online databases spanning GBIF (gbif.org); CRIA (specieslink.net); Birdlife International (birdlife.org), Herpnet (herpnet.org), Nature Serve (natureserve.org); and Orthoptera Species File (orthoptera.speciesfile.org). These data were also supplemented with occurrence records obtained from taxonomic literature and biodiversity inventories (Oliveira et al., 2017a; Oliveira et al., 2019a). All species occurrence records were filtered to determine if they lacked geographic coordinates or exhibited location errors using a map of Brazilian municipalities (mapas.ibge.gov.br; Oliveira et al., 2017a; Oliveira et al., 2019a). Taxonomic validity for all occurrence records was confirmed using taxon-specific catalogues and expert reviews for each taxonomic group (Oliveira et al., 2017a; Oliveira et al., 2019a).

    After filtering for geographic and taxonomic accuracy, the final dataset comprised 113,790 occurrence records for all the Brazilian Amazon. The dataset contained 44,660 records of angiosperms (6899 species of families Asteraceae, Bromeliaceae, Fabaceae, Melastomataceae, Myrtaceae, Orchidaceae, Poaceae, and Rubiaceae), 24,374 records of arthropods (4630 species of bees, spiders, millipedes, Orthoptera, dragonflies, moths and Diptera), and 44,756 records of vertebrates (1584 species of birds, mammals, and anurans). Spatial distributions of occurrence record densities for each taxonomic group are provided in the supporting information (Fig. S1).

    Phylogenetic trees were constructed from published figures into Newick code with TreeSnatcherPlus (Laubach & Von Haeseler, 2007) and supplemented with data from empirical phylogenetic studies synthesised by The Open Tree of Life project (Hinchliff et al., 2015). As branch lengths, when available, are not directly comparable between trees, all branch lengths were considered equal to one (Oliveira et al., 2017a; Oliveira et al., 2019a). Phylogenetic trees were compiled into a supertree using matrix representation with parsimony (Baum, 1992) and pruned to represent species restricted to Brazil.

    Our dataset represents the most extensive collection of species occurrence records and phylogenetic trees compiled in Brazil for this purpose to date (Oliveira et al., 2017a). However, data collected for environmental impact assessments that are not published online will inevitably be missing from our database, and rare, threatened, or range-restricted organisms may also not be included due to limited sampling.

    Calculation of Biodiversity Facets

    Sampling Effort

    We first intersected mining lease and mining area polygons with species occurrence records to provide a coarse estimate of the proportion of occurrence records within mining leases and their more expansive impact areas from the total contained in our database. An equal area measure was calculated through the ‘Sampling Effort’ functor of the BioDinamica plug-in (Oliveira et al., 2019b) of Dinamica EGO (Ferreira et al., 2019), which was set with a 10 km search radius due to limited and sporadic biodiversity sampling in the Brazilian Amazon (Oliveira et al., 2016; Oliveira et al., 2017a). We then converted the output raster to points and summed the mean sample effort index values across 0.5° radius hexagon sampling units (Fig. 1). The ‘Sampling Effort’ functor in BioDinamica employs a Gaussian kernel density index function. For all analyses using BioDinamica, 0.5° hexagon sampling units were only created where ≥ 20 species occurrence records existed.

    Biodiversity Metrics

    We calculated four sampling-effort-corrected biodiversity metrics for each of the three taxonomic groups: species richness, species endemism, phylogenetic diversity, and phylogenetic endemism, since measuring biodiversity with species richness alone does not capture values pertinent to conservation at the landscape scale, such as endemism or evolutionary history (Faith, 1992; Faith et al., 2004). Indeed, the loss of species is not equivalent to the loss of evolutionary history (Vane-Wright et al., 1991), and conservation priority areas can differ when using species richness and phylogenetic diversity (Rodrigues et al., 2005; Forest et al., 2007). Furthermore, phylogenetic measures may capture the quantity and distribution of diversity better than species-based measures, especially when data are limited, but both are representative of different diversity components (Rosauer & Mooers, 2013; Tucker et al., 2017). Thus, here we employ a variety of biodiversity metrics for comparison between mining and non-mining areas in the Brazilian Amazon.

    Species-based Metrics

    Species richness (per unit area) is the most sensitive biodiversity measure to variation in sampling effort (Oliveira

  15. Big Data Intelligence Engine Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 21, 2025
    Cite
    Data Insights Market (2025). Big Data Intelligence Engine Report [Dataset]. https://www.datainsightsmarket.com/reports/big-data-intelligence-engine-1991939
    Available download formats: ppt, doc, pdf
    Dataset updated
    May 21, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Big Data Intelligence Engine market is experiencing robust growth, driven by the increasing need for advanced analytics across diverse sectors. The market's expansion is fueled by several key factors: the exponential growth of data volume from various sources (IoT devices, social media, etc.), the rising adoption of cloud computing for data storage and processing, and the increasing demand for real-time insights to support faster and more informed decision-making. Applications spanning data mining, machine learning, and artificial intelligence are significantly contributing to this market expansion. Furthermore, the rising adoption of programming languages like Java, Python, and Scala, which are well-suited for big data processing, is further fueling market growth. Technological advancements, such as the development of more efficient and scalable algorithms and the emergence of specialized hardware like GPUs, are also playing a crucial role. While data security and privacy concerns, along with the high initial investment costs associated with implementing Big Data Intelligence Engine solutions, pose some restraints, the overall market outlook remains extremely positive. The competitive landscape is dominated by a mix of established technology giants like IBM, Microsoft, Google, and Amazon, and emerging players such as Alibaba Cloud, Tencent Cloud, and Baidu Cloud. These companies are aggressively investing in research and development to enhance their offerings and expand their market share. The market is geographically diverse, with North America and Europe currently holding significant market shares. However, the Asia-Pacific region, particularly China and India, is expected to witness the fastest growth in the coming years due to increasing digitalization and government initiatives promoting technological advancements. 
The market is further segmented by application (Data Mining, Machine Learning, AI) and programming language (Java, Python, Scala), offering opportunities for specialized solutions and services. The forecast period of 2025-2033 promises substantial growth, driven by continued innovation and widespread adoption across industries.

  16. Unsupervised Learning Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Aug 29, 2025
    Cite
    Data Insights Market (2025). Unsupervised Learning Report [Dataset]. https://www.datainsightsmarket.com/reports/unsupervised-learning-1944939
    Available download formats: doc, pdf, ppt
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming unsupervised learning market! This analysis reveals key trends, drivers, and restraints shaping this $15 billion (2025 est.) industry, featuring major players like Microsoft, Google, and IBM. Learn about projected growth and regional market share forecasts through 2033.

  17. Performance and time of the random forest classifier on the Amazon dataset.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Feb 14, 2024
    Cite
    Noura A. Semary; Wesam Ahmed; Khalid Amin; Paweł Pławiak; Mohamed Hammad (2024). Performance and time of the random forest classifier on the Amazon dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0294968.t008
    Available download formats: xls
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Noura A. Semary; Wesam Ahmed; Khalid Amin; Paweł Pławiak; Mohamed Hammad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance and time of the random forest classifier on the Amazon dataset.

  18. Literature survey of sentiment analysis.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Feb 14, 2024
    Cite
    Noura A. Semary; Wesam Ahmed; Khalid Amin; Paweł Pławiak; Mohamed Hammad (2024). Literature survey of sentiment analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0294968.t001
    Available download formats: xls
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Noura A. Semary; Wesam Ahmed; Khalid Amin; Paweł Pławiak; Mohamed Hammad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A crucial part of sentiment classification is feature extraction, because it involves extracting valuable information from text data, which affects the model’s performance. The goal of this paper is to help in selecting a suitable feature extraction method to enhance the performance of sentiment analysis tasks. In order to provide directions for future machine learning and feature extraction research, it is important to analyze and summarize feature extraction techniques methodically from a machine learning standpoint. Several methods are considered, including Bag-of-Words (BOW), Word2Vector, N-gram, Term Frequency-Inverse Document Frequency (TF-IDF), Hashing Vectorizer (HV), and Global Vectors for Word Representation (GloVe). To assess each feature extractor, we applied it to the Twitter US airlines and Amazon musical instrument reviews datasets. Finally, we trained a random forest classifier using 70% of the data for training and 30% for testing, enabling us to evaluate and compare the performance using different metrics. Based on our results, we find that the TF-IDF technique demonstrates superior performance, with an accuracy of 99% on the Amazon reviews dataset and 96% on the Twitter US airlines dataset. This study underscores the significance of feature extraction in sentiment analysis, offering practical insights to improve model performance and guide future research.
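
To make the TF-IDF idea in the description above concrete, here is a minimal standard-library sketch. It is not the authors' pipeline: the whitespace tokenizer, the smoothed inverse-document-frequency formula, the function name `tf_idf`, and the three toy reviews are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: term frequency normalised by document length,
    multiplied by a smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical toy reviews, not taken from either dataset.
reviews = [
    "great guitar strings",
    "great flight crew",
    "terrible flight delay",
]
weights = tf_idf(reviews)
```

In this toy corpus, "guitar" occurs in one review and "great" in two, so "guitar" receives the larger weight in the first review. Vectors of this kind would then be passed to a classifier such as the random forest mentioned in the description.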

  19. Unsupervised Learning Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Mar 13, 2025
    Cite
    Archive Market Research (2025). Unsupervised Learning Report [Dataset]. https://www.archivemarketresearch.com/reports/unsupervised-learning-56632
    Available download formats: doc, ppt, pdf
    Dataset updated
    Mar 13, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Discover the booming unsupervised learning market! Projected at $15 billion in 2025 and growing at a 25% CAGR, this report analyzes market drivers, trends, and key players like Microsoft & Google. Explore regional breakdowns and future forecasts (2025-2033).

  20. Data from: Limited biomass recovery from gold mining in Amazonian forests

    • search.dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated Jun 11, 2025
    Cite
    Michelle Kalamandeen; Emanuel Gloor; Isaac Johnson; Shenelle Agard; Martin Katow; Ashmore Vanbrooke; David Ashley; Sarah A. Batterman; Guy Ziv; Kaslyn Collins-Holder; Oliver L. Phillips; Eduardo S. Brondizio; Ima Vieira; David Galbraith (2025). Limited biomass recovery from gold mining in Amazonian forests [Dataset]. http://doi.org/10.5061/dryad.j6q573n9s
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Michelle Kalamandeen; Emanuel Gloor; Isaac Johnson; Shenelle Agard; Martin Katow; Ashmore Vanbrooke; David Ashley; Sarah A. Batterman; Guy Ziv; Kaslyn Collins-Holder; Oliver L. Phillips; Eduardo S. Brondizio; Ima Vieira; David Galbraith
    Time period covered
    Jan 1, 2020
    Description

    Gold mining has rapidly increased across the Amazon Basin in recent years, especially in the Guiana Shield, where it is responsible for >90% of total deforestation. However, the ability of forests to recover from gold mining activities remains largely unquantified. Forest inventory plots were installed on recently abandoned mines in two major mining regions in Guyana, and re-censused 18 months later, to provide the first ground-based quantification of gold mining impacts on Amazon forest biomass recovery. We found that woody biomass recovery rates on abandoned mining pits and tailing ponds are amongst the lowest ever recorded for tropical forests, with close to no woody biomass recovery after 3-4 years. On the overburden sites (i.e., areas not mined but where excavated soil is deposited), however, aboveground biomass recovery rates (0.4-3.5 Mg ha⁻¹ yr⁻¹) were within the range of those recorded in other secondary forests across the Neotropics following abandonment of past...
