57 datasets found

Amazon employees 2007-2024
statista.com
Updated Jun 25, 2025
Cite
Statista (2025). Amazon employees 2007-2024 [Dataset]. https://www.statista.com/statistics/234488/number-of-amazon-employees/
Explore at:
Dataset updated
Jun 25, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide, United States
Description
The combined number of full- and part-time employees of Amazon.com has increased significantly since 2017. Amazon’s headcount peaked in 2021 when the American multinational e-commerce company employed ********* full- and part-time employees, not counting external contractors. However, in 2024, the number dropped to *********.

E-commerce crunch
The workforce reduction of Amazon follows the mass layoffs hitting the entire e-commerce sector. With the full reopening of physical stores after the COVID-19 pandemic, online shopping demand decreased, leading online retailers to restructure their businesses, including personnel costs.

Diversifying business
With online retail sales growing slower due to recession and inflation, Amazon can still leverage other profitable revenue segments, from media subscriptions to server hosting and cloud services. On top of that, in 2023 Amazon monitored small enterprises operating in different fields and strategically invested in them, as disclosed startup acquisitions indicate.
Datasets for Sentiment Analysis
zenodo.org
csv
Updated Dec 10, 2023
Cite
Julie R. Campos Arias (repository creator) (2023). Datasets for Sentiment Analysis [Dataset]. http://doi.org/10.5281/zenodo.10157504
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.10157504
Dataset updated
Dec 10, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Julie R. Campos Arias (repository creator)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.

----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for 1K+ Amazon products, as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
product_id - Product ID
product_name - Name of the Product
category - Category of the Product
discounted_price - Discounted Price of the Product
actual_price - Actual Price of the Product
discount_percentage - Percentage of Discount for the Product
rating - Rating of the Product
rating_count - Number of people who voted for the Amazon rating
about_product - Description about the Product
user_id - ID of the user who wrote review for the Product
user_name - Name of the user who wrote review for the Product
review_id - ID of the user review
review_title - Short review
review_content - Long review
img_link - Image Link of the Product
product_link - Official Website Link of the Product
License: CC BY-NC-SA 4.0
File name: amazon.csv
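The relationship between the three price fields above can be sketched as follows. This is an illustration only: the dataset description does not say how discount_percentage was derived, so the formula (discount relative to the actual price) and the function name are assumptions.

```python
def discount_percentage(actual_price: float, discounted_price: float) -> float:
    """Return the discount as a percentage of the actual price.

    Assumption: the percentage is taken relative to actual_price;
    the source does not state how its discount_percentage column was computed.
    """
    if actual_price <= 0:
        raise ValueError("actual_price must be positive")
    return round((actual_price - discounted_price) / actual_price * 100, 2)


# Hypothetical prices, not taken from amazon.csv.
example = discount_percentage(1000.0, 750.0)
```

Comparing a recomputed value like this with the stored column is one way to check a scraped file for inconsistent rows.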
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
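Because the rows are ordered by label, a reproducible shuffle before any train/test split is the usual precaution. A minimal sketch, assuming the rows have already been read into a list (for example with the standard csv module); the function name, seed, and sample rows are hypothetical.

```python
import random


def shuffle_rows(rows, seed=42):
    """Return a shuffled copy of rows; the input list is left unchanged."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the order reproducible
    return shuffled


# Hypothetical rows in the (review, label) layout described above:
# negatives (0) first, positives (1) last.
rows = [("dull and slow", 0)] * 3 + [("a small gem", 1)] * 3
mixed = shuffle_rows(rows)
```

Using a dedicated random.Random instance keeps the shuffle independent of any other code that touches the global random state.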
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
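Deriving a categorical label from a polarity score, as described for the division column, can be sketched as below. TextBlob polarity lies in the range -1.0 to 1.0; the threshold and the label names used here are assumptions, since the description does not give the cut-offs that were actually used.

```python
def polarity_to_division(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a categorical label.

    The threshold and the label names are hypothetical; the dataset's
    own cut-offs are not documented in the description.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


labels = [polarity_to_division(p) for p in (0.5, -0.2, 0.0)]
```

A wider neutral band (for example threshold=0.1) treats weakly polar reviews as neutral, which changes the class balance and is worth checking against the published division column.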
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews, with star ratings, for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review, Unix time), reviewTime (time of the review, raw) and division (manually added; categorical label generated using the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Amazon Reviews for Dog Food Product
kaggle.com
Updated Apr 27, 2021
Cite
unwrangle (2021). Amazon Reviews for Dog Food Product [Dataset]. https://www.kaggle.com/unwrangle/amazon-reviews-for-dog-food-product/code
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 27, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
unwrangle
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
File includes 4605 reviews for a high quality dog food product on Amazon. This dataset was generated using Unwrangle Review Extractor API.

This dataset can be used for the following applications and more:

Analyzing trends

Just as an example, you can estimate how room occupancy must have been affected by the COVID-19 pandemic.

Sentiment Analysis / Opinion Mining

Using NLP techniques, one can find out what the average user’s sentiment is towards each of the featured hotels in this dataset.

Topic / Aspect Extraction

Using categorization techniques, one can quickly figure out how each of the hotels featured in this dataset fares on attributes such as room quality, staff, food, check-in process, etc.

Competitor Analysis

If you would like to find out what customers think about your competitors, a tailored dataset like the one featured in this blog post can enable you to do so with simple data analysis or visualization techniques.
Amazon Statistics (2025)
businessofapps.com
Updated Jul 20, 2025
Cite
Business of Apps (2025). Amazon Statistics (2025) [Dataset]. https://www.businessofapps.com/data/amazon-statistics/
Explore at:
Dataset updated
Jul 20, 2025
Dataset authored and provided by
Business of Apps
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Amazon is one of the most recognisable brands in the world, and the third largest by revenue. It was the fourth tech company to reach a $1 trillion market cap, and a market leader in e-commerce,...
Amazon Laptop Specs data
kaggle.com
Updated Mar 6, 2023
Cite
Aravind A S (2023). Amazon Laptop Specs data [Dataset]. https://www.kaggle.com/aravindas01/amazon-laptop-specs-data/discussion
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Aravind A S
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
"Amazon Laptop Specs" is a comprehensive dataset containing detailed specifications of various laptop models sold on Amazon. The dataset consists of about 100 laptop models and covers a wide range of brands, including Dell, HP, Lenovo, Apple, Acer, Asus, and more.

The data includes various attributes of each laptop, such as the processor type, RAM size, hard disk size, screen size, graphics card, operating system, battery life, and more. Additionally, the dataset includes information on the price, customer reviews, and ratings for each laptop model.

The dataset is suitable for researchers, analysts, and data scientists who are interested in exploring the market trends, comparing the performance of different laptop models, or building predictive models to understand customer behavior.

This dataset can also be used by e-commerce businesses to analyze customer preferences and identify the most popular laptop models, which can help in making informed decisions about inventory management, pricing, and marketing strategies.
Crowdsourcing image analysis for plant phenomics to generate ground truth...
plos.figshare.com
pdf
Updated Jun 4, 2023
Cite
Naihui Zhou; Zachary D. Siegel; Scott Zarecor; Nigel Lee; Darwin A. Campbell; Carson M. Andorf; Dan Nettleton; Carolyn J. Lawrence-Dill; Baskar Ganapathysubramanian; Jonathan W. Kelly; Iddo Friedberg (2023). Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning [Dataset]. http://doi.org/10.1371/journal.pcbi.1006337
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.1371/journal.pcbi.1006337
Dataset updated
Jun 4, 2023
Dataset provided by
PLOS Computational Biology
Authors
Naihui Zhou; Zachary D. Siegel; Scott Zarecor; Nigel Lee; Darwin A. Campbell; Carson M. Andorf; Dan Nettleton; Carolyn J. Lawrence-Dill; Baskar Ganapathysubramanian; Jonathan W. Kelly; Iddo Friedberg
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The accuracy of machine learning tasks critically depends on high quality ground truth data. Therefore, in many cases, producing good ground truth data typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large amount of training data of good quality. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, but with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices to assess the quality of ground truth data, and to compare data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at a low cost and high quality, especially in the context of high throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets.
‘FAANG- Complete Stock Data’ analyzed by Analyst-2
analyst-2.ai
Updated Sep 30, 2021
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘FAANG- Complete Stock Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-faang-complete-stock-data-36c1/9110ef3b/?iid=011-763&v=presentation
Explore at:
Dataset updated
Sep 30, 2021
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘FAANG- Complete Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data on 30 September 2021.

--- Dataset description provided by original source is as follows ---

Context

There are a few companies that are considered to be revolutionary. These companies also happen to be a dream place to work for many people across the world. They include Facebook, Amazon, Apple, Netflix, and Google, also known as FAANG! These companies make a ton of money, and they help others too by giving them a chance to invest in the companies via stocks and shares. This data was made targeting these stock prices.

Content

The data contains information such as opening price of a stock, closing price, how much of these stocks were sold and many more things. There are 5 different CSV files in the data for each company.

--- Original source retains full ownership of the source dataset ---
Webis Cross-Lingual Sentiment Dataset 2010 (Webis-CLS-10)
zenodo.org
data.niaid.nih.gov
Updated Apr 14, 2023
Cite
Peter Prettenhofer; Benno Stein (2023). Webis Cross-Lingual Sentiment Dataset 2010 (Webis-CLS-10) [Dataset]. http://doi.org/10.5281/zenodo.3251672
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.3251672
Dataset updated
Apr 14, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Peter Prettenhofer; Benno Stein
Description
The Cross-Lingual Sentiment (CLS) dataset comprises about 800.000 Amazon product reviews in the four languages English, German, French, and Japanese.

For more information on the construction of the dataset see (Prettenhofer and Stein, 2010) or the enclosed readme files. If you have a question after reading the paper and the readme files, please contact Peter Prettenhofer.

We provide the dataset in two formats: 1) a processed format which corresponds to the preprocessing (tokenization, etc.) in (Prettenhofer and Stein, 2010); 2) an unprocessed format which contains the full text of the reviews (e.g., for machine translation or feature engineering).

The dataset was first used by (Prettenhofer and Stein, 2010). It consists of Amazon product reviews for three product categories (books, dvds and music) written in four different languages: English, German, French, and Japanese. The German, French, and Japanese reviews were crawled from Amazon in November 2009. The English reviews were sampled from the Multi-Domain Sentiment Dataset (Blitzer et al., 2007). For each language-category pair there exist three sets of training documents, test documents, and unlabeled documents. The training and test sets comprise 2.000 documents each, whereas the number of unlabeled documents varies from 9.000 to 170.000.
Data from: SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural...
data.niaid.nih.gov
zenodo.org
Updated Mar 7, 2025
Cite
Nikitaras, Karolos (2025). SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7119399
Explore at:
Dataset updated
Mar 7, 2025
Dataset provided by
Nikitaras, Karolos
Ellinas, Nikolaos
Tsiakoulis, Pirros
Maniati, Georgia
Vioni, Alexandra
Chalamandaris, Aimilios
Jho, Gunu
Klapsas, Konstantinos
Sung, June Sig
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This is the public release of the Samsung Open Mean Opinion Scores (SOMOS) dataset for the evaluation of neural text-to-speech (TTS) synthesis, which consists of audio files generated with a public domain voice from trained TTS models based on bibliography, and numbers assigned to each audio as quality (naturalness) evaluations by several crowdsourced listeners.

Description

The SOMOS dataset contains 20,000 synthetic utterances (wavs), 100 natural utterances and 374,955 naturalness evaluations (human-assigned scores in the range 1-5). The synthetic utterances are single-speaker, generated by training several Tacotron-like acoustic models and an LPCNet vocoder on the LJ Speech voice public dataset. 2,000 text sentences were synthesized, selected from Blizzard Challenge texts of years 2007-2016, the LJ Speech corpus as well as Wikipedia and general domain data from the Internet.

Naturalness evaluations were collected via crowdsourcing a listening test on Amazon Mechanical Turk in the US, GB and CA locales. The records of listening test participants (workers) are fully anonymized. Statistics on the reliability of the scores assigned by the workers are also included, generated through processing the scores and validation controls per submission page.
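A mean opinion score is the average of the individual listener scores for an item. The sketch below shows that aggregation per utterance, assuming ratings arrive as (utterance_id, score) pairs; the identifiers and the function name are hypothetical and do not reflect the file layout of the release.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average the 1-5 naturalness scores of each utterance.

    ratings: iterable of (utterance_id, score) pairs (hypothetical layout).
    """
    by_utterance = defaultdict(list)
    for utterance_id, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range 1-5: {score}")
        by_utterance[utterance_id].append(score)
    return {utt: mean(scores) for utt, scores in by_utterance.items()}


# Hypothetical ratings, not taken from the dataset.
mos = mean_opinion_scores([("utt_a", 4), ("utt_a", 5), ("utt_b", 2)])
```

The same grouping can be keyed on a system identifier instead of an utterance identifier to obtain system-level scores.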

To listen to audio samples of the dataset, please see our Github page.

The dataset release comes with a carefully designed train-validation-test split (70%-15%-15%) with unseen systems, listeners and texts, which can be used for experimentation on MOS prediction.
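A split with "unseen" groups means that every record of a given group lands in exactly one partition. The sketch below illustrates the idea for a single grouping key with a 70/15/15 allocation of groups. It is not the authors' procedure: the release keeps systems, listeners and texts unseen simultaneously, which this simplified version does not attempt, and all names here are hypothetical.

```python
import random


def split_by_group(records, key, seed=0, fractions=(0.70, 0.15, 0.15)):
    """Assign whole groups to train/validation/test so no group is shared."""
    groups = sorted({key(r) for r in records})
    random.Random(seed).shuffle(groups)
    n_train = round(len(groups) * fractions[0])
    n_val = round(len(groups) * fractions[1])
    train_groups = set(groups[:n_train])
    val_groups = set(groups[n_train:n_train + n_val])

    splits = {"train": [], "validation": [], "test": []}
    for record in records:
        group = key(record)
        if group in train_groups:
            splits["train"].append(record)
        elif group in val_groups:
            splits["validation"].append(record)
        else:
            splits["test"].append(record)  # remaining groups form the test set
    return splits


# Hypothetical records: 20 systems with 5 utterances each.
records = [{"system": f"s{i % 20}", "utt": i} for i in range(100)]
splits = split_by_group(records, key=lambda r: r["system"])
```

Splitting on groups rather than on individual records prevents a model from being evaluated on systems it already saw during training.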

This version also contains the necessary resources to obtain the transcripts corresponding to all dataset audios.

Terms of use

The dataset may be used for research purposes only, for non-commercial purposes only, and may be distributed with the same terms.

Every time you produce research that has used this dataset, please cite the dataset appropriately.

Cite as:

@inproceedings{maniati22_interspeech, author={Georgia Maniati and Alexandra Vioni and Nikolaos Ellinas and Karolos Nikitaras and Konstantinos Klapsas and June Sig Sung and Gunu Jho and Aimilios Chalamandaris and Pirros Tsiakoulis}, title={{SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis}}, year=2022, booktitle={Proc. Interspeech 2022}, pages={2388--2392}, doi={10.21437/Interspeech.2022-10922} }

References of resources & models used

Voice & synthesized texts:
K. Ito and L. Johnson, “The LJ Speech Dataset,” https://keithito.com/LJ-Speech-Dataset/, 2017.

Vocoder:
J.-M. Valin and J. Skoglund, “LPCNet: Improving neural speech synthesis through linear prediction,” in Proc. ICASSP, 2019.
R. Vipperla, S. Park, K. Choo, S. Ishtiaq, K. Min, S. Bhattacharya, A. Mehrotra, A. G. C. P. Ramos, and N. D. Lane, “Bunched LPCNet: Vocoder for low-cost neural text-to-speech systems,” in Proc. Interspeech, 2020.

Acoustic models:
N. Ellinas, G. Vamvoukakis, K. Markopoulos, A. Chalamandaris, G. Maniati, P. Kakoulidis, S. Raptis, J. S. Sung, H. Park, and P. Tsiakoulis, “High quality streaming speech synthesis with low, sentence-length-independent latency,” in Proc. Interspeech, 2020.
Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio et al., “Tacotron: Towards End-to-End Speech Synthesis,” in Proc. Interspeech, 2017.
J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan et al., “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions,” in Proc. ICASSP, 2018.
J. Shen, Y. Jia, M. Chrzanowski, Y. Zhang, I. Elias, H. Zen, and Y. Wu, “Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling,” arXiv preprint arXiv:2010.04301, 2020.
M. Honnibal and M. Johnson, “An Improved Non-monotonic Transition System for Dependency Parsing,” in Proc. EMNLP, 2015.
M. Dominguez, P. L. Rohrer, and J. Soler-Company, “PyToBI: A Toolkit for ToBI Labeling Under Python,” in Proc. Interspeech, 2019.
Y. Zou, S. Liu, X. Yin, H. Lin, C. Wang, H. Zhang, and Z. Ma, “Fine-grained prosody modeling in neural speech synthesis using ToBI representation,” in Proc. Interspeech, 2021.
K. Klapsas, N. Ellinas, J. S. Sung, H. Park, and S. Raptis, “Word-Level Style Control for Expressive, Non-attentive Speech Synthesis,” in Proc. SPECOM, 2021.
T. Raitio, R. Rasipuram, and D. Castellani, “Controllable neural text-to-speech synthesis using intuitive prosodic features,” in Proc. Interspeech, 2020.

Synthesized texts from the Blizzard Challenges 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2016:
M. Fraser and S. King, "The Blizzard Challenge 2007," in Proc. SSW6, 2007.
V. Karaiskos, S. King, R. A. Clark, and C. Mayo, "The Blizzard Challenge 2008," in Proc. Blizzard Challenge Workshop, 2008.
A. W. Black, S. King, and K. Tokuda, "The Blizzard Challenge 2009," in Proc. Blizzard Challenge, 2009.
S. King and V. Karaiskos, "The Blizzard Challenge 2010," 2010.
S. King and V. Karaiskos, "The Blizzard Challenge 2011," 2011.
S. King and V. Karaiskos, "The Blizzard Challenge 2012," 2012.
S. King and V. Karaiskos, "The Blizzard Challenge 2013," 2013.
S. King and V. Karaiskos, "The Blizzard Challenge 2016," 2016.

Contact

Alexandra Vioni - a.vioni@samsung.com

If you have any questions or comments about the dataset, please feel free to write to us.

We are interested in knowing if you find our dataset useful! If you use our dataset, please email us and tell us about your research.
Childhood Asthma Healthcare Utilization
data.wprdc.org
data.amerigeoss.org
csv
Updated Jun 3, 2024
Cite
Allegheny County (2024). Childhood Asthma Healthcare Utilization [Dataset]. https://data.wprdc.org/dataset/childhood-asthma-healthcare-utilization
Explore at:
Available download formats: csv (10404)
Dataset updated
Jun 3, 2024
Dataset provided by
Allegheny County
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This data shows healthcare utilization for asthma by Allegheny County residents 18 years of age and younger. It counts asthma-related visits to the Emergency Department (ED), hospitalizations, urgent care visits, and asthma controller medication dispensing events.

The asthma data was compiled as part of the Allegheny County Health Department’s Asthma Task Force, which was established in 2018. The Task Force was formed to identify strategies to decrease asthma inpatient and emergency utilization among children (ages 0-18), with special focus on children receiving services funded by Medicaid. Data is being used to improve the understanding of asthma in Allegheny County, and inform the recommended actions of the task force. Data will also be used to evaluate progress toward the goal of reducing asthma-related hospitalization and ED visits.

In this data, asthma is defined using the International Classification of Diseases, Tenth Revision (ICD-10) classification system code J45.xxx. The ICD-10 system is used to classify diagnoses, symptoms, and procedures in the U.S. healthcare system.
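The J45.xxx pattern amounts to a prefix match on the diagnosis code. A minimal sketch, assuming codes arrive as plain strings; the function name and the sample codes are hypothetical, and this is not the Health Department's actual selection logic.

```python
def is_asthma_code(icd10_code: str) -> bool:
    """Return True for any code in the J45 family (for example J45.909)."""
    return icd10_code.strip().upper().startswith("J45")


# Hypothetical claim codes used only to exercise the function.
flags = [is_asthma_code(code) for code in ("J45.909", " j45.20 ", "J44.1")]
```

Normalizing whitespace and case first avoids missing codes that were entered inconsistently in claims data.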

Children seeking care for an asthma-related claim in 2017 are represented in the data. Data is compiled by the Health Department from medical claims submitted to three health plans (UPMC, Gateway Health, and Highmark). Claims may also come from people enrolled in Medicaid plans managed by these insurers. The Health Department estimates that 74% of the County’s population aged 0-18 is represented in the data.

Users should be cautious of using administrative claims data as a measure of disease prevalence and interpreting trends over time. Missing from the data are the uninsured, members in participating plans enrolled for less than 90 continuous days in 2017, children with an asthma-related condition who did not file a claim in 2017, and children participating in plans managed by insurers that did not share data with the Health Department.

Data users should also be aware that diagnoses may also be subject to misclassification, and that children with an asthmatic condition may not be diagnosed. It is also possible that some children may be counted more than once in the data if they are enrolled in a plan by more than one participating insurer and file a claim on each policy in the same calendar year.

Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.
Crowdsourced datasets to study the generation and impact of text...
figshare.com
bin
Updated May 30, 2023
Cite
Jorge Ramírez; Marcos Baez; Fabio Casati; Boualem Benatallah (2023). Crowdsourced datasets to study the generation and impact of text highlighting in classification tasks [Dataset]. http://doi.org/10.6084/m9.figshare.9917162.v4
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.6084/m9.figshare.9917162.v4
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Jorge Ramírez; Marcos Baez; Fabio Casati; Boualem Benatallah
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here we present the datasets derived from our experiments on using crowdsourcing for document classification tasks. These experiments resemble a two-step process that first highlights excerpts from the text and then presents these to workers for classification. Thus, our experiments group into highlighting generation and classification. For generating highlights, we leverage crowdsourcing and automatic approaches such as extractive summarization and question answering models. For our classification experiments, we consider documents from two different domains: systematic literature reviews and Amazon product reviews. Specifically, we study how highlighting text passages could aid workers in judging the relevance of a document given an input question. We expect these datasets to benefit not only the study of these particular problem domains but also a broader set of classification problems where individual judgments from workers are scarce.

In a nutshell, the datasets represent two kinds of tasks:
- classification tasks with highlighting support.
- highlighting tasks, where the workers highlight evidence.

Classification tasks
In this task, workers classified documents based on a given predicate.

Classification tasks using crowdsourced highlights
Files:
- classification_amazon-crowd-highlights.csv
- classification_oa-crowd-highlights.csv
- classification_tech-crowd-highlights.csv
- classification_tech-3x12-crowd-highlights.csv
- classification_tech-6x6-crowd-highlights.csv

Classification tasks using ML-generated highlights
Files:
- classification_amazon-ML-highlights.csv
- classification_oa-ML-highlights.csv
- classification_tech-ML-highlights.csv

Highlighting tasks

Crowdsourced highlights
In this task, workers highlighted excerpts from documents that are relevant to a given predicate, to support future classification tasks.
File: crowdsourced_highlights.csv.
The file contains one line per highlight (generated by one worker); the column that holds the highlighted fragment(s) is highlighted_text. The highlighted_text is a "list of lists" (Python syntax), so iterating over this list will give you the text fragments generated by one worker. Also, the experiment column indicates domain + task design. So, to get the highlights used in the classification experiments, use the rows whose experiment value ends with "-highlight".

ML-generated highlights
We also consider automatic approaches to generate text highlights, specifically extractive summarization and question-answering models.
File: ml_highlights.csv.
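Reading the highlighted_text column as described can be sketched as follows: parse the stored Python literal, keep only rows whose experiment value ends with "-highlight", and flatten the nested lists. The sample rows and experiment values are hypothetical, and flattening the inner lists is an assumption about how the fragments are meant to be consumed.

```python
import ast


def highlights_for_classification(rows):
    """Collect text fragments from rows whose experiment ends with '-highlight'.

    rows: iterable of dicts with 'experiment' and 'highlighted_text' keys,
    such as those produced by csv.DictReader.
    """
    fragments = []
    for row in rows:
        if not row["experiment"].endswith("-highlight"):
            continue
        # highlighted_text holds a Python "list of lists" literal stored as text.
        nested = ast.literal_eval(row["highlighted_text"])
        for group in nested:
            fragments.extend(group)
    return fragments


# Hypothetical rows, not taken from crowdsourced_highlights.csv.
rows = [
    {"experiment": "amazon-highlight",
     "highlighted_text": "[['good battery'], ['fast shipping', 'cheap']]"},
    {"experiment": "amazon-classification",
     "highlighted_text": "[['ignored']]"},
]
fragments = highlights_for_classification(rows)
```

ast.literal_eval only evaluates literals, so it is a safer choice than eval for text loaded from a file.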
Data from: The Impact of Financial Strain on Work-Family Conflict During...
borealisdata.ca
Updated Jul 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Christine Tulk (2025). The Impact of Financial Strain on Work-Family Conflict During COVID-19 [Dataset]. http://doi.org/10.5683/SP3/JZVHUK
Explore at:
Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.5683/SP3/JZVHUK
Dataset updated
Jul 26, 2025
Dataset provided by
Borealis
Authors
Christine Tulk
License
CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
United States
Description
SAS data file used to produce results for article published in the Journal of Business and Psychology examining the impact of financial strain on work-family conflict during COVID-19. https://doi.org/10.1007/s10869-025-10063-2

Data were collected at seven time points between April 2020 and October 2020 using a longitudinal panel design. Participants were workers registered with Amazon's Mechanical Turk who were located in the United States.

Abstract: The announcements of pandemic lockdown measures across North America in mid-March 2020 marked the start of a chaotic period with extensive changes at work and at home. Because families experiencing financial strain had fewer resources to help manage work and family demands, the present study examined how financial strain at the within- and between-person levels influenced work interference with family (WIF) and family interference with work (FIW) and whether those experiences were moderated by childcare and eldercare responsibilities. Using a longitudinal panel design, 538 workers recruited through Amazon’s Mechanical Turk responded to seven surveys between April and October 2020 asking about financial strain, WIF, and FIW. Multilevel modeling showed that an individual's average financial strain over the seven-month period was associated with higher WIF and FIW; however, a higher-than-usual level of financial strain was associated only with higher FIW. Interactions of financial strain with childcare and eldercare were not significant. At the between-person level, financial strain was an important contributor to WIF and FIW, even after accounting for childcare and eldercare. Consistent with conservation of resources theory, these findings suggest that financial strain represents a perceived threat that actively draws on limited personal resources, thereby reducing capacity to manage work-family conflict. This underscores the need for greater support for families experiencing financial strain. In addition to fair pay and benefits, organizations could consider novel approaches to reducing financial strain amongst employees such as financial counselling and emergency income replacement funds.

This readme file was generated on 2025-07-25 by Christine Tulk

--------------------
GENERAL INFORMATION
--------------------

1. Title of Dataset: The Impact of Financial Strain on Work-Family Conflict During COVID-19

2. Author Information
Name: Christine Tulk
ORCID: 0000-0001-7312-7406
Institution: Carleton University
Address: Ottawa, Canada
Email: christine.tulk@carleton.ca

3. Date of data collection: 538 responses at Time 1 (April 17-18), 320 responses at Time 2 (May 4-11), 263 responses at Time 3 (June 6-13), 250 responses at Time 4 (July 9-16), 225 responses at Time 5 (Aug 17-24), 203 responses at Time 6 (Sept 26 - Oct 3), and 181 responses at Time 7 (Oct 28 - Nov 4)

4. Geographic location of data collection: MTurk workers located in the United States

5. Dataset Description: The data are formatted in long format suitable for multilevel modeling with one row per time point per participant.

-----------------------------------
SHARING/ACCESS INFORMATION
-----------------------------------

Links to publications that cite or use the data: https://doi.org/10.1007/s10869-025-10063-2

-------------------------
DATA & FILE OVERVIEW
-------------------------

1. File List:
financial_strain.sas7bdat - SAS data file
financial_strain.csv - Text file

3. Additional related data collected that was not included in the current data package: Additional variables were collected and are available upon reasonable request from the author.

---------------------------
METHODOLOGICAL INFORMATION
---------------------------

1. Description of methods used for collection/generation of data: Collected by surveys administered at seven times between April 2020 and October 2020

2. Methods for processing the data: Data were initially downloaded from the Qualtrics web site in Excel format and imported into SAS.

-------------------------------------------
DATA-SPECIFIC INFORMATION FOR: financial_strain.csv/sas7bdat
-------------------------------------------

1. Number of variables:

2. Number of cases/rows: 1977 rows

3. Variable List:
id (Level 2): participant id
Gender (Level 2): man = 0, woman = 1
BC_Emotion (Level 2): Measure of emotion-focused coping
BC_Problem (Level 2): Measure of problem-focused coping
BC_Support (Level 2): Measure of support-focused coping
FinM (Level 2): Person-averaged financial strain
FinCM (Level 2): Person-averaged financial strain centered around group mean
Child (Level 2): 0 = No childcare responsibilities, 1 = Childcare responsibilities
Elder (Level 2): 0 = No eldercare responsibilities, 1 = Eldercare responsibilities
Partner (Level 2): 0 = Not partnered (e.g., single, divorced), 1 = Partnered (e.g., married)
wfcM (Level 2): Person-averaged work-to-family conflict
fwcM (Level 2): Person-averaged family-to-work conflict
HoursM (Level 2): Person-averaged average work hours per...
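The long format and the person-averaged variables described above (for example FinM and FinCM) can be illustrated with a small sketch. This is not the author's SAS code: the rows, the column names `id`, `time`, and `fin`, and the helper `decompose` are hypothetical, and the grand mean is assumed here to be the mean of the person means, which may differ from how the dataset's centered variables were computed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: one row per participant per time point.
# Column names ("id", "time", "fin") are illustrative, not the dataset's own.
rows = [
    {"id": 1, "time": 1, "fin": 2.0},
    {"id": 1, "time": 2, "fin": 4.0},
    {"id": 2, "time": 1, "fin": 5.0},
    {"id": 2, "time": 2, "fin": 5.0},
    {"id": 2, "time": 3, "fin": 8.0},
]


def decompose(rows, value_key="fin", id_key="id"):
    """Split each observation into between-person and within-person parts."""
    by_person = defaultdict(list)
    for row in rows:
        by_person[row[id_key]].append(row[value_key])

    # Level 2 (between-person): each participant's own average.
    person_mean = {pid: mean(values) for pid, values in by_person.items()}
    # Assumption: the grand mean is the mean of the person means.
    grand_mean = mean(person_mean.values())

    result = []
    for row in rows:
        pm = person_mean[row[id_key]]
        result.append({
            **row,
            "person_mean": pm,                # analogous to a person-averaged score
            "between": pm - grand_mean,       # person mean centered on the grand mean
            "within": row[value_key] - pm,    # deviation from the person's usual level
        })
    return result


decomposed = decompose(rows)
for record in decomposed:
    print(record)
```

In a multilevel model, the `between` term would carry the effect of a person's average strain and the `within` term the effect of a higher-than-usual level at a given time point.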
DASCH DR7 Digital Inventory
zenodo.org
zip
Updated Dec 27, 2024
Cite
Peter K. G. Williams (2024). DASCH DR7 Digital Inventory [Dataset]. http://doi.org/10.5281/zenodo.14563521
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14563521
Dataset updated
Dec 27, 2024
Dataset provided by
Harvard College Observatory (http://www.cfa.harvard.edu/hco)
Authors
Peter K. G. Williams
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These files define a "digital inventory" of all of the files archived as part of DASCH Data Release 7 (DR7). DASCH (Digital Access to a Sky Century @ Harvard) was the project to digitize the Harvard College Observatory’s Astronomical Photographic Glass Plate Collection for scientific applications. This irreplaceable resource provides a means for systematic study of the sky on 100-year time scales.

This inventory does not contain the actual DASCH data. Rather, it contains an exhaustive index of all of the DASCH data — virtually all aspects of DASCH's digital existence throughout the project's entire history, up through the DR7 release date (December, 2024). The complete inventory documents 33,791,530 files totaling 745,627,062,858,355 bytes (around 678 TiB) of data. The inventory itself is about 10 GiB in size (decompressed), spread across 3,946 files.
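As a quick check of the figures quoted above, the byte total can be converted to binary units. The two totals are taken from the description; the constant names and the mean-file-size calculation are illustrative additions.

```python
# Totals quoted in the inventory description.
TOTAL_BYTES = 745_627_062_858_355
TOTAL_FILES = 33_791_530

# Binary (IEC) units: 1 MiB = 2**20 bytes, 1 TiB = 2**40 bytes.
MIB = 2 ** 20
TIB = 2 ** 40

total_tib = TOTAL_BYTES / TIB
mean_file_mib = TOTAL_BYTES / TOTAL_FILES / MIB

print(f"Archive size: {total_tib:.1f} TiB")
print(f"Mean file size: {mean_file_mib:.1f} MiB")
```

The first value agrees with the "around 678 TiB" stated in the description; the mean file size is only a rough indicator, since the archive mixes very large images with small log and text files.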

The actual underlying data are currently archived in a set of Amazon AWS S3 buckets and on magnetic tapes held by Harvard College Observatory. Most DASCH users are encouraged to access DASCH data via the project's data access services; this inventory is chiefly relevant to those planning large-scale duplication of the DASCH data.

The DASCH archive, which is indexed by this inventory, includes:

Full-plate "mosaic" FITS images of more than 428,000 plates, as well as photographs of the plates and their jackets

Astrometric solution data for about 97% of the plates

Photometric calibration data for about 89% of the plates

Lightcurves for all sources extracted from the plates, matched to two separate reference catalogs:

23,574,404,199 measurements calibrated to the APASS DR8 catalog

27,966,413,880 measurements calibrated to the ATLAS-refcat2 catalog

About 166,000 photographs of observing logbooks documenting the plates, and a selection of historical astronomer notebooks discussing them

Derived products, generated from the above, needed to operate the DASCH data access services

Raw "tile" data from two decades of DASCH scanning, as well as supporting calibration and telemetry files

All of the source code behind the DASCH software systems, from scanning to pipeline processing to data access services to end-user analysis

Logs relating to all modern DASCH pipeline processing, data management, and other operations tasks

All available project documentation

All other data files supporting DASCH operations

See the README.md file within the collection for more information about the structure and contents of this inventory. In summary, it organizes the DASCH data files into a virtual hierarchy of names. Associated with each name is a size (in bytes), MD5 digest, and one or more "data URLs" recording locations where that file is archived as of DR7. Every single file has a data URL indicating a location on Amazon's AWS S3 storage service; many files also have one or more copies on magnetic backup tapes held by Harvard College Observatory.

The inventory is expressed as a collection of plain-text (UTF-8) files using Markdown syntax. There is approximately one such file for each "folder" or "subtree" of the virtual name hierarchy. Each file contains a human-readable preamble describing the folder contents, an optional Markdown table listing any direct-descendant subfolders, and an optional Markdown table documenting any files contained directly within that folder. The intention is that it should be fairly straightforward both for humans to navigate these files and for software to process them. While most files are human-scale in size, the largest (Inventory.pipeline_astrometry.md) is about 280 MiB and contains about 1.5 million records.
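Software that processes such files would need to read a Markdown table and, when duplicating data, compare each copy against its recorded MD5 digest. The sketch below shows one way to do both with the Python standard library. The column names (`name`, `size`, `md5`) and the sample table are assumptions made for illustration; the real layout is defined in the collection's README.md and is not reproduced here.

```python
import hashlib


def parse_markdown_table(text):
    """Parse the first pipe-delimited Markdown table in `text` into dicts.

    Simplification: cells containing escaped pipes are not handled.
    """
    table = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            table.append(stripped)
        elif table:
            break  # the first table has ended

    if len(table) < 3:  # need header, separator, and at least one record
        return []

    def split_row(row):
        return [cell.strip() for cell in row.strip("|").split("|")]

    header = split_row(table[0])
    # table[1] is the |---|---| separator row, so records start at index 2.
    return [dict(zip(header, split_row(row))) for row in table[2:]]


def md5_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the MD5 digest of `data` equals the recorded digest."""
    return hashlib.md5(data).hexdigest() == expected_hex.lower()


# Hypothetical inventory fragment; the digest is computed from the payload
# so that the example is self-consistent.
payload = b"hello"
sample = f"""
Preamble describing the folder contents.

| name | size | md5 |
| --- | --- | --- |
| example.txt | {len(payload)} | {hashlib.md5(payload).hexdigest()} |
"""

records = parse_markdown_table(sample)
for record in records:
    print(record["name"], record["size"], md5_matches(payload, record["md5"]))
```

For a file as large as the roughly 280 MiB one mentioned above, a line-by-line reader over the open file would be preferable to loading the whole text into memory.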

As of the DR7 release, only some DASCH archive files are directly accessible by third parties. The Starglass website (https://starglass.cfa.harvard.edu/) makes many photographs and "mosaics" (full-plate FITS images) available, and the web APIs supporting this site and the DASCH data access services (see the DASCH site, https://dasch.cfa.harvard.edu/) provide access to additional resources. To duplicate other portions of the archive, you may need to contact Harvard College Observatory. It is hoped that over time, more and more of the DASCH archive will become available for direct download. It is also hoped that additional copies of the DASCH archive will be created and publicized; the best way to ensure the long-term preservation of this dataset is to duplicate it. A major goal of this inventory is to make such duplication tractable.

To the greatest extent possible, it is believed that all of the files documented as part of this archive can be duplicated free of legal encumbrances. Unless documented otherwise, the copyright owner of all copyrightable elements is the President and Fellows of Harvard College. Please see the DASCH website for the most up-to-date guidance regarding image credits and any legal topics relating to this dataset.

Acknowledgments

The DASCH scanning project was the work of literally hundreds of people over multiple decades. Out of the many people who have devoted their time and energy to the project, the essential contributions of a few deserve special recognition: Prof. Jonathan (Josh) Grindlay; Bob Simcoe; Edward Los; Lindsay Smith Zrull; and Alison Doane.

The DASCH project at Harvard is grateful for partial support from NSF grants AST-0407380, AST-0909073, and AST-1313370, which should be acknowledged in all papers making use of DASCH data.

We acknowledge the one-time gift of the Cornel and Cynthia K. Sarosdy Fund for DASCH, and thank Grzegorz Pojmanski of the ASAS project for providing some of the source code on which the DASCH scientific data access portal was based.

The ongoing AAVSO Photometric All-Sky Survey (APASS) has improved DASCH photometric calibration and is funded by the Robert Martin Ayers Sciences Fund.

This inventory and DASCH Data Release 7 were prepared by Peter K. G. Williams in December, 2024.
The effects of facial attractiveness and trustworthiness in online...
b2find.eudat.eu
Updated Oct 31, 2023
Cite
(2023). The effects of facial attractiveness and trustworthiness in online peer-to-peer markets - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/87dc7d7b-e36b-59bb-9277-99d429ebb075
Dataset updated
Oct 31, 2023
Description
This data package includes the data, analysis scripts, and relevant documents for the project: The effects of facial attractiveness and trustworthiness in online peer-to-peer markets. Method: All data were collected from Amazon Mechanical Turk workers who completed a survey designed in Qualtrics survey software. Universe: All data were collected from Amazon Mechanical Turk workers who were U.S. citizens.
Lessons Learned from Crowdsourcing Complex Engineering Tasks
plos.figshare.com
txt
Updated May 30, 2023
Cite
Matthew Staffelbach; Peter Sempolinski; Tracy Kijewski-Correa; Douglas Thain; Daniel Wei; Ahsan Kareem; Gregory Madey (2023). Lessons Learned from Crowdsourcing Complex Engineering Tasks [Dataset]. http://doi.org/10.1371/journal.pone.0134978
Available download formats: txt
Unique identifier
https://doi.org/10.1371/journal.pone.0134978
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Matthew Staffelbach; Peter Sempolinski; Tracy Kijewski-Correa; Douglas Thain; Daniel Wei; Ahsan Kareem; Gregory Madey
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Crowdsourcing

Crowdsourcing is the practice of obtaining needed ideas, services, or content by requesting contributions from a large group of people. Amazon Mechanical Turk is a web marketplace for crowdsourcing microtasks, such as answering surveys and image tagging. We explored the limits of crowdsourcing by using Mechanical Turk for a more complicated task: analysis and creation of wind simulations.

Harnessing Crowdworkers for Engineering

Our investigation examined the feasibility of using crowdsourcing for complex, highly technical tasks. This was done to determine if the benefits of crowdsourcing could be harnessed to accurately and effectively contribute to solving complex real world engineering problems. Of course, untrained crowds cannot be used as a mere substitute for trained expertise. Rather, we sought to understand how crowd workers can be used as a large pool of labor for a preliminary analysis of complex data.

Virtual Wind Tunnel

We compared the skill of the anonymous crowd workers from Amazon Mechanical Turk with that of civil engineering graduate students, making a first pass at analyzing wind simulation data. For the first phase, we posted analysis questions to Amazon crowd workers and to two groups of civil engineering graduate students. A second phase of our experiment instructed crowd workers and students to create simulations on our Virtual Wind Tunnel website to solve a more complex task.

Conclusions

With a sufficiently comprehensive tutorial and compensation similar to typical crowd-sourcing wages, we were able to enlist crowd workers to effectively complete longer, more complex tasks with competence comparable to that of graduate students with more comprehensive, expert-level knowledge. Furthermore, more complex tasks require increased communication with the workers. As tasks become more complex, the employment relationship begins to become more akin to outsourcing than crowdsourcing. Through this investigation, we were able to stretch and explore the limits of crowdsourcing as a tool for solving complex problems.
Amazon revenue 2004-2024
statista.com
Updated Jun 25, 2025
Cite
Statista (2025). Amazon revenue 2004-2024 [Dataset]. https://www.statista.com/statistics/266282/annual-net-revenue-of-amazoncom/
Dataset updated
Jun 25, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
Worldwide, United States
Description
From 2004 to 2024, the net revenue of Amazon e-commerce and service sales has increased tremendously. In the fiscal year ending December 31, the multinational e-commerce company's net revenue was almost *** billion U.S. dollars, up from *** billion U.S. dollars in 2023.

Amazon.com, a U.S. e-commerce company originally founded in 1994, is the world’s largest online retailer of books, clothing, electronics, music, and many more goods. As of 2024, the company generates the majority of its net revenues through online retail product sales, followed by third-party retail seller services, cloud computing services, and retail subscription services including Amazon Prime.

From seller to digital environment

Through Amazon, consumers are able to purchase goods at a rather discounted price from both small and large companies as well as from other users. Both new and used goods are sold on the website. Due to the wide variety of goods available at prices which often undercut local brick-and-mortar retail offerings, Amazon has dominated the retailer market. As of 2024, Amazon’s brand worth amounts to over *** billion U.S. dollars, topping the likes of companies such as Walmart, Ikea, as well as digital competitors Alibaba and eBay. One of Amazon's first forays into the world of hardware was its e-reader Kindle, one of the most popular e-book readers worldwide. More recently, Amazon has also released several series of own-branded products and a voice-controlled virtual assistant, Alexa.

Headquartered in North America

Due to its location, Amazon offers more services in North America than worldwide. As a result, the majority of the company’s net revenue in 2023 was actually earned in the United States, Canada, and Mexico. In 2023, approximately *** billion U.S. dollars was earned in North America compared to only roughly *** billion U.S. dollars internationally.
US Cost of Living Dataset (1877 Counties)
kaggle.com
Updated Feb 17, 2024
Cite
asaniczka (2024). US Cost of Living Dataset (1877 Counties) [Dataset]. http://doi.org/10.34740/kaggle/ds/3832881
Unique identifier
https://doi.org/10.34740/kaggle/ds/3832881
Dataset updated
Feb 17, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
asaniczka
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
The US Family Budget Dataset provides insights into the cost of living in different US counties based on the Family Budget Calculator by the Economic Policy Institute (EPI).

This dataset offers community-specific estimates for ten family types, including one or two adults with zero to four children, in all 1877 counties and metro areas across the United States.

Interesting Task Ideas:

See how family budgets compare to the federal poverty line and the Supplemental Poverty Measure in different counties.

Look into the money challenges faced by different types of families using the budgets provided.

Find out which counties have the most affordable places to live, food, transportation, healthcare, childcare, and other things people need.

Explore how the average income of families relates to the overall cost of living in different counties.

Investigate how family size affects the estimated budget and find counties where bigger families have higher costs.

Create visuals showing how the cost of living varies across different states and big cities.

Check whether specific counties are affordable for families of different sizes and types.

Use the dataset to compare living standards and economic security in different US counties.

If you find this dataset valuable, don't forget to hit the upvote button! 😊💝

Check out my other datasets

Employment-to-Population Ratio for USA

Productivity and Hourly Compensation

130K Kindle Books

900K TMDb Movies

USA Unemployment Rates by Demographics & Race

Photo by Alev Takil on Unsplash
Text Simplification: Impact of Noun Phrase Length on Sentence Difficulty
arizona.figshare.com
txt
Updated May 30, 2023
Cite
Gondy Augusta Leroy (2023). Text Simplification: Impact of Noun Phrase Length on Sentence Difficulty [Dataset]. http://doi.org/10.25422/azu.data.21357498.v1
Available download formats: txt
Unique identifier
https://doi.org/10.25422/azu.data.21357498.v1
Dataset updated
May 30, 2023
Dataset provided by
University of Arizona Research Data Repository
Authors
Gondy Augusta Leroy
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
As described in Section 2 of the associated publication, this dataset contains 360 sentences from 12 experimental conditions: NP length (2-gram, 3-gram, 4-gram), NP split (Yes/No), and pseudoword use (Yes/No). Perceived difficulty (Likert Scale) and actual difficulty (multiple choice content questions) for each sentence are provided as an average. The average is based on approximately 35 evaluations per sentence by Amazon Mechanical Turk workers.

For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
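The 12 experimental conditions in the description follow from crossing the three factors (3 NP lengths × 2 NP split values × 2 pseudoword values). The sketch below enumerates them; the variable names are illustrative, and the per-condition sentence count assumes a balanced design, which the description does not state explicitly.

```python
from itertools import product

# Factor levels as listed in the dataset description.
np_lengths = ["2-gram", "3-gram", "4-gram"]
np_split = ["Yes", "No"]
pseudoword_use = ["Yes", "No"]

# Full factorial crossing of the three factors.
conditions = list(product(np_lengths, np_split, pseudoword_use))

TOTAL_SENTENCES = 360
# Assumption: sentences are spread evenly across conditions.
sentences_per_condition = TOTAL_SENTENCES // len(conditions)

for length, split, pseudo in conditions:
    print(f"NP length={length}, NP split={split}, pseudoword={pseudo}")
print(len(conditions), "conditions;", sentences_per_condition, "sentences each if balanced")
```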
Experimental data for: Memory retrieval processes help explain the...
b2find.eudat.eu
Updated May 2, 2023
Cite
(2023). Experimental data for: Memory retrieval processes help explain the incumbency advantage. Judgement and Decision Making - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/3f93c011-6ad7-5a31-b5b0-592bc28fa838
Dataset updated
May 2, 2023
Description
This data package includes the data and materials for the three experiments conducted for the project: Memory Retrieval Processes Help Explain the Incumbency Advantage. The research measures and manipulates participants' sequential memory retrieval patterns while they consider the choice between two political candidates. We find that the order in which participants retrieve information about the candidate from memory is related to a preference for the candidate already in office (incumbent). DSA proof. Method: All data were collected from Amazon Mechanical Turk workers who completed a survey designed in Qualtrics survey software. All data were collected from Amazon Mechanical Turk workers who were U.S. citizens.
Data from: Dataset of gold-mining related deforestation and formalization in...
agdatacommons.nal.usda.gov
bin
Updated Jan 22, 2025
Cite
Nora L. Álvarez-Berríos; Jessica L’Roe (2025). Dataset of gold-mining related deforestation and formalization in Madre de Dios, Perú from 2001 to 2014 [Dataset]. http://doi.org/10.2737/RDS-2021-0083
Available download formats: bin
Unique identifier
https://doi.org/10.2737/RDS-2021-0083
Dataset updated
Jan 22, 2025
Dataset provided by
U.S. Department of Agriculture Forest Service (http://fs.fed.us/)
Authors
Nora L. Álvarez-Berríos; Jessica L’Roe
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Madre de Dios, Peru
Description
A global surge in ‘artisanal’, small-scale mining (ASM) threatens biodiverse tropical forests and exposes residents to dangerous levels of mercury. In response, governments and development agencies are investing millions (USD) in ASM formalization: registering concessions and demarcating extraction zones to promote regulatory adherence and direct mining away from ecologically sensitive areas. This data publication contains data used to examine patterns of mining-related deforestation associated with ASM formalization efforts in the Department of Madre de Dios in the Peruvian Amazon. Using satellite images and government-issued spatial layers on mining formalization, we tracked changes in mining activities from 2001 to 2014 when agencies: (a) issued 1701 provisional titles and (b) tried to restrict mining to a > 5000 square kilometer (km²) ‘corridor’. The data reported in this publication are based on the centroids of a 25 hectare (ha) hexagon grid covering the 20,850 km² study area and include variables related to (1) mining deforestation from years 2001 to 2014, (2) mining concession status, (3) location relative to the mining corridor, as well as (4) location relative to time-invariant variables and access (geology, distance to river), administrative units (district, native communities), and conservation designation (protected areas).

Data were compiled and analyzed to examine patterns of mining-related deforestation associated with formalization efforts in the Department of Madre de Dios, Perú.

For more information about this study and these data, see Álvarez-Berríos and L'Roe (2021).
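The grid figures above imply a rough cell count and cell geometry, which the following sketch works out. It assumes regular hexagons and ignores partial cells along the study-area boundary, so the count is an approximation and not the number of records in the dataset; the constant names are illustrative.

```python
import math

# Figures quoted in the description.
HEX_AREA_HA = 25          # area of one hexagon cell, in hectares
STUDY_AREA_KM2 = 20_850   # study area, in square kilometers

# Unit conversions: 1 km² = 100 ha, 1 ha = 10,000 m².
HA_PER_KM2 = 100
M2_PER_HA = 10_000

# Approximate number of cells needed to tile the study area.
approx_cells = STUDY_AREA_KM2 * HA_PER_KM2 / HEX_AREA_HA

# For a regular hexagon, area = (3 * sqrt(3) / 2) * side**2, so solve for side.
hex_area_m2 = HEX_AREA_HA * M2_PER_HA
side_m = math.sqrt(2 * hex_area_m2 / (3 * math.sqrt(3)))

print(f"Approximate cell count: {approx_cells:,.0f}")
print(f"Hexagon side length: {side_m:.1f} m")
```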

Statista (2025). Amazon employees 2007-2024 [Dataset]. https://www.statista.com/statistics/234488/number-of-amazon-employees/

42 scholarly articles cite this dataset (View in Google Scholar)
