The combined number of full- and part-time employees of Amazon.com has increased significantly since 2017. Amazon’s headcount peaked in 2021 when the American multinational e-commerce company employed ********* full- and part-time employees, not counting external contractors. However, in 2024, the number dropped to *********.
E-commerce crunch
The workforce reduction of Amazon follows the mass layoffs hitting the entire e-commerce sector. With the full reopening of physical stores after the COVID-19 pandemic, online shopping demand decreased, leading online retailers to restructure their businesses, including personnel costs.
Diversifying business
With online retail sales growing more slowly due to recession and inflation, Amazon can still leverage other profitable revenue segments, from media subscriptions to server hosting and cloud services. On top of that, in 2023 Amazon monitored small enterprises operating in different fields and strategically invested in them, as disclosed startup acquisitions indicate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
Below are the datasets specified, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for more than 1,000 Amazon products, with details as listed on the official Amazon website. The data was scraped from the official Amazon website in January 2023.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset, containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5331 rows contain only negative samples and the last 5331 rows contain only positive samples, so the data should be shuffled before use.
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
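Because the rows are ordered by class, a shuffle step is needed before any train/test split. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses the column names reviews and labels given above; the four sample sentences are made-up stand-ins, not rows from data_rt.csv.

```python
import csv
import io
import random


def read_rows(handle):
    """Read CSV rows into dicts keyed by the header names."""
    return list(csv.DictReader(handle))


def shuffled(rows, seed=42):
    """Return a shuffled copy; a fixed seed keeps the order reproducible."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows


# Tiny in-memory stand-in for data_rt.csv: negatives first, then positives.
sample = io.StringIO(
    "reviews,labels\n"
    "a dull and lifeless film,0\n"
    "the plot never comes together,0\n"
    "a warm and funny story,1\n"
    "beautifully acted throughout,1\n"
)
rows = shuffled(read_rows(sample))
labels = [row["labels"] for row in rows]
print(labels)
```

For the real file, the same functions could be applied to a handle opened with open("data_rt.csv", newline="", encoding="utf-8").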
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data for the Gen3EcoDot (Alexa), scraped entirely from amazon.in
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
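The mapping from a polarity score to a categorical division can be sketched as follows. TextBlob polarity is a float in the range -1.0 to 1.0; the cut-off values and the three label names used here are assumptions for illustration, since the dataset description does not state the thresholds that were actually applied.

```python
def polarity_to_division(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a categorical label.

    neutral_band is a hypothetical parameter: scores whose absolute
    value is at most neutral_band are labeled "neutral".
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print([polarity_to_division(p) for p in (0.6, 0.0, -0.35)])
```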
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9930 Amazon reviews with star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train a machine for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
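A label derived from ReviewStar could look like the sketch below. The grouping (1-2 negative, 3 neutral, 4-5 positive) is an assumption made for illustration; the description above only says that division was generated from the star score.

```python
def star_to_division(review_star):
    """Map a 1-5 star rating (number or numeric string) to a label."""
    stars = int(float(review_star))
    if not 1 <= stars <= 5:
        raise ValueError("star rating must be between 1 and 5")
    if stars <= 2:
        return "negative"
    if stars == 3:
        return "neutral"
    return "positive"


print([star_to_division(s) for s in ("5", 3, "1.0")])
```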
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review - raw) and division (manually added - categorical label generated using overall score).
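The unixReviewTime column can be turned into a calendar date with the standard library. This is a small sketch that assumes the value is a count of seconds since 1970-01-01 UTC, which is the usual meaning of "unix time"; the function name is hypothetical.

```python
from datetime import datetime, timezone


def review_date(unix_review_time):
    """Convert a unix timestamp (seconds, int or numeric string) to an ISO date in UTC."""
    moment = datetime.fromtimestamp(int(unix_review_time), tz=timezone.utc)
    return moment.date().isoformat()


print(review_date(0))
```

Using an explicit UTC timezone avoids results that depend on the machine's local time setting.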
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
File includes 4605 reviews for a high quality dog food product on Amazon. This dataset was generated using Unwrangle Review Extractor API.
This dataset can be used for the following applications and more:
**Analyzing trends**
As an example, you can estimate how room occupancy was affected by the COVID-19 pandemic.
**Sentiment Analysis / Opinion Mining**
Using NLP techniques, one can find out what the average user’s sentiment is towards each of the featured hotels in this dataset.
**Topic / Aspect Extraction**
Using categorization techniques, one can quickly figure out how each of the hotels featured in this dataset fares on attributes such as room quality, staff, food, check-in process, etc.
**Competitor Analysis**
If you would like to find out what customers think about your competitors, a tailored dataset like the one featured in this blog post can enable you to do so with simple data analysis or visualization techniques.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Amazon is one of the most recognisable brands in the world, and the third largest by revenue. It was the fourth tech company to reach a $1 trillion market cap, and a market leader in e-commerce,...
https://creativecommons.org/publicdomain/zero/1.0/
"Amazon Laptop Specs" is a comprehensive dataset containing detailed specifications of various laptop models sold on Amazon. The dataset consists of about 100 laptop models and covers a wide range of brands, including Dell, HP, Lenovo, Apple, Acer, Asus, and more.
The data includes various attributes of each laptop, such as the processor type, RAM size, hard disk size, screen size, graphics card, operating system, battery life, and more. Additionally, the dataset includes information on the price, customer reviews, and ratings for each laptop model.
The dataset is suitable for researchers, analysts, and data scientists who are interested in exploring the market trends, comparing the performance of different laptop models, or building predictive models to understand customer behavior.
This dataset can also be used by e-commerce businesses to analyze customer preferences and identify the most popular laptop models, which can help in making informed decisions about inventory management, pricing, and marketing strategies
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The accuracy of machine learning tasks critically depends on high quality ground truth data. Therefore, in many cases, producing good ground truth data typically involves trained professionals; however, this can be costly in time, effort, and money. Here we explore the use of crowdsourcing to generate a large amount of training data of good quality. We explore an image analysis task involving the segmentation of corn tassels from images taken in a field setting. We investigate the accuracy, speed and other quality metrics when this task is performed by students for academic credit, Amazon MTurk workers, and Master Amazon MTurk workers. We conclude that the Amazon MTurk and Master MTurk workers perform significantly better than the for-credit students, but with no significant difference between the two MTurk worker types. Furthermore, the quality of the segmentation produced by Amazon MTurk workers rivals that of an expert worker. We provide best practices to assess the quality of ground truth data, and to compare data quality produced by different sources. We conclude that properly managed crowdsourcing can be used to establish large volumes of viable ground truth data at a low cost and high quality, especially in the context of high throughput plant phenotyping. We also provide several metrics for assessing the quality of the generated datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘FAANG- Complete Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/aayushmishra1512/faang-complete-stock-data on 30 September 2021.
--- Dataset description provided by original source is as follows ---
There are a few companies that are considered to be revolutionary. These companies also happen to be a dream place to work for many people across the world. These companies include Facebook, Amazon, Apple, Netflix and Google, also known as FAANG! These companies make a ton of money and they help others too by giving them a chance to invest in the companies via stocks and shares. This data was made targeting these stock prices.
The data contains information such as opening price of a stock, closing price, how much of these stocks were sold and many more things. There are 5 different CSV files in the data for each company.
--- Original source retains full ownership of the source dataset ---
The Cross-Lingual Sentiment (CLS) dataset comprises about 800,000 Amazon product reviews in the four languages English, German, French, and Japanese.
For more information on the construction of the dataset see (Prettenhofer and Stein, 2010) or the enclosed readme files. If you have a question after reading the paper and the readme files, please contact Peter Prettenhofer.
We provide the dataset in two formats: 1) a processed format which corresponds to the preprocessing (tokenization, etc.) in (Prettenhofer and Stein, 2010); 2) an unprocessed format which contains the full text of the reviews (e.g., for machine translation or feature engineering).
The dataset was first used by (Prettenhofer and Stein, 2010). It consists of Amazon product reviews for three product categories (books, dvds and music) written in four different languages: English, German, French, and Japanese. The German, French, and Japanese reviews were crawled from Amazon in November 2009. The English reviews were sampled from the Multi-Domain Sentiment Dataset (Blitzer et al., 2007). For each language-category pair there exist three sets: training documents, test documents, and unlabeled documents. The training and test sets comprise 2,000 documents each, whereas the number of unlabeled documents varies from 9,000 to 170,000.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is the public release of the Samsung Open Mean Opinion Scores (SOMOS) dataset for the evaluation of neural text-to-speech (TTS) synthesis, which consists of audio files generated with a public domain voice from trained TTS models based on bibliography, and numbers assigned to each audio as quality (naturalness) evaluations by several crowdsourced listeners.
Description
The SOMOS dataset contains 20,000 synthetic utterances (wavs), 100 natural utterances and 374,955 naturalness evaluations (human-assigned scores in the range 1-5). The synthetic utterances are single-speaker, generated by training several Tacotron-like acoustic models and an LPCNet vocoder on the LJ Speech voice public dataset. 2,000 text sentences were synthesized, selected from Blizzard Challenge texts of years 2007-2016, the LJ Speech corpus as well as Wikipedia and general domain data from the Internet.
Naturalness evaluations were collected via crowdsourcing a listening test on Amazon Mechanical Turk in the US, GB and CA locales. The records of listening test participants (workers) are fully anonymized. Statistics on the reliability of the scores assigned by the workers are also included, generated through processing the scores and validation controls per submission page.
To listen to audio samples of the dataset, please see our Github page.
The dataset release comes with a carefully designed train-validation-test split (70%-15%-15%) with unseen systems, listeners and texts, which can be used for experimentation on MOS prediction.
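As a rough illustration of what a mean opinion score is, the sketch below averages the 1-5 naturalness ratings that several listeners gave to the same utterance. The utterance identifiers and the (utterance, score) pair layout are hypothetical and do not reflect the actual file format of the release.

```python
from collections import defaultdict
from statistics import mean


def mean_opinion_scores(ratings):
    """Average the 1-5 scores per utterance from (utterance_id, score) pairs."""
    scores_by_utterance = defaultdict(list)
    for utterance_id, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError("scores are expected in the range 1-5")
        scores_by_utterance[utterance_id].append(score)
    return {utt: mean(scores) for utt, scores in scores_by_utterance.items()}


# Hypothetical ratings from three listening-test submissions.
ratings = [("utt_a", 4), ("utt_a", 5), ("utt_b", 2)]
mos = mean_opinion_scores(ratings)
print(mos)
```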
This version also contains the necessary resources to obtain the transcripts corresponding to all dataset audios.
Terms of use
The dataset may be used for research purposes only, for non-commercial purposes only, and may be distributed with the same terms.
Every time you produce research that has used this dataset, please cite the dataset appropriately.
Cite as:
@inproceedings{maniati22_interspeech, author={Georgia Maniati and Alexandra Vioni and Nikolaos Ellinas and Karolos Nikitaras and Konstantinos Klapsas and June Sig Sung and Gunu Jho and Aimilios Chalamandaris and Pirros Tsiakoulis}, title={{SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis}}, year=2022, booktitle={Proc. Interspeech 2022}, pages={2388--2392}, doi={10.21437/Interspeech.2022-10922} }
References of resources & models used
Voice & synthesized texts:
K. Ito and L. Johnson, “The LJ Speech Dataset,” https://keithito.com/LJ-Speech-Dataset/, 2017.
Vocoder:
J.-M. Valin and J. Skoglund, “LPCNet: Improving neural speech synthesis through linear prediction,” in Proc. ICASSP, 2019.
R. Vipperla, S. Park, K. Choo, S. Ishtiaq, K. Min, S. Bhattacharya, A. Mehrotra, A. G. C. P. Ramos, and N. D. Lane, “Bunched lpcnet: Vocoder for low-cost neural text-to-speech systems,” in Proc. Interspeech, 2020.
Acoustic models:
N. Ellinas, G. Vamvoukakis, K. Markopoulos, A. Chalamandaris, G. Maniati, P. Kakoulidis, S. Raptis, J. S. Sung, H. Park, and P. Tsiakoulis, “High quality streaming speech synthesis with low, sentence-length-independent latency,” in Proc. Interspeech, 2020.
Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio et al., “Tacotron: Towards End-to-End Speech Synthesis,” in Proc. Interspeech, 2017.
J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan et al., “Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions,” in Proc. ICASSP, 2018.
J. Shen, Y. Jia, M. Chrzanowski, Y. Zhang, I. Elias, H. Zen, and Y. Wu, “Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling,” arXiv preprint arXiv:2010.04301, 2020.
M. Honnibal and M. Johnson, “An Improved Non-monotonic Transition System for Dependency Parsing,” in Proc. EMNLP, 2015.
M. Dominguez, P. L. Rohrer, and J. Soler-Company, “PyToBI: A Toolkit for ToBI Labeling Under Python,” in Proc. Interspeech, 2019.
Y. Zou, S. Liu, X. Yin, H. Lin, C. Wang, H. Zhang, and Z. Ma, “Fine-grained prosody modeling in neural speech synthesis using ToBI representation,” in Proc. Interspeech, 2021.
K. Klapsas, N. Ellinas, J. S. Sung, H. Park, and S. Raptis, “WordLevel Style Control for Expressive, Non-attentive Speech Synthesis,” in Proc. SPECOM, 2021.
T. Raitio, R. Rasipuram, and D. Castellani, “Controllable neural text-to-speech synthesis using intuitive prosodic features,” in Proc. Interspeech, 2020.
Synthesized texts from the Blizzard Challenges 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2016:
M. Fraser and S. King, "The Blizzard Challenge 2007," in Proc. SSW6, 2007.
V. Karaiskos, S. King, R. A. Clark, and C. Mayo, "The Blizzard Challenge 2008," in Proc. Blizzard Challenge Workshop, 2008.
A. W. Black, S. King, and K. Tokuda, "The Blizzard Challenge 2009," in Proc. Blizzard Challenge, 2009.
S. King and V. Karaiskos, "The Blizzard Challenge 2010," 2010.
S. King and V. Karaiskos, "The Blizzard Challenge 2011," 2011.
S. King and V. Karaiskos, "The Blizzard Challenge 2012," 2012.
S. King and V. Karaiskos, "The Blizzard Challenge 2013," 2013.
S. King and V. Karaiskos, "The Blizzard Challenge 2016," 2016.
Contact
Alexandra Vioni - a.vioni@samsung.com
If you have any questions or comments about the dataset, please feel free to write to us.
We are interested in knowing if you find our dataset useful! If you use our dataset, please email us and tell us about your research.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This data shows healthcare utilization for asthma by Allegheny County residents 18 years of age and younger. It counts asthma-related visits to the Emergency Department (ED), hospitalizations, urgent care visits, and asthma controller medication dispensing events.
The asthma data was compiled as part of the Allegheny County Health Department’s Asthma Task Force, which was established in 2018. The Task Force was formed to identify strategies to decrease asthma inpatient and emergency utilization among children (ages 0-18), with special focus on children receiving services funded by Medicaid. Data is being used to improve the understanding of asthma in Allegheny County, and inform the recommended actions of the task force. Data will also be used to evaluate progress toward the goal of reducing asthma-related hospitalization and ED visits.
In this data, asthma is defined using the International Classification of Diseases, Tenth Revision (ICD-10) classification system code J45.xxx. The ICD-10 system is used to classify diagnoses, symptoms, and procedures in the U.S. healthcare system.
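The J45.xxx definition amounts to a prefix match on the diagnosis code. The sketch below shows one way to test for it; it assumes that codes may appear with or without the dot and in either letter case, which is a guess about how claims data might be stored rather than a documented property of this dataset.

```python
def is_asthma_code(code):
    """Return True when an ICD-10 diagnosis code falls under J45."""
    normalized = code.strip().upper().replace(".", "")
    return normalized.startswith("J45")


print([is_asthma_code(c) for c in ("J45.901", "j4520", "J44.9")])
```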
Children seeking care for an asthma-related claim in 2017 are represented in the data. Data is compiled by the Health Department from medical claims submitted to three health plans (UPMC, Gateway Health, and Highmark). Claims may also come from people enrolled in Medicaid plans managed by these insurers. The Health Department estimates that 74% of the County’s population aged 0-18 is represented in the data.
Users should be cautious about using administrative claims data as a measure of disease prevalence and about interpreting trends over time. Missing from the data are the uninsured, members in participating plans enrolled for less than 90 continuous days in 2017, children with an asthma-related condition who did not file a claim in 2017, and children participating in plans managed by insurers that did not share data with the Health Department.
Data users should also be aware that diagnoses may also be subject to misclassification, and that children with an asthmatic condition may not be diagnosed. It is also possible that some children may be counted more than once in the data if they are enrolled in a plan by more than one participating insurer and file a claim on each policy in the same calendar year.
Support for Health Equity datasets and tools provided by Amazon Web Services (AWS) through their Health Equity Initiative.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we present the datasets derived from our experiments on using crowdsourcing for document classification tasks. These experiments resemble a two-step process that first highlights excerpts from the text and then presents these to workers for classification. Thus our experiments group into highlight generation and classification. For generating highlights, we leverage crowdsourcing and automatic approaches such as extractive summarization and question answering models. For our classification experiments, we consider documents from two different domains: systematic literature reviews and Amazon product reviews. Specifically, we study how highlighting text passages could aid workers in judging the relevance of a document given an input question. We expect these datasets to benefit not only the study of these particular problem domains but also a broader set of classification problems where individual judgments from workers are scarce.
In a nutshell, the datasets represent two kinds of tasks:
- classification tasks with highlighting support.
- highlighting tasks, where the workers highlight evidence.
Classification tasks
In this task, workers classified documents based on a given predicate.
classification tasks using crowdsourced highlights
Files:
- classification_amazon-crowd-highlights.csv
- classification_oa-crowd-highlights.csv
- classification_tech-crowd-highlights.csv
- classification_tech-3x12-crowd-highlights.csv
- classification_tech-6x6-crowd-highlights.csv
classification tasks using ML-generated highlights
Files:
- classification_amazon-ML-highlights.csv
- classification_oa-ML-highlights.csv
- classification_tech-ML-highlights.csv
Highlighting tasks
crowdsourced highlights
In this task, workers highlighted excerpts from documents that are relevant to a given predicate, to support future classification tasks.
File: crowdsourced_highlights.csv.
The file contains one line per highlight (generated by one worker); the column that holds the highlighted fragment(s) is highlighted_text. The highlighted_text is a "list of lists" (Python syntax), so iterating over this list will give you the text fragments generated by one worker. Also, the experiment column indicates domain + task design. So, to get the highlights used in the classification experiments, use the rows whose experiment value ends with "-highlight".
ML-generated highlights
We also consider automatic approaches to generate text highlights, specifically extractive summarization and question-answering models.
File: ml_highlights.csv.
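Reading the highlighted_text column and selecting the rows used for classification can be sketched as follows. The example rows, the experiment names, and the assumption that every inner list holds plain strings are hypothetical; only the column names highlighted_text and experiment and the "-highlight" suffix come from the description above.

```python
import ast


def fragments(row):
    """Flatten the 'list of lists' stored as text in highlighted_text."""
    groups = ast.literal_eval(row["highlighted_text"])
    return [fragment for group in groups for fragment in group]


def classification_highlights(rows):
    """Keep rows whose experiment value ends with '-highlight'."""
    return [row for row in rows if row["experiment"].endswith("-highlight")]


# Hypothetical rows; real experiment names and texts will differ.
rows = [
    {"experiment": "amazon-highlight",
     "highlighted_text": "[['battery lasts all day'], ['arrived late', 'box damaged']]"},
    {"experiment": "amazon-pilot",
     "highlighted_text": "[['good value']]"},
]
selected = classification_highlights(rows)
print([fragments(row) for row in selected])
```

ast.literal_eval is used instead of eval because it only accepts Python literals, so a malformed or hostile cell cannot execute code.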
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
SAS data file used to produce results for article published in the Journal of Business and Psychology examining the impact of financial strain on work-family conflict during COVID-19. https://doi.org/10.1007/s10869-025-10063-2
Data were collected at seven time points between April 2020 and October 2020 using a longitudinal panel design. Participants were workers registered with Amazon's Mechanical Turk who were located in the United States.
Abstract: The announcements of pandemic lockdown measures across North America in mid-March 2020 marked the start of a chaotic period with extensive changes at work and at home. Because families experiencing financial strain had fewer resources to help manage work and family demands, the present study examined how financial strain at the within- and between-person levels influenced work interference with family (WIF) and family interference with work (FIW) and whether those experiences were moderated by childcare and eldercare responsibilities. Using a longitudinal panel design, 538 workers recruited through Amazon’s Mechanical Turk responded to seven surveys between April and October 2020 asking about financial strain, WIF, and FIW. Multilevel modeling showed that an individual's average financial strain over the seven-month period was associated with higher WIF and FIW; however, a higher-than-usual level of financial strain was associated only with higher FIW. Interactions of financial strain with childcare and eldercare were not significant. At the between-person level, financial strain was an important contributor to WIF and FIW, even after accounting for childcare and eldercare. Consistent with conservation of resources theory, these findings suggest that financial strain represents a perceived threat that actively draws on limited personal resources, thereby reducing capacity to manage work-family conflict. This underscores the need for greater support for families experiencing financial strain. In addition to fair pay and benefits, organizations could consider novel approaches to reducing financial strain amongst employees such as financial counselling and emergency income replacement funds.
This readme file was generated on 2025-07-25 by Christine Tulk
-------------------- GENERAL INFORMATION --------------------
1. Title of Dataset: The Impact of Financial Strain on Work-Family Conflict During COVID-19
2. Author Information
Name: Christine Tulk
ORCID: 0000-0001-7312-7406
Institution: Carleton University
Address: Ottawa, Canada
Email: christine.tulk@carleton.ca
3. Date of data collection: 538 responses at Time 1 (April 17-18), 320 responses at Time 2 (May 4-11), 263 responses at Time 3 (June 6-13), 250 responses at Time 4 (July 9-16), 225 responses at Time 5 (Aug 17-24), 203 responses at Time 6 (Sept 26 - Oct 3), and 181 responses at Time 7 (Oct 28 - Nov 4)
4. Geographic location of data collection: MTurk workers located in the United States
5. Dataset Description: The data are formatted in long format suitable for multilevel modeling with one row per time point per participant.
----------------------------------- SHARING/ACCESS INFORMATION -----------------------------------
Links to publications that cite or use the data: https://doi.org/10.1007/s10869-025-10063-2
------------------------- DATA & FILE OVERVIEW -------------------------
1. File List:
financial_strain.sas7bdat - SAS data file
financial_strain.csv - Text file
3. Additional related data collected that was not included in the current data package: Additional variables were collected and are available upon reasonable request from the author.
--------------------------- METHODOLOGICAL INFORMATION ---------------------------
1. Description of methods used for collection/generation of data: Collected by surveys administered at seven times between April 2020 and October 2020
2. Methods for processing the data: Data were initially downloaded from the Qualtrics web site in Excel format and imported into SAS.
------------------------------------------- DATA-SPECIFIC INFORMATION FOR: financial_strain.csv/sas7bdat -------------------------------------------
1. Number of variables:
2. Number of cases/rows: 1977 rows
3. Variable List:
id (Level 2): participant id
Gender (Level 2): man = 0, woman = 1
BC_Emotion (Level 2): Measure of emotion-focused coping
BC_Problem (Level 2): Measure of problem-focusing coping
BC_Support (Level 2): Measure of support-focused coping
FinM (Level 2): Person-averaged financial strain
FinCM (Level 2): Person-averaged financial strain centered around group mean
Child (Level 2): 0 = No childcare responsibilities, 1 = Childcare responsibilities
Elder (Level 2): 0 = No eldercare responsibilities, 1 = Eldercare responsibilities
Partner (Level 2): 0 = Not partnered (e.g., single, divorced), 1 = Partnered (e.g., married)
wfcM (Level 2): Person-averaged work-to-family conflict
fwcM (Level 2): Person-averaged family-to-work conflict
HoursM (Level 2): Person-averaged average work hours per...
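The split between between-person and within-person effects described in the abstract rests on person-mean centering of long-format data. The sketch below illustrates the idea on made-up numbers. The column name Fin and the output names person_mean and within_dev are hypothetical; only id appears in the variable list above, and this is not the authors' SAS processing.

```python
from collections import defaultdict


def add_person_centering(rows, id_key="id", value_key="Fin"):
    """Add each person's mean and the deviation of every observation from it."""
    values_by_person = defaultdict(list)
    for row in rows:
        values_by_person[row[id_key]].append(row[value_key])
    means = {pid: sum(vals) / len(vals) for pid, vals in values_by_person.items()}
    centered = []
    for row in rows:
        person_mean = means[row[id_key]]
        centered.append({**row,
                         "person_mean": person_mean,              # between-person part
                         "within_dev": row[value_key] - person_mean})  # within-person part
    return centered


# Made-up long-format data: one row per time point per participant.
long_rows = [
    {"id": 1, "Fin": 2.0},
    {"id": 1, "Fin": 4.0},
    {"id": 2, "Fin": 5.0},
]
centered = add_person_centering(long_rows)
print(centered)
```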
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These files define a "digital inventory" of all of the files archived as part of DASCH Data Release 7 (DR7). DASCH (Digital Access to a Sky Century @ Harvard) was the project to digitize the Harvard College Observatory’s Astronomical Photographic Glass Plate Collection for scientific applications. This irreplaceable resource provides a means for systematic study of the sky on 100-year time scales.
This inventory does not contain the actual DASCH data. Rather, it contains an exhaustive index of all of the DASCH data — virtually all aspects of DASCH's digital existence throughout the project's entire history, up through the DR7 release date (December, 2024). The complete inventory documents 33,791,530 files totaling 745,627,062,858,355 bytes (around 678 TiB) of data. The inventory itself is about 10 GiB in size (decompressed), spread across 3,946 files.
The actual underlying data are currently archived in a set of Amazon AWS S3 buckets and magnetic tapes held by Harvard College Observatory. Most DASCH users are encouraged to access DASCH data via the project's data access services; this inventory should only be of interest to those interested in large-scale duplication of the DASCH data.
The DASCH archive, which is indexed by this inventory, includes:
See the README.md file within the collection for more information about the structure and contents of this inventory. In summary, it organizes the DASCH data files into a virtual hierarchy of names. Associated with each name is a size (in bytes), MD5 digest, and one or more "data URLs" recording locations where that file is archived as of DR7. Every single file has a data URL indicating a location on Amazon's AWS S3 storage service; many files also have one or more copies on magnetic backup tapes held by Harvard College Observatory.
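When duplicating files, the recorded size and MD5 digest can be used to check each copy. The following sketch uses Python's hashlib and reads the data in chunks so that large files need not fit in memory. The function names are hypothetical, and the in-memory bytes stand in for a downloaded file.

```python
import hashlib
import io


def md5_and_size(stream, chunk_size=1024 * 1024):
    """Return (hex MD5 digest, byte count) for a binary stream, read in chunks."""
    digest = hashlib.md5()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


def matches_inventory(stream, expected_md5, expected_size):
    """Compare a stream against the size and MD5 recorded for it."""
    actual_md5, actual_size = md5_and_size(stream)
    return actual_size == expected_size and actual_md5 == expected_md5.lower()


# In-memory stand-in for a file; a real check would open the file with mode "rb".
ok = matches_inventory(io.BytesIO(b"hello"), "5d41402abc4b2a76b9719d911017c592", 5)
print(ok)
```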
The inventory is expressed as a collection of plain-text (UTF-8) files using Markdown syntax. There is approximately one such file for each "folder" or "subtree" of the virtual name hierarchy. Each file contains a human-readable preamble describing the folder contents, an optional Markdown table listing any direct-descendant subfolders, and an optional Markdown table documenting any files contained directly within that folder. The intention is that it should be fairly straightforward both for humans to navigate these files and for software to process them. While most files are human-scale in size, the largest (Inventory.pipeline_astrometry.md) is about 280 MiB and contains about 1.5 million records.
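Software that processes these files mainly needs to read Markdown pipe tables. The sketch below is a generic reader for simple pipe tables. The column names and values in the sample are hypothetical, the real layout is documented in the collection's README.md, and cells containing escaped pipe characters are not handled.

```python
def parse_markdown_table(text):
    """Parse the first simple pipe table in text into a list of dicts."""
    header = None
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # the table has ended
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif all(cell and set(cell) <= set("-:") for cell in cells):
            continue  # separator row such as |---|---:|
        else:
            records.append(dict(zip(header, cells)))
    return records


# Hypothetical inventory fragment for illustration only.
sample = """Files in this folder:

| name | size |
|---|---:|
| a.fits | 10 |
| b.fits | 20 |
"""
records = parse_markdown_table(sample)
print(records)
```

For the largest inventory files, iterating over an open file handle line by line instead of calling splitlines on the whole text would keep memory use low.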
As of the DR7 release, only some DASCH archive files are directly accessible by third parties. The Starglass website (https://starglass.cfa.harvard.edu/) makes many photographs and "mosaics" (full-plate FITS images) available, and the web APIs supporting this site and the DASCH data access services (see the DASCH site, https://dasch.cfa.harvard.edu/) provide access to additional resources. To duplicate other portions of the archive, you may need to contact Harvard College Observatory. It is hoped that over time, more and more of the DASCH archive will become available for direct download. It is also hoped that additional copies of the DASCH archive will be created and publicized; the best way to ensure the long-term preservation of this dataset is to duplicate it. A major goal of this inventory is to make such duplication tractable.
To the greatest extent possible, it is believed that all of the files documented as part of this archive can be duplicated free of legal encumbrances. Unless documented otherwise, the copyright owner of all copyrightable elements is the President and Fellows of Harvard College. Please see the DASCH website for the most up-to-date guidance regarding image credits and any legal topics relating to this dataset.
The DASCH scanning project was the work of literally hundreds of people over multiple decades. Out of the many people who have devoted their time and energy to the project, the essential contributions of a few deserve special recognition: Prof. Jonathan (Josh) Grindlay; Bob Simcoe; Edward Los; Lindsay Smith Zrull; and Alison Doane.
The DASCH project at Harvard is grateful for partial support from NSF grants AST-0407380, AST-0909073, and AST-1313370, which should be acknowledged in all papers making use of DASCH data.
We acknowledge the one-time gift of the Cornel and Cynthia K. Sarosdy Fund for DASCH, and thank Grzegorz Pojmanski of the ASAS project for providing some of the source code on which the DASCH scientific data access portal was based.
The ongoing AAVSO Photometric All-Sky Survey (APASS) has improved DASCH photometric calibration and is funded by the Robert Martin Ayers Sciences Fund.
This inventory and DASCH Data Release 7 were prepared by Peter K. G. Williams in December 2024.
This data package includes the data, analysis scripts, and relevant documents for the project: The effects of facial attractiveness and trustworthiness in online peer-to-peer markets. Method: All data was collected using Amazon Mechanical Turk workers who completed a survey designed in Qualtrics survey software. Universe: All data was collected from Amazon Mechanical Turk workers who were U.S. citizens.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Crowdsourcing
Crowdsourcing is the practice of obtaining needed ideas, services, or content by requesting contributions from a large group of people. Amazon Mechanical Turk is a web marketplace for crowdsourcing microtasks, such as answering surveys and image tagging. We explored the limits of crowdsourcing by using Mechanical Turk for a more complicated task: analysis and creation of wind simulations.
Harnessing Crowdworkers for Engineering
Our investigation examined the feasibility of using crowdsourcing for complex, highly technical tasks. This was done to determine if the benefits of crowdsourcing could be harnessed to accurately and effectively contribute to solving complex real-world engineering problems. Of course, untrained crowds cannot be used as a mere substitute for trained expertise. Rather, we sought to understand how crowd workers can be used as a large pool of labor for a preliminary analysis of complex data.
Virtual Wind Tunnel
We compared the skill of the anonymous crowd workers from Amazon Mechanical Turk with that of civil engineering graduate students, making a first pass at analyzing wind simulation data. For the first phase, we posted analysis questions to Amazon crowd workers and to two groups of civil engineering graduate students. A second phase of our experiment instructed crowd workers and students to create simulations on our Virtual Wind Tunnel website to solve a more complex task.
Conclusions
With a sufficiently comprehensive tutorial and compensation similar to typical crowdsourcing wages, we were able to enlist crowd workers to effectively complete longer, more complex tasks with competence comparable to that of graduate students with more comprehensive, expert-level knowledge. Furthermore, more complex tasks require increased communication with the workers. As tasks become more complex, the employment relationship begins to become more akin to outsourcing than crowdsourcing. Through this investigation, we were able to stretch and explore the limits of crowdsourcing as a tool for solving complex problems.
From 2004 to 2024, the net revenue of Amazon e-commerce and service sales has increased tremendously. In the fiscal year ending December 31, the multinational e-commerce company's net revenue was almost *** billion U.S. dollars, up from *** billion U.S. dollars in 2023. Amazon.com, a U.S. e-commerce company originally founded in 1994, is the world’s largest online retailer of books, clothing, electronics, music, and many more goods. As of 2024, the company generates the majority of its net revenues through online retail product sales, followed by third-party retail seller services, cloud computing services, and retail subscription services including Amazon Prime.
From seller to digital environment
Through Amazon, consumers are able to purchase goods at a discounted price from both small and large companies as well as from other users. Both new and used goods are sold on the website. Due to the wide variety of goods available at prices which often undercut local brick-and-mortar retail offerings, Amazon has dominated the retail market. As of 2024, Amazon’s brand worth amounts to over *** billion U.S. dollars, topping the likes of companies such as Walmart and Ikea, as well as digital competitors Alibaba and eBay. One of Amazon's first forays into the world of hardware was its e-reader Kindle, one of the most popular e-book readers worldwide. More recently, Amazon has also released several series of own-branded products and a voice-controlled virtual assistant, Alexa.
Headquartered in North America
Due to its location, Amazon offers more services in North America than in the rest of the world. As a result, the majority of the company’s net revenue in 2023 was earned in the United States, Canada, and Mexico. In 2023, approximately *** billion U.S. dollars was earned in North America compared to only roughly *** billion U.S. dollars internationally.
https://creativecommons.org/publicdomain/zero/1.0/
The US Family Budget Dataset provides insights into the cost of living in different US counties based on the Family Budget Calculator by the Economic Policy Institute (EPI).
This dataset offers community-specific estimates for ten family types, including one or two adults with zero to four children, in all 1877 counties and metro areas across the United States.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
As described in Section 2 of the associated publication, this dataset contains 360 sentences from 12 experimental conditions: NP length (2-gram, 3-gram, 4-gram), NP split (Yes/No), and pseudoword use (Yes/No). Perceived difficulty (Likert Scale) and actual difficulty (multiple choice content questions) for each sentence are provided as an average. The average is based on approximately 35 evaluations per sentence by Amazon Mechanical Turk workers.
For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
This data package includes the data and materials for the three experiments conducted on the project: Memory Retrieval Processes Help Explain the Incumbency Advantage. The research measures and manipulates participants' sequential memory retrieval patterns while they consider the choice between two political candidates. We find that the order in which participants retrieve information about the candidates from memory is related to a preference for the candidate already in office (incumbent). DSA proof.
- Method: All data was collected using Amazon Mechanical Turk workers who completed a survey designed in Qualtrics survey software.
- All data was collected from Amazon Mechanical Turk workers who were U.S. citizens.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A global surge in ‘artisanal’, small-scale mining (ASM) threatens biodiverse tropical forests and exposes residents to dangerous levels of mercury. In response, governments and development agencies are investing millions (USD) in ASM formalization: registering concessions and demarcating extraction zones to promote regulatory adherence and direct mining away from ecologically sensitive areas. This data publication contains data used to examine patterns of mining-related deforestation associated with ASM formalization efforts in the Department of Madre de Dios in the Peruvian Amazon. Using satellite images and government-issued spatial layers on mining formalization, we tracked changes in mining activities from 2001 to 2014 when agencies: (a) issued 1701 provisional titles and (b) tried to restrict mining to a > 5000 square kilometer (km²) ‘corridor’. The data reported in this publication are based on the centroids of a 25 hectare (ha) hexagon grid covering the 20,850 km² study area and include variables related to (1) mining deforestation from years 2001 to 2014, (2) mining concession status, (3) location relative to the mining corridor, as well as (4) location relative to time-invariant variables and access (geology, distance to river), administrative units (district, native communities), and conservation designation (protected areas). Data were compiled and analyzed to examine patterns of mining-related deforestation associated with formalization efforts in the Department of Madre de Dios, Perú. For more information about this study and these data, see Álvarez-Berríos and L'Roe (2021).