Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Employee-Reviews
Description
Context
Over 67k employee reviews for Google, Amazon, Facebook, Apple, and Microsoft.
Content
This dataset contains employee reviews separated into the following categories:
Index: index
Company: Company name
Location: This dataset is global, so a location may include the country's name in parentheses (e.g., "Toronto, ON(Canada)"). If the location is in the USA, it includes only the city and state (e.g., "Los Angeles, CA").
Date Posted: in the format MM DD, YYYY
Job-Title: This string also indicates whether the reviewer was a 'Current' or 'Former' employee at the time of the review
Summary: Short summary of the employee review
Pros: Pros
Cons: Cons
Overall Rating: 1-5
Work/Life Balance Rating: 1-5
Culture and Values Rating: 1-5
Career Opportunities Rating: 1-5
Comp & Benefits Rating: 1-5
Senior Management Rating: 1-5
Helpful Review Count: The number of people who found the review helpful
Link to Review: A direct link to the page that contains the review. This link is likely to be outdated.
NOTE: 'none' is placed in all cells where no data value was found.
Acknowledgements
This data was scraped from Glassdoor.
Inspiration
To inspire people to create ML models to search for meaningful trends within this dataset.
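The Location convention described above (a country in parentheses for non-US locations, city and state only for US ones, and 'none' for missing cells) can be handled with a small parser. The following is a minimal sketch, not part of the dataset itself: the `Location` type, the `parse_location` helper, and the choice to label suffix-less locations as "USA" are illustrative assumptions.

```python
import re
from typing import NamedTuple, Optional


class Location(NamedTuple):
    city: str
    region: Optional[str]
    country: str


# "City", optionally ", REGION", optionally "(Country)".
_LOCATION_RE = re.compile(
    r"^\s*([^,(]+?)\s*(?:,\s*([^(]+?))?\s*(?:\(([^)]+)\))?\s*$"
)


def parse_location(raw: Optional[str]) -> Optional[Location]:
    """Split a Location cell into city, region, and country (hypothetical helper)."""
    if raw is None or raw.strip().lower() == "none":
        return None  # 'none' marks a missing value in this dataset
    match = _LOCATION_RE.match(raw)
    if match is None:
        return None
    city, region, country = match.groups()
    # Assumption: per the description, US locations carry no country suffix.
    return Location(city, region, country or "USA")


if __name__ == "__main__":
    for cell in ["Toronto, ON(Canada)", "Los Angeles, CA", "none"]:
        print(cell, "->", parse_location(cell))
```

Returning `None` for 'none' cells keeps missing values distinct from real locations when the column is later aggregated.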
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains data on the representation of different ethnicities at big tech companies such as Google, Meta, and Netflix.
Sources
This dataset was taken from Statista. Companies typically publish this data in their own tech diversity reports.
Inspiration
The inspiration for this dataset was to explore diversity across different ethnicities, including BIPOC (Black, Indigenous, and People of Color), and their representation in various technical and non-technical roles. The intention is to collect diversity datasets from big tech companies and explore the trends in their representation.
https://brightdata.com/license
The Google Maps dataset is ideal for getting extensive information on businesses anywhere in the world. Easily filter by location, business type, and other factors to get the exact data you need. The Google Maps dataset includes all major data points: timestamp, name, category, address, description, open website, phone number, open_hours, open_hours_updated, reviews_count, rating, main_image, reviews, url, lat, lon, place_id, country, and more.
This dataset contains current and historical demographic data on Google's workforce since the company began publishing diversity data in 2014. It includes data collected for government reporting and voluntary employee self-identification globally relating to hiring, retention, and representation categorized by race, gender, sexual orientation, gender identity, disability status, and military status. In some instances, the data is limited due to various government policies around the world and the desire to protect Googler confidentiality. All data in this dataset will be updated yearly upon publication of Google’s Diversity Annual Report. Google uses this data to inform its diversity, equity, and inclusion work. More information on our methodology can be found in the Diversity Annual Report. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
https://brightdata.com/license
This dataset encompasses a wide-ranging collection of Google Play applications, providing a holistic view of the diverse ecosystem within the platform. It includes information on various attributes such as the title, developer, monetization features, images, app descriptions, data safety measures, user ratings, number of reviews, star rating distributions, user feedback, recent updates, related applications by the same developer, content ratings, estimated downloads, and timestamps. By aggregating this data, the dataset offers researchers, developers, and analysts an extensive resource to explore and analyze trends, patterns, and dynamics within the Google Play Store. Researchers can utilize this dataset to conduct comprehensive studies on user behavior, market trends, and the impact of various factors on app success. Developers can leverage the insights derived from this dataset to inform their app development strategies, improve user engagement, and optimize monetization techniques. Analysts can employ the dataset to identify emerging trends, assess the performance of different categories of applications, and gain valuable insights into consumer preferences. Overall, this dataset serves as a valuable tool for understanding the broader landscape of the Google Play Store and unlocking actionable insights for various stakeholders in the mobile app industry.
Job Postings Data for Talent Acquisition, HR Strategy & Market Research
Canaria’s Job Postings Data product is a structured, AI-enriched dataset that captures and organizes millions of job listings from leading sources such as Indeed, LinkedIn, and other recruiting platforms. Designed for decision-makers in HR, strategy, and research, this data reveals workforce demand trends, employer activity, and hiring signals across the U.S. labor market, and is enhanced with advanced enrichment models.
The dataset enables clients to track who is hiring, what roles are being posted, which skills are in demand, where talent is needed geographically, and how compensation and employment structures evolve over time. With field-level normalization and deep enrichment, it transforms noisy job listings into high-resolution labor intelligence—optimized for strategic planning, analytics, and recruiting effectiveness.
Use Cases: What This Job Postings Data Solves
This enriched dataset empowers users to analyze workforce activity, employer behavior, and hiring trends across sectors, geographies, and job categories.
Talent Acquisition & HR Strategy
• Identify hiring trends by industry, company, function, and geography
• Optimize job listings and outreach with enriched skill, title, and seniority data
• Detect companies expanding or shifting their workforce focus
• Monitor new roles and emerging skills in real time
Labor Market Research & Workforce Planning
• Visualize job market activity across cities, states, and ZIP codes
• Analyze hiring velocity and job volume changes as macroeconomic signals
• Correlate job demand with company size, sector, or compensation structure
• Study occupational dynamics using AI-normalized job titles
• Use directional signals (job increases/declines) to anticipate market shifts
HR Analytics & Compensation Intelligence
• Map salary ranges and benefits offerings by role, location, and level
• Track high-demand or hard-to-fill positions for strategic workforce planning
• Support compensation planning and headcount forecasting
• Feed job title normalization and metadata into internal HRIS systems
• Identify talent clusters and location-based hiring inefficiencies
What Makes This Job Postings Data Unique
AI-Based Enrichment at Scale
• Extracted attributes include hard skills, soft skills, certifications, and education requirements
• Modeled predictions for seniority level, employment type, and remote/on-site classification
• Normalized job titles using an internal taxonomy of over 50,000 unique roles
• Field-level tagging ensures structured, filterable, and clean outputs
Salary Parsing & Compensation Insights
• Parsed salary ranges directly from job descriptions
• AI-based salary predictions for postings without explicit compensation
• Compensation patterns available by job title, company, and location
Deduplication & Normalization
• Achieves approximately 60% deduplication rate through semantic and metadata matching
• Normalizes company names, job titles, location formats, and employment attributes
• Ready-to-use, analysis-grade dataset, fully structured and cleansed
Company Matching & Metadata
• Each job post is linked to a structured company profile, including metadata
• Records are cross-referenced with LinkedIn and Google Maps to validate company identity and geography
• Enables aggregation at employer or location level for deeper insights
Freshness & Scalability
• Updated hourly to reflect real-time hiring behavior and job market shifts
• Delivered in flexible formats (CSV, JSON, or data feed) with customizable filters
• Supports segmentation by geography, company, seniority, salary, title, and more
Who Uses Canaria’s Job Postings Data
• HR & Talent Teams: to benchmark roles, optimize pipelines, and compete for talent
• Consultants & Strategy Teams: to guide clients with labor-driven insights
• Market Researchers: to understand employment dynamics and job creation trends
• HR Tech & SaaS Platforms: to power salary tools, job market dashboards, or recruiting features
• Economic Analysts & Think Tanks: to model labor activity and hiring-based economic trends
• BI & Analytics Teams: to build dashboards that track demand, skill shifts, and geographic patterns
Summary
Canaria’s Job Postings Data provides an AI-enriched, clean, and analysis-ready view of the U.S. job market. Covering millions of listings from Indeed, LinkedIn, other job boards, and ATS sources, it includes detailed job attributes, inferred compensation, normalized titles, skill extraction, and employer metadata, all updated hourly and fully structured.
With deep enrichment, reliable deduplication, and company matchability, this dataset is purpose-built for users needing workforce insights, market trends, and strategic talent intelligence. Whether you're modeling skill gaps, benchmarking compensation, or visualizing hiring momentum, this dataset provides a complete toolkit for HR and labor intelligence.
About Canaria Inc. ...
This is a GPS dataset acquired from Google.
Google tracks the user’s device location through Google Maps, which works on Android devices, the iPhone, and the web. The Timeline can be viewed from the user’s settings in the Google Maps app on Android or directly on the Google Timeline website. It has detailed information such as when an individual is walking, driving, or flying. This tracking can be enabled or disabled on demand by the user, directly from the smartphone or via the website. Google has a Takeout service where users can download all their data or select which Google products to download data from. The dataset contains 120,847 instances from a single user, covering a period of 9 months (253 unique days) from February 2019 to October 2019. Each instance comprises a (latitude, longitude) pair and a timestamp. All the data was delivered in a single CSV file. As the locations in this dataset are well known to the researchers, this dataset will be used as ground truth in many mobility studies.
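Since each instance is a (latitude, longitude) pair with a timestamp, two basic summaries are the number of unique days covered and the distance travelled between consecutive fixes. The sketch below uses the standard haversine formula; the tuple layout, the ISO 8601 timestamp format, and the sample coordinates are assumptions for illustration and are not taken from the CSV file.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def summarize(points):
    """Return (unique_days, total_km) for time-sorted (lat, lon, iso_timestamp) tuples."""
    days = {datetime.fromisoformat(ts).date() for _, _, ts in points}
    total_km = sum(
        haversine_km(a[0], a[1], b[0], b[1]) for a, b in zip(points, points[1:])
    )
    return len(days), total_km


if __name__ == "__main__":
    # Made-up sample fixes, only to exercise the functions.
    sample = [
        (41.15, -8.61, "2019-02-01T08:00:00"),
        (41.16, -8.61, "2019-02-01T09:00:00"),
        (41.16, -8.60, "2019-02-02T10:00:00"),
    ]
    print(summarize(sample))
```

Summing consecutive distances assumes the rows are sorted by time; sort by the timestamp column first if the file order is not guaranteed.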
Please cite the following papers when using the dataset:
T. Andrade, B. Cancela, and J. Gama, "Discovering locations and habits from human mobility data," Annals of Telecommunications, vol. 75, no. 9, pp. 505–521, 2020. DOI: 10.1007/s12243-020-00807-x
T. Andrade, B. Cancela, and J. Gama, "From mobility data to habits and common pathways," Expert Systems, vol. 37, no. 6, p. e12627, 2020. DOI: 10.1111/exsy.12627
The Google Reviews & Ratings Dataset provides businesses with structured insights into customer sentiment, satisfaction, and trends based on reviews from Google. Unlike broad review datasets, this product is location-specific: businesses provide the locations they want to track, and we retrieve as much historical data as possible, with daily updates moving forward.
This dataset enables businesses to monitor brand reputation, analyze consumer feedback, and enhance decision-making with real-world insights. For deeper analysis, optional AI-driven sentiment analysis and review summaries are available on a weekly, monthly, or yearly basis.
Dataset Highlights
Use Cases
Data Updates & Delivery
Data Fields Include:
Optional Add-Ons:
Ideal for
Why Choose This Dataset?
By leveraging Google Reviews & Ratings Data, businesses can gain valuable insights into customer sentiment, enhance reputation management, and stay ahead of the competition.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for IFEval
Dataset Summary
This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times" which can be verified by heuristics. To load the dataset, run:
from datasets import load_dataset
ifeval = load_dataset("google/IFEval")
Supported Tasks and… See the full description on the dataset page: https://huggingface.co/datasets/google/IFEval.
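To illustrate what "verified by heuristics" can mean for the two example instructions quoted above, here is a minimal sketch. These functions are hypothetical and are not the official IFEval checkers; the whitespace-based word count and the case-insensitive whole-word keyword match are assumptions about how such a check might be written.

```python
import re


def has_more_than_n_words(response: str, n: int) -> bool:
    """Check an instruction like 'write in more than 400 words'."""
    # Assumption: words are whitespace-separated tokens.
    return len(response.split()) > n


def mentions_keyword_at_least(response: str, keyword: str, times: int) -> bool:
    """Check an instruction like 'mention the keyword of AI at least 3 times'."""
    # Whole-word, case-insensitive match so "said" does not count as "AI".
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= times


if __name__ == "__main__":
    text = "AI is useful. Many teams said AI helps, and ai tools keep improving."
    print(has_more_than_n_words(text, 400))
    print(mentions_keyword_at_least(text, "AI", 3))
```

Because both checks are deterministic string operations, a model response can be scored without a human or a judge model in the loop.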
This dataset contains two tables: creative_stats and removed_creative_stats. The creative_stats table contains information about advertisers that served ads in the European Economic Area or Turkey: their legal name, verification status, disclosed name, and location. It also includes ad-specific information: impression ranges per region (including aggregate impressions for the European Economic Area), first shown and last shown dates, which criteria were used in audience selection, the format of the ad, the ad topic, and whether the ad is funded by the Google Ad Grants program. A link to the ad in the Google Ads Transparency Center is also provided. The removed_creative_stats table contains information about ads served in the European Economic Area that Google removed: where and why they were removed, and per-region information on when they served. It also contains a link to the Google Ads Transparency Center for the removed ad. Data for both tables updates periodically and may lag behind what appears on the Google Ads Transparency Center website.
About BigQuery
This data is hosted in Google BigQuery for users to easily query using SQL. Note that to use BigQuery, users must have a Google account and create a GCP project. This public dataset is included in BigQuery's 1TB/mo of free tier processing. Each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Download Dataset
This public dataset is also hosted in Google Cloud Storage here and available free to use. Use this quick start guide to quickly learn how to access public datasets on Google Cloud Storage. We provide the raw data in JSON format, sharded across multiple files to support easier download of the large dataset.
A README file which describes the data structure and our Terms of Service (also listed below) is included with the dataset. You can also download the results from a custom query. See here for options and instructions. Signed-out users can download the full dataset by using the gcloud CLI. Follow the instructions here to download and install the gcloud CLI. To remove the login requirement, run "$ gcloud config set auth/disable_credentials True". To download the dataset, run "$ gcloud storage cp gs://ads-transparency-center/* . -R".
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
🧠 data_jobs Dataset
A dataset of real-world data analytics job postings from 2023, collected and processed by Luke Barousse.
Background
I've been collecting data on data job postings since 2022. I've been using a bot to scrape the data from Google, which come from a variety of sources. You can find the full dataset at my app datanerd.tech.
Serpapi has kindly supported my work by providing me access to their API. Tell them I sent you and get 20% off paid plans.… See the full description on the dataset page: https://huggingface.co/datasets/lukebarousse/data_jobs.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you have to have a G+ account (for YouTube, location services, or other reasons), here's how you can make it totally private. No one will be able to add you, send you spammy links, or otherwise annoy you. Visit the "Audience Settings" page (https://plus.google.com/u/0/settings/audience). You can then set a "custom audience". Usually you would use this to restrict your account to people from a specific geographic location or within a specific age range. In this case, choose a custom audience of "No-one". Check the box and hit save. Now, when people try to visit your Google+ profile, they'll see a "restricted" message. You can visit my G+ profile to see this working (https://plus.google.com/114725651137252000986). If anything is unclear, you can follow this website: http://www.livehuntz.com/google-plus/support-phone-number
Due to changes in the collection and availability of data on COVID-19, this website will no longer be updated. The webpage will no longer be available as of 11 May 2023. Ongoing, reliable sources of data for COVID-19 are available via the COVID-19 dashboard and the UKHSA.
GLA Covid-19 Mobility Report
Since March 2020, London has seen many different levels of restrictions, including three separate lockdowns and many other tiers/levels of restrictions, as well as easing of restrictions and even measures to actively encourage people to go to work, their high streets, and local restaurants. This report gathers data from a number of sources, including Google, Apple, Citymapper, Purple WiFi, and OpenTable, to assess the extent to which these levels of restrictions have translated into reductions in Londoners' movements. The data behind the charts below come from different sources. None of these data represent a direct measure of how well people are adhering to the lockdown rules, nor do they provide an exhaustive data set. Rather, they are measures of different aspects of mobility which together offer an overall impression of how Londoners are moving around the capital. The information is broken down by use of public transport, pedestrian activity, retail and leisure, and homeworking.
Public Transport
For the transport measures, we have included data from Google, Apple, Citymapper, and Transport for London. They measure different aspects of public transport usage, depending on the data source. Each of the lines in the chart below represents a percentage of a pre-pandemic baseline.
| activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
|---|---|---|---|---|---|---|
| Citymapper | Citymapper mobility index | 2021-09-05 | Compares trips planned and trips taken within its app to a baseline of the four weeks from 6 Jan 2020 | 7.9% | 28% | 19% |
| Google | Google Mobility Report | 2022-10-15 | Location data shared by users of Android smartphones, comparing time and duration of visits to locations to the median values on the same day of the week in the five weeks from 3 Jan 2020 | 20.4% | 40% | 27% |
| TfL Bus | Transport for London | 2022-10-30 | Bus journey 'taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 | - | 34% | 24% |
| TfL Tube | Transport for London | 2022-10-30 | Tube journey 'taps' on the TfL network compared to same day of the week in four weeks starting 13 Jan 2020 | - | 30% | 21% |
Pedestrian activity
With the data we currently have, it's harder to estimate pedestrian activity and high street busyness. A few indicators can give us information on how people are making trips out of the house:
| activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
|---|---|---|---|---|---|---|
| Walking | Apple Mobility Index | 2021-11-09 | Estimates the frequency of trips made on foot compared to baseline of 13 Jan '20 | 22% | 47% | 36% |
| Parks | Google Mobility Report | 2022-10-15 | Frequency of trips to parks. Changes in the weather mean this varies a lot. Compared to baseline of 5 weeks from 3 Jan '20 | 30% | 55% | 41% |
| Retail & Rec | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to shops/leisure locations. Compared to baseline of 5 weeks from 3 Jan '20 | 30% | 55% | 41% |
Retail and recreation
In this section, we focus on estimated footfall to shops, restaurants, cafes, shopping centres, and so on.
| activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
|---|---|---|---|---|---|---|
| Grocery/pharmacy | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to grocery shops and pharmacies. Compared to baseline of 5 weeks from 3 Jan '20 | 32% | 55.00% | 45.000% |
| Retail/rec | Google Mobility Report | 2022-10-15 | Estimates frequency of trips to shops/leisure locations. Compared to baseline of 5 weeks from 3 Jan '20 | 32% | 55.00% | 45.000% |
| Restaurants | OpenTable State of the Industry | 2022-02-19 | London restaurant bookings made through OpenTable | 0% | 0.17% | 0.024% |
Home Working
The Google Mobility Report estimates changes in how many people are staying at home and going to places of work compared to normal. It's difficult to translate this into exact percentages of the population, but changes back towards 'normal' can be seen to start before any lockdown restrictions were lifted. This value gives a seven-day rolling (mean) average to avoid it being distorted by weekends and bank holidays.
| name | Source | Latest | Baseline | Min/max value in Lockdown 1 | Min/max value in Lockdown 2 | Min/max value in Lockdown 3 |
|---|---|---|---|---|---|---|
| Residential | Google Mobility Report | 2022-10-15 | Estimates changes in how many people are staying at home for work. Compared to baseline of 5 weeks from 3 Jan '20 | 131% | 119% | 125% |
| Workplaces | Google Mobility Report | 2022-10-15 | Estimates changes in how many people are going to places of work. Compared to baseline of 5 weeks from 3 Jan '20 | 24% | 54% | 40% |

| Restriction | Date | end_date | Average Citymapper | Average homeworking |
|---|---|---|---|---|
| Work from home advised | 17 Mar '20 | 21 Mar '20 | 57% | 118% |
| Schools, pubs closed | 21 Mar '20 | 24 Mar '20 | 34% | 119% |
| UK enters first lockdown | 24 Mar '20 | 10 May '20 | 10% | 130% |
| Some workers encouraged to return to work | 10 May '20 | 01 Jun '20 | 15% | 125% |
| Schools open, small groups outside | 01 Jun '20 | 15 Jun '20 | 19% | 122% |
| Non-essential businesses re-open | 15 Jun '20 | 04 Jul '20 | 24% | 120% |
| Hospitality reopens | 04 Jul '20 | 03 Aug '20 | 34% | 115% |
| Eat out to help out scheme begins | 03 Aug '20 | 08 Sep '20 | 44% | 113% |
| Rule of 6 | 08 Sep '20 | 24 Sep '20 | 53% | 111% |
| 10pm Curfew | 24 Sep '20 | 15 Oct '20 | 51% | 112% |
| Tier 2 (High alert) | 15 Oct '20 | 05 Nov '20 | 49% | 113% |
| Second Lockdown | 05 Nov '20 | 02 Dec '20 | 31% | 118% |
| Tier 2 (High alert) | 02 Dec '20 | 19 Dec '20 | 45% | 115% |
| Tier 4 (Stay at home advised) | 19 Dec '20 | 05 Jan '21 | 22% | 124% |
| Third Lockdown | 05 Jan '21 | 08 Mar '21 | 22% | 122% |
| Roadmap 1 | 08 Mar '21 | 29 Mar '21 | 29% | 118% |
| Roadmap 2 | 29 Mar '21 | 12 Apr '21 | 36% | 117% |
| Roadmap 3 | 12 Apr '21 | 17 May '21 | 51% | 113% |
| Roadmap out of lockdown: Step 3 | 17 May '21 | 19 Jul '21 | 65% | 109% |
| Roadmap out of lockdown: Step 4 | 19 Jul '21 | 07 Nov '22 | 68% | 107% |
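The report describes two calculations: expressing a measure as a percentage of a pre-pandemic baseline, and smoothing with a seven-day rolling mean so that weekends and bank holidays do not distort the series. The following is a minimal sketch of both. The function names and the sample values are illustrative assumptions, not the report's own code or data.

```python
def percent_of_baseline(value, baseline):
    """Express a daily value as a percentage of its pre-pandemic baseline."""
    return 100.0 * value / baseline


def rolling_mean(values, window=7):
    """Trailing rolling mean; the first window-1 days yield no output."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


if __name__ == "__main__":
    # Made-up fortnight: five weekdays at 100, two weekend days at 30, repeated.
    daily = ([100] * 5 + [30] * 2) * 2
    smoothed = rolling_mean(daily)
    # Every full seven-day window holds the same mix of weekdays and weekend days,
    # so the weekly dip disappears from the smoothed series.
    print(smoothed)
    print(percent_of_baseline(20, 80))
```

A window of exactly seven days is what removes the weekly pattern: each window always contains one of every weekday, so only genuine changes in level move the smoothed line.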
In the U.S., public companies, certain insiders, and broker-dealers are required to regularly file with the SEC. The SEC makes this data available online for anybody to view and use via its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) database. The SEC updates this data every quarter, going back to January 2009. To aid analysis, a quick summary view of the data has been created that is not available in the original dataset. The quick summary view pulls together signals into a single table that would otherwise have to be joined from multiple tables, enabling a more streamlined user experience. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute.
The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:
· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
· Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
· Fostering cultures of open qualitative research: Dataset 3 – Coding Book
The project was funded with £13,913.85 Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.
The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.
ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license.
This dataset comprises one spreadsheet with N=91 anonymised survey responses in .xlsx format. It includes all responses to the project survey, which used Google Forms between 06-Feb-2023 and 30-May-2023. The spreadsheet can be opened with Microsoft Excel, Google Sheets, or open-source equivalents.
The survey responses include a random sample of researchers worldwide undertaking qualitative, mixed-methods, or multi-modal research.
The recruitment of respondents was initially purposive, aiming to gather responses from qualitative researchers at research-intensive (targeted Russell Group) universities. This involved speculative emails and a call for participants on the University of Sheffield ‘Qualitative Open Research Network’ mailing list. As a result, the responses include a snowball sample of scholars from elsewhere.
The spreadsheet has two tabs/sheets: one labelled ‘SurveyResponses’ contains the anonymised and tidied set of survey responses; the other, labelled ‘VariableMapping’, sets out each field/column in the ‘SurveyResponses’ tab/sheet against the original survey questions and responses it relates to.
The survey responses tab/sheet includes a field/column labelled ‘RespondentID’ (using randomly generated 16-digit alphanumeric keys) which can be used to connect survey responses to interview participants in the accompanying ‘Fostering cultures of open qualitative research: Dataset 2 – Interview transcripts’ files.
A set of survey questions gathering eligibility criteria details and consent are not included within this dataset; they are listed below. All responses provided in the dataset gained a ‘Yes’ response to all of the questions below (with the exception of one question, marked with an asterisk (*) below):
· I am aged 18 or over.
· I have read the information and consent statement above.
· I understand how to ask questions and/or raise a query or concern about the survey.
· I agree to take part in the research and for my responses to be part of an open access dataset. These will be anonymised unless I specifically ask to be named.
· I understand that my participation does not create a legally binding agreement or employment relationship with the University of Sheffield.
· I understand that I can withdraw from the research at any time.
· I assign the copyright I hold in materials generated as part of this project to The University of Sheffield.
· * I am happy to be contacted after the survey to take part in an interview.
The project was undertaken by two staff:
Co-investigator: Dr. Itzel San Roman Pineda
ORCiD ID: 0000-0002-3785-8057
i.sanromanpineda@sheffield.ac.uk
Postdoctoral Research Assistant
Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard
ORCiD ID: 0000-0003-2460-8638
m.s.hanchard@sheffield.ac.uk
Research Associate
iHuman Institute, Social Research Institutes, Faculty of Social Science
As global communities responded to COVID-19, we heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps would be helpful as they made critical decisions to combat COVID-19. These Community Mobility Reports aimed to provide insights into what changed in response to policies aimed at combating COVID-19. The reports charted movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This large dataset (more than 8 million observations) contains job descriptions and ratings across various criteria such as work-life balance, income, culture, etc.
This dataset complements the Glassdoor dataset located here: https://www.kaggle.com/datasets/davidgauthier/glassdoor-job-reviews
Please cite as: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=FQstpaoAAAAJ&citation_for_view=FQstpaoAAAAJ:UebtZRa9Y70C
Glassdoor Reviews
Glassdoor produces reports based upon the data collected from its users, on topics including work–life balance, CEO pay ratios, lists of the best office places and cultures, and the accuracy of corporate job searching maxims. Data from Glassdoor has also been used by outside sources to produce estimates on the effects of salary trends and changes on corporate revenues. Glassdoor also puts the conclusions of its research of other companies towards its company policies. In 2015, Tom Lakin produced the first study of Glassdoor in the United Kingdom, concluding that Glassdoor is regarded by users as a more trustworthy source of information than career guides or official company documents.
Features
The columns correspond to the date of the review, the job name, the job location, the status of the reviewers, and the reviews. Reviews are divided into the sub-categories Career Opportunities, Comp & Benefits, Culture & Values, Senior Management, and Work/Life Balance. In addition, employees can add recommendations on the firm, the CEO, and the outlook.
Other information
Rankings for the recommendation of the firm, CEO approval, and outlook are allocated the categories v, r, x, and o, with the following meanings: v - Positive, r - Mild, x - Negative, o - No opinion
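The single-letter codes above are easier to analyse once decoded into their labels. A minimal sketch follows; the mapping is taken from the description, while the `decode` and `tally` helper names and the sample values are hypothetical.

```python
from collections import Counter

# Code meanings as stated in the dataset description.
RECOMMENDATION_CODES = {
    "v": "Positive",
    "r": "Mild",
    "x": "Negative",
    "o": "No opinion",
}


def decode(code):
    """Return the label for a code, or None if the code is not recognised."""
    return RECOMMENDATION_CODES.get(code.strip().lower())


def tally(codes):
    """Count how often each recognised label appears in a column of codes."""
    labels = (decode(c) for c in codes)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    # Made-up column values for illustration.
    print(tally(["v", "v", "x", "o", "r", "v"]))
```

Unrecognised codes are skipped rather than raising an error, which is a design choice for exploratory work; a stricter pipeline might prefer to fail loudly.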
Some examples of the textual data entries
MCDONALD-S
Headline: I don't like working here,don't work here
Pros: Some people are nice,some free food,some of the managers are nice about 95% of the time
Cons: 95% of people are mean to employees/customers,its not a clean place,people barely clean their hands of what i see,managers are mean,i got a stress rash because of this i can't get rid of it,they don't give me a little raise even though i do alot of crap there for them
Rating: 1.0
KPMG
Headline: Quit working people to death
Pros: Lots of PTO, Good company training
Cons: long hours, clear disconnect between management and staff, as corporate as it gets
Rating: 2.0
PRIMARK
Headline: Sales assistant
Pros: Lovely staff, managers are also very nice
Cons: Hardwork, often rude customers, underpaid for u18
Rating: 3.0
J-P-MORGAN
Headline: Life in JPM, Bangalore
Pros: Good place to start, lots of opportunity.
Cons: Be ready to put in a lot of effort not a place to chill out.
Rating: 4.0
VODAFONE
Headline: Good to be here
Pros: Fast moving with technology. Leading
Cons: There are areas you may want to avoid
Rating: 5.0
Meet Earth Engine
Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface.
SATELLITE IMAGERY + YOUR ALGORITHMS + REAL WORLD APPLICATIONS
GLOBAL-SCALE INSIGHT
Explore our interactive timelapse viewer to travel back in time and see how the world has changed over the past twenty-nine years. Timelapse is one example of how Earth Engine can help gain insight into petabyte-scale datasets.
READY-TO-USE DATASETS
The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily. It contains over twenty petabytes of geospatial data instantly available for analysis.
SIMPLE, YET POWERFUL API
The Earth Engine API is available in Python and JavaScript, making it easy to harness the power of Google’s cloud for your own geospatial analysis.
"Google Earth Engine has made it possible for the first time in history to rapidly and accurately process vast amounts of satellite imagery, identifying where and when tree cover change has occurred at high resolution. Global Forest Watch would not exist without it. For those who care about the future of the planet Google Earth Engine is a great blessing!" - Dr. Andrew Steer, President and CEO of the World Resources Institute
CONVENIENT TOOLS
Use our web-based code editor for fast, interactive algorithm development with instant access to petabytes of data.
SCIENTIFIC AND HUMANITARIAN IMPACT
Scientists and non-profits use Earth Engine for remote sensing research, predicting disease outbreaks, natural resource management, and more.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (a pre-print is available in Open Access at https://arxiv.org/abs/2305.10234) and so that other researchers can use these data in their own work.
The protocol is intended for the systematic literature review (SLR) on the topic of high-value datasets (HVD), with the aim of gathering information on how the topic of HVD and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.
Methodology
To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic was studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and Digital Government Research library (DGRL).
These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those where these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles remained and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.
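The search-and-deduplicate step described above can be approximated locally on exported records. This is a sketch under stated assumptions, not the authors' actual tooling: the record keys (`title`, `keywords`, `abstract`, `doi`), the sample records, and the choice to deduplicate on DOI (falling back to title) are all hypothetical, and the `data*` wildcard is approximated with a regular expression.

```python
import re

# ("open data" OR "open government data")
GROUP_A = [r"open data", r"open government data"]
# ("high-value data*" OR "high value data*"); the trailing * is
# approximated as "any further word characters" (dataset, datasets, ...).
GROUP_B = [r"high-value data\w*", r"high value data\w*"]

SEARCHED_FIELDS = ("title", "keywords", "abstract")


def matches_query(record):
    """True if title/keywords/abstract satisfy both OR-groups of the query."""
    # Join with a separator so a phrase cannot match across two fields.
    text = " | ".join((record.get(f) or "") for f in SEARCHED_FIELDS).lower()
    in_a = any(re.search(p, text) for p in GROUP_A)
    in_b = any(re.search(p, text) for p in GROUP_B)
    return in_a and in_b


def deduplicate(records):
    """Keep the first record per DOI (or per title when no DOI is present)."""
    seen = set()
    unique = []
    for record in records:
        key = (record.get("doi") or record.get("title") or "").strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records standing in for exports from several databases.
records = [
    {"title": "Identifying high-value datasets in open government data",
     "doi": "10.0000/example.1"},
    {"title": "Identifying High-Value Datasets in Open Government Data",
     "doi": "10.0000/EXAMPLE.1"},  # same paper exported from a second database
    {"title": "Open data portals",
     "abstract": "High value data is left as future work.",
     "doi": "10.0000/example.2"},
    {"title": "Smart city platforms",
     "abstract": "We discuss open data.",
     "doi": "10.0000/example.3"},
]

hits = deduplicate([r for r in records if matches_query(r)])
for hit in hits:
    print(hit["doi"], "-", hit["title"])
```

The relevance check that follows deduplication in the study was done manually by the authors and is not modelled here.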
To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.
Test procedure
Each study was independently examined by at least two authors; after an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by a third researcher.
Description of the data in this data set
Protocol_HVD_SLR provides the structure of the protocol.
Spreadsheet #1 provides the filled protocol for relevant studies.
Spreadsheet #2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information
Descriptive information
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
7) Availability in OA - availability of an article in Open Access
8) Keywords - keywords of the paper as indicated by the authors
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}
Approach- and research design-related information
10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared?
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?
Quality- and relevance- related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))
HVD determination-related information
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover?
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
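The 24-item protocol listed above can be represented as a simple data structure for checking filled entries. This is an illustrative sketch only: the field names are copied from the list, the allowed values are the ones the list states in braces or parentheses, and the names `PROTOCOL`, `ALLOWED_VALUES`, and `check_entry` are hypothetical. The actual spreadsheet headers may differ.

```python
# Protocol fields grouped into the four categories named in the description.
PROTOCOL = {
    "Descriptive information": [
        "Article number", "Complete reference", "Year of publication",
        "Journal article / conference paper / book chapter", "DOI / Website",
        "Number of citations", "Availability in OA", "Keywords",
        "Relevance for this study",
    ],
    "Approach- and research design-related information": [
        "Objective / RQ", "Research method (including unit of analysis)",
        "Contributions", "Method",
        "Availability of the underlying research data",
        "Period under investigation",
        "Use of theory / theoretical concepts / approaches",
    ],
    "Quality- and relevance-related information": [
        "Quality concerns", "Primary research object",
    ],
    "HVD determination-related information": [
        "HVD definition and type of value", "HVD indicators",
        "A framework for HVD determination", "Stakeholders and their roles",
        "Data", "Level (if relevant)",
    ],
}

# Closed value sets that the protocol description spells out explicitly.
ALLOWED_VALUES = {
    "Journal article / conference paper / book chapter":
        {"journal article", "conference paper", "book chapter"},
    "Relevance for this study": {"high", "medium", "low"},
    "Primary research object": {"primary", "secondary"},
}


def check_entry(entry):
    """Return a list of problems (missing fields, unexpected values)."""
    problems = []
    for category, fields in PROTOCOL.items():
        for field in fields:
            value = entry.get(field)
            if value is None or str(value).strip() == "":
                problems.append(f"missing: {field} ({category})")
            elif (field in ALLOWED_VALUES
                  and str(value).strip().lower() not in ALLOWED_VALUES[field]):
                problems.append(f"unexpected value for {field}: {value!r}")
    return problems


# Hypothetical filled entry; "placeholder" stands in for free-text answers.
all_fields = [f for fields in PROTOCOL.values() for f in fields]
entry = dict.fromkeys(all_fields, "placeholder")
entry["Journal article / conference paper / book chapter"] = "conference paper"
entry["Relevance for this study"] = "high"
entry["Primary research object"] = "primary"
print(check_entry(entry))
```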
Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx
Licenses or restrictions CC-BY
For more info, see README.txt
This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers. The toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.
The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.
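The fraction-of-annotators labels described above can be illustrated with a short computation. This is a sketch with made-up votes; the function names are hypothetical, and the 0.5 cut-off used to turn a fraction into a binary label is an assumption chosen for illustration, not something the description above specifies.

```python
def annotator_fraction(votes):
    """Fraction of annotators who assigned the attribute.

    `votes` holds one 0/1 value per annotator.
    """
    if not votes:
        raise ValueError("at least one annotator vote is required")
    return sum(votes) / len(votes)


def binarize(fraction, threshold=0.5):
    """Turn a fractional label into a boolean; the threshold is an assumption."""
    return fraction >= threshold


# Made-up example: 2 of 10 annotators marked the comment as toxic.
votes = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
toxicity = annotator_fraction(votes)
print(toxicity, binarize(toxicity))
```

Keeping the fractional value alongside any binary label preserves information about annotator disagreement.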
The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.
For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of the civil_comments dataset.
ds = tfds.load('civil_comments', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Employee-Reviews
Description
Context
Over 67k employee reviews for Google, Amazon, Facebook, Apple, and Microsoft
Content
This dataset contains employee reviews separated into the following categories:
Index: index
Company: Company name
Location: This dataset is global, so a location may include the country's name in parentheses [e.g. "Toronto, ON(Canada)"]. However, if the location is in the USA, it will only include the city and state [e.g. "Los Angeles, CA"]
Date Posted: in the following format MM DD, YYYY
Job-Title: This string will also include whether the reviewer is a 'Current' or 'Former' Employee at the time of the review
Summary: Short summary of employee review
Pros: Pros
Cons: Cons
Overall Rating: 1-5
Work/Life Balance Rating: 1-5
Culture and Values Rating: 1-5
Career Opportunities Rating: 1-5
Comp & Benefits Rating: 1-5
Senior Management Rating: 1-5
Helpful Review Count: A count of how many people found the review to be helpful
Link to Review: This will provide you with a direct link to the page that contains the review. However, it is likely that this link will be outdated
NOTE: 'none' is placed in all cells where no data value was found.
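The field conventions above (the 'none' placeholder, the optional country in parentheses, and the Current/Former marker in the job title) can be handled with a few helper functions. This is a minimal sketch: the function names are hypothetical, the sample job-title strings are made up, and defaulting the country to "USA" relies on the statement that US locations omit the country.

```python
import re


def clean(value):
    """Map the 'none' placeholder (and empty cells) to None."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.lower() == "none":
        return None
    return text


def parse_location(raw):
    """Split e.g. 'Toronto, ON(Canada)' into place and country."""
    text = clean(raw)
    if text is None:
        return None
    if text.endswith(")") and "(" in text:
        place, _, country = text[:-1].rpartition("(")
        return {"place": place.strip(), "country": country.strip()}
    # Per the description, US locations carry only city and state.
    return {"place": text, "country": "USA"}


def employment_status(job_title):
    """Return 'Current' or 'Former' if the job title mentions it, else None."""
    text = clean(job_title)
    if text is None:
        return None
    match = re.search(r"\b(current|former)\b", text, re.IGNORECASE)
    return match.group(1).capitalize() if match else None


print(parse_location("Toronto, ON(Canada)"))
print(parse_location("Los Angeles, CA"))
# Made-up job-title string, used only to show the lookup.
print(employment_status("Former Employee - Software Engineer"))
```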
Acknowledgements
This data was scraped from Glassdoor.
Inspiration
To inspire people to create ML models to search for meaningful trends within this dataset.