This dataset contains current and historical demographic data on Google's workforce since the company began publishing diversity data in 2014. It includes data collected for government reporting and voluntary employee self-identification globally relating to hiring, retention, and representation, categorized by race, gender, sexual orientation, gender identity, disability status, and military status. In some instances, the data is limited due to various government policies around the world and the desire to protect Googler confidentiality. All data in this dataset will be updated yearly upon publication of Google's Diversity Annual Report. Google uses this data to inform its diversity, equity, and inclusion work. More information on our methodology can be found in the Diversity Annual Report. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
The tech industry had a rough start to 2024. Technology companies worldwide saw a significant reduction in their workforce in the first quarter of 2024, with over 57 thousand employees being laid off. By the second quarter, layoffs impacted more than 43 thousand tech employees. In the final quarter of the year, around 12 thousand employees were laid off.
Layoffs impacting all global tech giants
Layoffs in the global market escalated dramatically in the first quarter of 2023, when the sector saw a record high of 167.6 thousand employees losing their jobs. Major tech giants such as Google, Microsoft, Meta, and IBM all contributed to this figure during this quarter. Amazon, in particular, conducted the most rounds of layoffs, with the highest number of employees laid off among global tech giants. Industries most affected include the consumer, hardware, food, and healthcare sectors. Notable companies that have laid off a significant number of staff include Flink, Booking.com, Uber, PayPal, LinkedIn, and Peloton, among others.
Overhiring led the trend, but will AI keep it going?
Layoffs in the technology sector started following an overhiring spree during the COVID-19 pandemic. Initially, companies expanded their workforce to meet increased demand for digital services during lockdowns. However, as lockdowns ended, economic uncertainties persisted, and companies reevaluated their strategies, layoffs became inevitable, resulting in a record number of 263 thousand laid-off employees in the global tech sector by the end of 2022. Moreover, it is still unclear how advancements in artificial intelligence (AI) will impact layoff trends in the tech sector. AI-driven automation can replace manual tasks, leading to workforce redundancies. Whether through chatbots handling customer inquiries or predictive algorithms optimizing supply chains, the pursuit of efficiency and cost savings may result in more tech industry layoffs in the future.
OnPoint Weather is a global weather dataset for business available for any lat/lon point and geographic area such as ZIP codes. OnPoint Weather provides a continuum of hourly and daily weather from the year 2000 to current time and a forward forecast of 45 days. OnPoint Climatology provides hourly and daily weather statistics which can be used to determine "departures from normal" and to provide climatological guidance of expected weather for any location at any point in time. The OnPoint Climatology provides weather statistics such as means, standard deviations and frequency of occurrence. Weather has a significant impact on businesses and accounts for hundreds of billions in lost revenue annually. OnPoint Weather allows businesses to quantify weather impacts and develop strategies to optimize for weather to improve business performance.
Examples of Usage
• Quantify the impact of weather on sales across diverse locations and times of the year
• Understand how supply chains are impacted by weather
• Understand how employee attendance and performance are impacted by weather
• Understand how weather influences foot traffic at malls, stores and restaurants
OnPoint Weather is available through Google Cloud Platform's Commercial Dataset Program and can be easily integrated with other Google Cloud Platform Services to quickly reveal and quantify weather impacts on business. Weather Source provides a full range of support services from answering quick questions to consulting and building custom solutions. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
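As a rough illustration of how a "departure from normal" might be derived from climatological means and standard deviations, the sketch below computes a raw and a standardized departure. The `ClimateNormal` structure, its field names, and the sample values are hypothetical and are not part of the OnPoint product schema.

```python
from dataclasses import dataclass


@dataclass
class ClimateNormal:
    """Hypothetical climatological statistics for one location and date."""
    mean: float     # long-term mean of the variable (e.g. daily max temperature)
    std_dev: float  # long-term standard deviation, in the same units


def departure_from_normal(observed: float, normal: ClimateNormal) -> float:
    """Raw departure: how far the observation sits above or below the mean."""
    return observed - normal.mean


def standardized_departure(observed: float, normal: ClimateNormal) -> float:
    """Departure expressed in standard deviations (a z-score)."""
    if normal.std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (observed - normal.mean) / normal.std_dev


# Made-up values for illustration only.
example_normal = ClimateNormal(mean=20.0, std_dev=4.0)
print(departure_from_normal(26.0, example_normal))
print(standardized_departure(26.0, example_normal))
```

A standardized departure makes locations with very different climates comparable, which is one plausible way to relate weather anomalies to sales or foot traffic across sites.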
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data. Purpose As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to
establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals; compare how much data is in institutional vs. domain-specific vs. federal platforms; determine which repositories are recommended by top journals that require or recommend the publication of supporting data; and ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data.
Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.
Search methods
We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of "agricultural data" / "ag data" / "scientific data" + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.
We then used search engines such as Bing and Google to find top agricultural university repositories using variations of "agriculture", "ag data" and "university" to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University - International Research Institute for Climate and Society, UC Davis - Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.
Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.
Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA researchers published in 2012 and 2016 were compiled by combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.
Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation
We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results.
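The threshold checks described above can be sketched as a small helper. This is only an illustration under assumptions: the function name, the dictionary keys, and the sample counts are hypothetical and do not come from the study's data files.

```python
def summarize_term(term: str, hits: int, total_datasets: int) -> dict:
    """Summarize one search term against one repository.

    `hits` is the number of results returned for the term and
    `total_datasets` is the repository's total number of datasets.
    """
    if total_datasets <= 0:
        raise ValueError("total_datasets must be positive")
    percent = 100.0 * hits / total_datasets
    return {
        "term": term,
        "hits": hits,
        "percent_of_collection": percent,
        "at_least_1_percent": percent >= 1.0,
        "at_least_5_percent": percent >= 5.0,
        "over_100_results": hits > 100,
        "over_500_results": hits > 500,
    }


# Made-up counts for illustration only.
summary = summarize_term("agriculture", hits=600, total_datasets=10000)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Keeping the percentage and the count thresholds as separate flags mirrors the text: a repository can be large in absolute terms while ag data remains a small share of its collection, or the reverse.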
We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.
Results
A summary of the major findings from our data review:
Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.
See included README file for descriptions of each individual data file in this dataset.
Resources in this dataset:
Resource Title: Journals. File Name: Journals.csv
Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Geospatial Services
Land management within the US Forest Service and on the 900,000+ acre Monongahela National Forest (NF) is driven by a wide mix of resource and societal demands that prove a challenge in fulfilling the Forest Service's mission of "Caring for the Land and Serving the People." Programmatically, the 2006 Land and Resource Management Plan guides natural resource management activities on lands administered by the Monongahela National Forest. The Forest Plan describes management direction and practices, resource protection methods and monitoring, desired resource conditions, and the availability and suitability of lands for resource management. Technology enables staff to address these land management issues and Forest Plan direction by using a science-based approach to facilitate effective decisions. Monongahela NF geospatial services, using enabling technologies, incorporate key tools such as Environmental Systems Research Institute's ArcGIS desktop suite and Trimble's global positioning system (GPS) units to meet program and Forest needs.
Geospatial Datasets
The Forest has a broad set of geospatial datasets that capture geographic features across the eastern West Virginia landscape. Many of these datasets are available to the public through our download site. Selected geospatial data that encompass the Monongahela National Forest are available for download from this page. A link to the FGDC-compliant metadata is provided for each dataset. All data are in zipped format (or available from the specified source), in one of two spatial data formats, and in the following coordinate system:
Coordinate System: Universal Transverse Mercator
Zone: 17
Units: Meters
Datum: NAD 1983
Spheroid: GRS 1980
Map files - All map files are in pdf format. These maps illustrate the correlated geospatial data. All maps are under 1 MB unless otherwise noted.
Metadata file - This FGDC-compliant metadata file contains information pertaining to the specific geospatial dataset.
Shapefile - This downloadable zipped file is in ESRI's shapefile format.
KML file - This downloadable zipped file is in Google Earth's KML format.
Resources in this dataset:
Resource Title: Monongahela National Forest Geospatial Data. File Name: Web Page, url: https://www.fs.usda.gov/detail/mnf/landmanagement/gis/?cid=stelprdb5108081 Selected geospatial data that encompass the Monongahela National Forest are available for download from this page.
Business Task:
Analyze Cyclistic historical bike trip data to identify trends that explain how annual members and casual riders differ. Transform data into actionable insights and create compelling data visualizations that explain why casual riders should purchase an annual membership. Design a new marketing strategy to convert casual riders into annual members. Use digital media to create effective marketing that shows casual riders why it would be beneficial to become an annual member.
Key stakeholders to be considered are Cyclistic customers, Lily Moreno, the Cyclistic marketing analytics team, as well as the Cyclistic executive team. Cyclistic customers include casual riders and members, some with disabilities that use assistive options. Only 30% of riders use Cyclistic to commute to work, while most riders use the bike-share service for leisure. Lily Moreno is the director of marketing. The marketing analytics team helps guide the marketing strategy. The executive team decides whether to approve the recommended marketing program.
A description of all data sources used:
Cyclistic bike-share historical trip data is public. It is located on the Divvy website. The .CSV files are sorted by year and month, dating back to 2013. The data is not in real-time, but it is current because it is published every month. Each file has comprehensive data on individual rider IDs, bike type, time and date of trip, station location information, and whether each rider is a casual rider or a member.
The Divvy website includes the following system data:
Each trip is anonymized and includes:
• Trip start day and time
• Trip end day and time
• Trip start station
• Trip end station
• Rider type (Member, Single Ride, and Day Pass)
The data has been filtered to remove trips that are taken by staff as they service and inspect the system; and any trips that were below 60 seconds in length (potentially false starts or users trying to re-dock a bike to ensure it was secure).
The Data License Agreement explains that Motivate International Inc. ("Motivate") operates the City of Chicago's ("City") Divvy bike-share service. The City of Chicago is the owner of all Divvy data and makes it accessible to the public. Lyft is the operator of Divvy in Chicago. Lyft has a privacy policy that explains its commitment to respecting users' personal information.
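The 60-second rule described above could be reproduced or double-checked on a downloaded file with a sketch like the following. The column names `started_at` and `ended_at`, the timestamp format, and the sample rows are assumptions for illustration; the actual files should be inspected before relying on them.

```python
from datetime import datetime

MIN_TRIP_SECONDS = 60                 # trips shorter than this are dropped
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"     # assumed timestamp layout


def trip_seconds(started_at: str, ended_at: str) -> float:
    """Return the trip duration in seconds from two timestamp strings."""
    start = datetime.strptime(started_at, TIME_FORMAT)
    end = datetime.strptime(ended_at, TIME_FORMAT)
    return (end - start).total_seconds()


def filter_trips(rows: list[dict]) -> list[dict]:
    """Keep only trips lasting at least MIN_TRIP_SECONDS."""
    return [
        row for row in rows
        if trip_seconds(row["started_at"], row["ended_at"]) >= MIN_TRIP_SECONDS
    ]


# Made-up rows for illustration only.
sample_rows = [
    {"ride": "a", "started_at": "2021-06-01 08:00:00", "ended_at": "2021-06-01 08:00:45"},
    {"ride": "b", "started_at": "2021-06-01 08:00:00", "ended_at": "2021-06-01 08:12:30"},
    {"ride": "c", "started_at": "2021-06-01 09:00:00", "ended_at": "2021-06-01 09:01:00"},
]
kept = filter_trips(sample_rows)
print([row["ride"] for row in kept])
```

In practice the rows would come from `csv.DictReader` over one of the monthly files; the same predicate then applies unchanged.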
The Divvy Data License Agreement explains the following:
⢠License. Motivate hereby grants to you a non-exclusive, royalty-free, limited, perpetual license to access, reproduce, analyze, copy, modify, distribute in your product or service and use the Data for any lawful purpose (āLicenseā).
⢠No Warranty. THE DATA IS PROVIDED āAS IS,ā AS AVAILABLE (AT MOTIVATEāS SOLE DISCRETION) AND AT YOUR SOLE RISK. TO THE MAXIMUM EXTENT PROVIDED BY LAW MOTIVATE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. MOTIVATE FURTHER DISCLAIMS ANY WARRANTY THAT THE DATA WILL MEET YOUR NEEDS OR WILL BE OR CONTINUE TO BE AVAILABLE, COMPLETE, ACCURATE, TIMELY, SECURE, OR ERROR FREE.
Although the Divvy system data is generally treated as reliable, the "No Warranty" terms and conditions mean there is no guarantee that the data will be "AVAILABLE, COMPLETE, ACCURATE, TIMELY, SECURE, OR ERROR FREE." The credibility of the data could be weakened because the provider is not held responsible for its quality.
Sampling bias could take place because Chicago is significantly affected by weather. There is also an influx of tourists at certain times of the year. The effects of weather and tourism on the data can be accounted for because these influences are consistent.
Consistently accurate data from Divvy bike-share is necessary to create and follow through with an effective marketing strategy. All the data is original and owned by the City of Chicago, making it a credible source. Lyft is also a credible source because it has the technology to accurately collect data. Although the Data License Agreement states that it has "No Warranty," the source of the data and the way it is managed make it credible. Divvy bike-share data is cited using the following:
• Divvy (https://www.divvybikes.com)
• Divvy Historical Data (https://divvy-tripdata.s3.amazonaws.com/index.html)
• Divvy System Data (https://www.divvybikes.com/system-data)
• Divvy Data License Agreement (https://www.divvybikes.com/data-license-agreement)
• Lyft's Privacy Policy (https://www.lyft.com/privacy)
The sources of the data confirm data credibility. The data is detailed and thorough, making it effective and efficient for marketing purposes.
Documentation of any cleaning or manipulation of data:
Format Cells --> Alignment --> Shrink to Fit top row
Data --> Remove Duplicates
Create and calculate n...
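For readers working outside a spreadsheet, the Remove Duplicates step listed above could be approximated as below. This is a sketch that assumes duplicates are rows whose values match in every column and that the first occurrence is kept; the function name and sample rows are hypothetical.

```python
def remove_duplicates(rows: list[list[str]]) -> list[list[str]]:
    """Drop rows identical to an earlier row, preserving original order."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row)          # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Made-up rows for illustration only.
sample = [
    ["ride_1", "member", "classic_bike"],
    ["ride_2", "casual", "electric_bike"],
    ["ride_1", "member", "classic_bike"],   # exact duplicate of the first row
]
cleaned = remove_duplicates(sample)
print(len(sample) - len(cleaned), "duplicate row(s) removed")
```

Reporting how many rows were removed gives a simple record of the cleaning step for the documentation.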
LinkedIn Company Data for Company Analysis, Valuation & Portfolio Strategy
LinkedIn company data is one of the most powerful forms of alternative data for understanding company behavior, firmographics, business dynamics, and real-time hiring signals. Canaria's enriched LinkedIn company data provides detailed company profiles, including hiring activity, job postings, employee trends, headquarters and branch locations, and verified metadata from Google Maps. This LinkedIn corporate data is updated weekly and optimized for use in company analysis, startup scouting, private company valuation, and investment monitoring. It supports BI dashboards, risk models, CRM enrichment, and portfolio strategy.
Use Cases: What Problems This LinkedIn Data Solves
Our LinkedIn company insights transform opaque business landscapes into structured, analyzable data. Whether you're conducting M&A due diligence, tracking high-growth companies, or benchmarking performance, this dataset empowers fast, confident decisions.
Company Analysis
• Identify a company's size, industry classification, and headcount signals using LinkedIn firmographic data
• Analyze social presence through LinkedIn follower metrics and employee engagement
• Understand geographic expansion through branch locations and hiring distribution
• Benchmark companies using LinkedIn profile activity and job posting history
• Monitor business changes with real-time LinkedIn updates
Company Valuation & Financial Benchmarking
• Feed LinkedIn-based firmographics into comps and financial models
• Use hiring velocity from LinkedIn job data as a proxy for business growth
• Strengthen private market intelligence with verified non-financial signals
• Validate scale, structure, and presence via LinkedIn and Google Maps footprint
Company Risk Analysis
• Detect red flags using hiring freezes or drop in profile activity
• Spot market shifts through location downsizing or organizational changes
• Identify distressed companies with decreased LinkedIn job posting frequency
• Compare stated presence vs. active behavior to identify risk anomalies
Business Intelligence (BI) & Strategic Planning
• Segment companies by industry, headcount, growth behavior, and hiring activity
• Build BI dashboards integrating LinkedIn job trends and firmographic segmentation
• Identify geographic hiring hotspots using Maps and LinkedIn signal overlays
• Track job creation, title distribution, and skill demand in near real-time
• Export filtered LinkedIn corporate data into CRMs, analytics tools, and lead scoring systems
Portfolio Management & Investment Monitoring
• Enhance portfolio tracking with LinkedIn hiring data and firmographic enrichment
• Spot hiring surges, geographic expansions, or restructuring in real-time
• Correlate LinkedIn growth indicators with strategic outcomes
• Analyze competitors and targets using historical and real-time LinkedIn data
• Generate alerts for high-impact company changes in your portfolio universe
What Makes This LinkedIn Company Data Unique
Includes Real-Time Hiring Signals
• Gain visibility into which companies are hiring, at what scale, and for which roles using enriched LinkedIn job data
Verified Location Intelligence
• Confirm branch and HQ locations with Google Maps coordinates and public company metadata
Weekly Updates
• Stay ahead of the market with fresh, continuously updated LinkedIn company insights
Clean & Analysis-Ready Format
• Structured, deduplicated, and taxonomy-mapped data that integrates with CRMs, BI platforms, and investment models
Who Benefits from LinkedIn Company Data
• Hedge funds, VCs, and PE firms analyzing startup and private company activity
• Portfolio managers and financial analysts tracking operational shifts
• Market research firms modeling sector momentum and firmographics
• Strategy teams calculating market size using LinkedIn company footprints
• BI and analytics teams building company-level dashboards
• Compliance and KYC teams enriching company identity records
• Corp dev teams scouting LinkedIn acquisition targets and expansion signals
Summary
Canaria's LinkedIn company data delivers high-frequency, high-quality insights into U.S. companies, combining job posting trends, location data, and firmographic intelligence. With real-time updates and structured delivery formats, this alternative dataset enables powerful workflows across company analysis, financial modeling, investment research, market segmentation, and business strategy.
About Canaria Inc.
Canaria Inc. is a leader in alternative data, specializing in job market intelligence, LinkedIn company data, and Glassdoor salary analytics. We deliver clean, structured, and enriched datasets at scale using proprietary data scraping pipelines and advanced AI/LLM-based modeling, all backed by human validation. Our AI-powered pipeline is developed by a seasoned team of machine learning experts from Google, Meta, and Amazon, and by alumni of S...