Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about companies. It has 202 rows and is filtered where the company is Google. It features 30 columns including city, country, employees, and employee type.
Public Domain (CC0 1.0) https://creativecommons.org/publicdomain/zero/1.0/
This project uses a dataset called HR_capstone_dataset.csv. It represents 10 columns of self-reported information from employees of a fictitious multinational vehicle manufacturing corporation.
The dataset contains:
14,999 rows – each row is a different employee’s self-reported information
The primary data source for this dataset is the Kaggle dataset:
HR Analytics Job Prediction (CC0: Public Domain, made available by Faisal Qureshi) - Link: https://www.kaggle.com/datasets/mfaisalqureshi/hr-analytics-and-job-prediction/data
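As a light illustration of working with the dataset's structure (14,999 rows by 10 columns), here is a stdlib-only sketch that uses a synthetic two-row stand-in, since the real HR_capstone_dataset.csv is not bundled here; the column names shown are assumptions, not the actual schema:

```python
import csv
import io

# Synthetic stand-in for HR_capstone_dataset.csv (real file: 14,999 rows x 10 columns).
# Column names below are illustrative assumptions, not the documented schema.
sample = io.StringIO(
    "satisfaction_level,last_evaluation,left\n"
    "0.38,0.53,1\n"
    "0.80,0.86,0\n"
)
rows = list(csv.DictReader(sample))
print(len(rows), list(rows[0].keys()))
```

With the real file, replacing the StringIO object with `open("HR_capstone_dataset.csv", newline="")` would read each employee's self-reported record as one dict per row.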
As of January 2024, several major technology companies, including Google, Amazon, Meta, and Apple, have implemented return-to-office mandates requiring employees to be in the office at least three days per week. Interestingly, Zoom, a company that played a significant role in facilitating work-from-home activities during the COVID-19 pandemic, has announced a return-to-office mandate of its own requiring employees to work from the office twice per week. In contrast, X (formerly Twitter) has maintained an office-only policy for its employees since Elon Musk acquired Twitter in 2022, requiring all X employees to work from the office the entire work week.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute.
The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:
· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
· Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
· Fostering cultures of open qualitative research: Dataset 3 – Coding Book
The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.
The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.
ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license.
This dataset comprises one spreadsheet with N=91 anonymised survey responses in .xlsx format. It includes all responses to the project survey, which used Google Forms between 06-Feb-2023 and 30-May-2023. The spreadsheet can be opened with Microsoft Excel, Google Sheets, or open-source equivalents.
The survey responses include a random sample of researchers worldwide undertaking qualitative, mixed-methods, or multi-modal research.
The recruitment of respondents was initially purposive, aiming to gather responses from qualitative researchers at research-intensive (targeted Russell Group) Universities. This involved speculative emails and a call for participants on the University of Sheffield ‘Qualitative Open Research Network’ mailing list. As a result, the responses include a snowball sample of scholars from elsewhere.
The spreadsheet has two tabs/sheets: one labelled ‘SurveyResponses’ contains the anonymised and tidied set of survey responses; the other, labelled ‘VariableMapping’, sets out each field/column in the ‘SurveyResponses’ tab/sheet against the original survey questions and responses it relates to.
The survey responses tab/sheet includes a field/column labelled ‘RespondentID’ (using randomly generated 16-digit alphanumeric keys) which can be used to connect survey responses to interview participants in the accompanying ‘Fostering cultures of open qualitative research: Dataset 2 – Interview transcripts’ files.
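The linking described above (connecting survey responses to interview participants via the shared RespondentID key) can be sketched as follows; the ID value and field names here are made up for illustration and do not come from the dataset:

```python
# Illustrative join of survey responses to interview participants via the
# shared RespondentID key. The 16-character alphanumeric ID and the field
# names below are hypothetical, not taken from the actual dataset.
survey = {"A1B2C3D4E5F6G7H8": {"career_stage": "ECR"}}
interviews = {"A1B2C3D4E5F6G7H8": {"transcript": "interview_01.docx"}}

# Keep only respondents present in both datasets, merging their records.
linked = {
    rid: {**survey[rid], **interviews[rid]}
    for rid in survey.keys() & interviews.keys()
}
print(linked["A1B2C3D4E5F6G7H8"]["transcript"])
```

The same key-based join applies when working with the accompanying Dataset 2 and Dataset 3 files.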
A set of survey questions gathering eligibility criteria and consent details is not listed within this dataset, as below. All responses provided in the dataset gave a ‘Yes’ response to all of the questions below (with the exception of one question, marked with an asterisk (*)):
· I am aged 18 or over
· I have read the information and consent statement above.
· I understand how to ask questions and/or raise a query or concern about the survey.
· I agree to take part in the research and for my responses to be part of an open access dataset. These will be anonymised unless I specifically ask to be named.
· I understand that my participation does not create a legally binding agreement or employment relationship with the University of Sheffield
· I understand that I can withdraw from the research at any time.
· I assign the copyright I hold in materials generated as part of this project to The University of Sheffield.
· * I am happy to be contacted after the survey to take part in an interview.
The project was undertaken by two staff:
Co-investigator: Dr. Itzel San Roman Pineda ORCiD ID: 0000-0002-3785-8057 i.sanromanpineda@sheffield.ac.uk Postdoctoral Research Assistant
Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard ORCiD ID: 0000-0003-2460-8638 m.s.hanchard@sheffield.ac.uk Research Associate, iHuman Institute, Social Research Institutes, Faculty of Social Science
This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers; the toxicity and other tags are values between 0 and 1 indicating the fraction of annotators who assigned these attributes to the comment text.
The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.
The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.
For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.
To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('civil_comments', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.
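Since the toxicity tags are annotator fractions in [0, 1], downstream classification typically binarizes them; a minimal sketch, assuming the common 0.5 threshold used in the Kaggle challenge this data mirrors (the threshold is a convention, not part of the dataset itself):

```python
# Map an annotator-agreement fraction in [0, 1] to a binary label.
# The 0.5 threshold is an assumed convention, not stored in the dataset.
def binarize(fraction, threshold=0.5):
    return 1 if fraction >= threshold else 0

labels = [binarize(f) for f in [0.0, 0.4, 0.5, 0.9]]
print(labels)  # [0, 0, 1, 1]
```

The same thresholding applies to the identity and covert-offensiveness tags, which are likewise stored as annotator fractions.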
The study was conducted using a convenience sample of around 325 employees who worked online during the COVID-19 pandemic. This included public and private sector offices operating online. Online employees in Sri Lanka were randomly selected and surveyed about their preferences for working online. Responses were gathered through a questionnaire via Google Forms. The questionnaire consists of four separate sections: 1) personal information, such as age, gender, and marital status; 2) working information, such as type of working place and employment category; 3) usage of technology, such as internet usage and issues faced; 4) preferences for online work, reasons for preferring to continue teleworking, views on working online, and other major problems related to online work.
Target variable - Do you like to continue the work-from-home concept after the COVID-19 pandemic? (Yes/No)
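For modelling, as in the cited publications, the Yes/No target can be mapped to a binary label; a minimal sketch with illustrative responses (the mapping itself is an assumed convention, not documented in the dataset):

```python
# Hypothetical encoding of the Yes/No target variable for use with
# standard classifiers; the responses below are illustrative only.
encoding = {"Yes": 1, "No": 0}
responses = ["Yes", "No", "Yes", "Yes"]
y = [encoding[r] for r in responses]
print(y)  # [1, 0, 1, 1]
```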
Publications:
1 - S. A. D. D. Abesiri, R. A. H. M. Rupasingha, An Ensemble Learning Approach to Predict Employees' Choice for Continuing E-working Concept in the Post-Pandemic World, The IUP Journal of Information Technology, Vol. 18, Issue 4, 2022.
2 - S. A. D. D. Abesiri and R. A. H. M. Rupasingha, "Predicting Employee Preference of Teleworking Using Machine Learning Techniques in the Post COVID-19 Period in Sri Lanka," 2022 International Research Conference on Smart Computing and Systems Engineering (SCSE), IEEE, 2022, pp. 22-27, doi: 10.1109/SCSE56529.2022.9905131
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As there was no large publicly available cross-domain dataset for comparative argument mining, we created one composed of sentences annotated with BETTER / WORSE markers (the first object is better / worse than the second object) or NONE (the sentence does not contain a comparison of the target objects). BETTER sentences stand for a pro-argument in favor of the first compared object; WORSE sentences represent a con-argument and favor the second object.

We aimed to minimize domain-specific biases in the dataset in order to capture the nature of comparison rather than the nature of the particular domains, and thus decided to control the specificity of domains through the selection of comparison targets. We hypothesized, and could confirm in preliminary experiments, that comparison targets usually have a common hypernym (i.e., are instances of the same class), which we utilized for selecting the compared object pairs.

The most specific domain we chose is computer science, with comparison targets such as programming languages, database products, and technology standards such as Bluetooth or Ethernet. Many computer science concepts can be compared objectively (e.g., on transmission speed or suitability for certain applications). The objects for this domain were manually extracted from ‘List of’-articles at Wikipedia. In the annotation process, annotators were asked to label sentences from this domain only if they had some basic knowledge of computer science.

The second, broader domain is brands. It contains objects of different types (e.g., cars, electronics, and food). As brands are present in everyday life, anyone should be able to label the majority of sentences containing well-known brands such as Coca-Cola or Mercedes. Again, targets for this domain were manually extracted from ‘List of’-articles at Wikipedia.

The third domain is not restricted to any topic: random.
For each of 24 randomly selected seed words, 10 similar words were collected based on the distributional similarity API of JoBimText (http://www.jobimtext.org). Seed words were created using randomlists.com: book, car, carpenter, cellphone, Christmas, coffee, cork, Florida, hamster, hiking, Hoover, Metallica, NBC, Netflix, ninja, pencil, salad, soccer, Starbucks, sword, Tolkien, wine, wood, XBox, Yale.

Especially for brands and computer science, the resulting object lists were large (4,493 objects for brands and 1,339 for computer science). In a manual inspection, low-frequency and ambiguous objects were removed from all object lists (e.g., RAID (a hardware concept) and Unity (a game engine) are also regularly used nouns). The remaining objects were combined into pairs: for each object type (seed Wikipedia list page or seed word), all possible combinations were created. These pairs were then used to find sentences containing both objects. These approaches to selecting compared object pairs tend to minimize the inclusion of domain-specific data, but do not solve the problem fully; we leave extending the dataset with more diverse object pairs, including abstract concepts, for future work.

For sentence mining, we used the publicly available index of dependency-parsed sentences from the Common Crawl corpus, containing over 14 billion English sentences filtered for duplicates. This index was queried for sentences containing both objects of each pair. For 90% of the pairs, we also added comparative cue words (better, easier, faster, nicer, wiser, cooler, decent, safer, superior, solid, terrific, worse, harder, slower, poorly, uglier, poorer, lousy, nastier, inferior, mediocre) to the query in order to bias the selection towards comparisons while still admitting comparisons that do not contain any of the anticipated cues. This was necessary because random sampling would have resulted in only a very tiny fraction of comparisons.
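The pairing step described above (all possible combinations within each object type) can be sketched with the standard library; the object list here is illustrative, not one of the actual extracted lists:

```python
# All unordered pairs within one object type, as described for pair creation.
from itertools import combinations

objects = ["Python", "Java", "Ruby", "Go"]  # illustrative stand-in list
pairs = list(combinations(objects, 2))
print(len(pairs))  # 6 pairs from 4 objects
```

Each resulting pair is then used as a query for sentences containing both objects.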
Note that even sentences containing a cue word do not necessarily express a comparison between the desired targets (dog vs. cat: He's the best pet that you can get, better than a dog or cat.). It is thus especially crucial to enable a classifier to learn not to rely on the existence of cue words alone (very likely in a random sample of sentences with very few comparisons). For our corpus, we kept pairs with at least 100 retrieved sentences.

From all sentences of those pairs, 2,500 for each category were randomly sampled as candidates for a crowdsourced annotation that we conducted on figure-eight.com in several small batches. Each sentence was annotated by at least five trusted workers. We ranked annotations by confidence, which is the figure-eight internal measure combining annotator trust and voting, and discarded annotations with a confidence below 50%. Of all annotated items, 71% received unanimous votes, and for over 85% at least 4 out of 5 workers agreed -- rendering the collection procedure, aimed at ease of annotation, successful.

The final dataset contains 7,199 sentences with 271 distinct object pairs. The majority of sentences (over 72%) are non-comparative despite biasing the selection with cue words; in 70% of the comparative sentences, the favored target is named first.

You can browse through the data here: https://docs.google.com/spreadsheets/d/1U8i6EU9GUKmHdPnfwXEuBxi0h3aiRCLPRC-3c9ROiOE/edit?usp=sharing A full description of the dataset is available in the workshop paper at the ACL 2019 conference. Please cite this paper if you use the data: Franzek, Mirco, Alexander Panchenko, and Chris Biemann.
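The confidence filter can be approximated as follows; note this is a plain majority-vote sketch, since figure-eight's actual trust-weighted confidence measure is proprietary:

```python
# Approximate the annotation aggregation: keep the majority label only if
# its (unweighted) vote share reaches the 50% confidence threshold.
# figure-eight additionally weights votes by annotator trust, which is
# not reproduced here.
from collections import Counter

def aggregate(labels, min_confidence=0.5):
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    if votes / len(labels) >= min_confidence:
        return label
    return None  # discarded, as for confidence below 50%

print(aggregate(["BETTER", "BETTER", "NONE", "BETTER", "WORSE"]))  # BETTER
```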
"Categorization of Comparative Sentences for Argument Mining." arXiv preprint arXiv:1809.06152 (2018).

@inproceedings{franzek2018categorization,
  title={Categorization of Comparative Sentences for Argument Mining},
  author={Panchenko, Alexander and Bondarenko and Franzek, Mirco and Hagen, Matthias and Biemann, Chris},
  booktitle={Proceedings of the 6th Workshop on Argument Mining at ACL'2019},
  year={2019},
  address={Florence, Italy}
}
The tech industry had a rough start to 2024. Technology companies worldwide saw a significant reduction in their workforce in the first quarter of 2024, with over ** thousand employees being laid off. By the second quarter, layoffs impacted more than ** thousand tech employees. In the final quarter of the year, around ** thousand employees were laid off.

Layoffs impacting all global tech giants
Layoffs in the global market escalated dramatically in the first quarter of 2023, when the sector saw a staggering record high of ***** thousand employees losing their jobs. Major tech giants such as Google, Microsoft, Meta, and IBM all contributed to this figure during this quarter. Amazon, in particular, conducted the most rounds of layoffs, with the highest number of employees laid off among global tech giants. The industries most affected include the consumer, hardware, food, and healthcare sectors. Notable companies that have laid off a significant number of staff include Flink, Booking.com, Uber, PayPal, LinkedIn, and Peloton, among others.

Overhiring led the trend, but will AI keep it going?
Layoffs in the technology sector followed an overhiring spree during the COVID-19 pandemic. Initially, companies expanded their workforce to meet increased demand for digital services during lockdowns. However, as lockdowns ended, economic uncertainties persisted, and companies reevaluated their strategies, layoffs became inevitable, resulting in a record *** thousand laid-off employees in the global tech sector by the end of 2022. Moreover, it is still unclear how advancements in artificial intelligence (AI) will impact layoff trends in the tech sector. AI-driven automation can replace manual tasks, leading to workforce redundancies. Whether through chatbots handling customer inquiries or predictive algorithms optimizing supply chains, the pursuit of efficiency and cost savings may result in more tech industry layoffs in the future.
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
LinkedIn Job Postings Data - Comprehensive Professional Intelligence for HR Strategy & Market Research
LinkedIn Job Postings Data represents the most comprehensive professional intelligence dataset available, delivering structured insights across millions of LinkedIn job postings, LinkedIn job listings, and LinkedIn career opportunities. Canaria's enriched LinkedIn Job Postings Data transforms raw LinkedIn job market information into actionable business intelligence—normalized, deduplicated, and enhanced with AI-powered enrichment for deep workforce analytics, talent acquisition, and market research.
This premium LinkedIn job postings dataset is engineered to help HR professionals, recruiters, analysts, and business strategists answer mission-critical questions: • What LinkedIn job opportunities are available in target companies? • Which skills are trending in LinkedIn job postings across specific industries? • How are companies advertising their LinkedIn career opportunities? • What are the salary expectations across different LinkedIn job listings and regions?
With real-time updates and comprehensive LinkedIn job posting enrichment, our data provides unparalleled visibility into LinkedIn job market trends, hiring patterns, and workforce dynamics.
Use Cases: What This LinkedIn Job Postings Data Solves
Our dataset transforms LinkedIn job advertisements, market information, and career listings into structured, analyzable insights—powering everything from talent acquisition to competitive intelligence and job market research.
Talent Acquisition & LinkedIn Recruiting Intelligence • LinkedIn job market mapping • LinkedIn career opportunity intelligence • LinkedIn job posting competitive analysis • LinkedIn job skills gap identification
HR Strategy & Workforce Analytics • Organizational network analysis • Employee mobility tracking • Compensation benchmarking • Diversity & inclusion analytics • Workforce planning intelligence • Skills evolution monitoring
Market Research & Competitive Intelligence • Company growth analysis • Industry trend identification • Competitive talent mapping • Market entry intelligence • Partnership & business development • Investment due diligence
LinkedIn Job Market Research & Economic Analysis • Regional LinkedIn job analysis • LinkedIn job skills demand forecasting • LinkedIn job economic impact assessment • LinkedIn job education-industry alignment • LinkedIn remote job trend analysis • LinkedIn career development ROI
What Makes This LinkedIn Job Postings Data Unique
AI-Enhanced LinkedIn Job Intelligence • LinkedIn job posting enrichment with advanced NLP • LinkedIn job seniority classification • LinkedIn job industry expertise mapping • LinkedIn job career progression modeling
Comprehensive LinkedIn Job Market Intelligence • Real-time LinkedIn job postings with salary, requirements, and company insights • LinkedIn recruiting activity tracking • LinkedIn job application analytics • LinkedIn job skills demand analysis • LinkedIn compensation intelligence
Company & Organizational Intelligence • Company growth indicators • Cultural & values intelligence • Competitive positioning
LinkedIn Job Data Quality & Normalization • Advanced LinkedIn job deduplication • LinkedIn job skills taxonomy standardization • LinkedIn job geographic normalization • LinkedIn job company matching • LinkedIn job education standardization
Who Uses Canaria's LinkedIn Data
HR & Talent Acquisition Teams • Optimize recruiting pipelines • Benchmark compensation • Identify talent pools • Develop data-driven hiring strategies
Market Research & Intelligence Analysts • Track industry trends • Build competitive intelligence models • Analyze workforce dynamics
HR Technology & Analytics Platforms • Power recruiting tools and analytics solutions • Fuel compensation engines and dashboards
Academic & Economic Researchers • Study labor market dynamics • Analyze career mobility trends • Research professional development
Government & Policy Organizations • Evaluate workforce development programs • Monitor skills gaps • Inform economic initiatives
Summary
Canaria's LinkedIn Job Postings Data delivers the most comprehensive LinkedIn job market intelligence available. It combines job posting insights, recruiting intelligence, and organizational data in one unified dataset. With AI-enhanced enrichment, real-time updates, and enterprise-grade data quality, it supports advanced HR analytics, talent acquisition, job market research, and competitive intelligence.
About Canaria Inc. Canaria Inc. is a leader in alternative data, specializing in job market intelligence, LinkedIn company data, Glassdoor salary analytics, and Google Maps location insights. We deliver clean, structured, and enriched datasets at scale using proprietary data scraping pipelines and advanced AI/LLM-based modeling, all backed by human validation. Our platform also includes Google Maps data, providing verified business location intelligen...
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset was created and deposited onto the University of Sheffield Online Research Data repository (ORDA) on 23-Jun-2023 by Dr. Matthew S. Hanchard, Research Associate at the University of Sheffield iHuman Institute. The dataset forms part of three outputs from a project titled ‘Fostering cultures of open qualitative research’ which ran from January 2023 to June 2023:
· Fostering cultures of open qualitative research: Dataset 1 – Survey Responses
· Fostering cultures of open qualitative research: Dataset 2 – Interview Transcripts
· Fostering cultures of open qualitative research: Dataset 3 – Coding Book
The project was funded with £13,913.85 of Research England monies held internally by the University of Sheffield - as part of their ‘Enhancing Research Cultures’ scheme 2022-2023.
The dataset aligns with ethical approval granted by the University of Sheffield School of Sociological Studies Research Ethics Committee (ref: 051118) on 23-Jan-2021. This includes due concern for participant anonymity and data management.
ORDA has full permission to store this dataset and to make it open access for public re-use on the basis that no commercial gain will be made from reuse. It has been deposited under a CC-BY-NC license. Overall, this dataset comprises:
· 15 x Interview transcripts – in .docx file format, which can be opened with Microsoft Word, Google Docs, or an open-source equivalent.
All participants have read and approved their transcripts and have had an opportunity to retract details should they wish to do so.
Participants chose whether to be pseudonymised or named directly. The pseudonym can be used to identify individual participant responses in the qualitative coding held within the ‘Fostering cultures of open qualitative research: Dataset 3 – Coding Book’ files.
For recruitment, 14 x participants were selected based on their responses to the project survey, whilst one participant was recruited based on specific expertise.
· 1 x Participant sheet – in .csv format, which may be opened with Microsoft Excel, Google Sheets, or an open-source equivalent.
This provides socio-demographic detail on each participant alongside their main field of research and career stage. It includes a RespondentID field/column which can be used to connect interview participants with their responses to the survey questions in the accompanying ‘Fostering cultures of open qualitative research: Dataset 1 – Survey Responses’ files.
The project was undertaken by two staff:
Co-investigator: Dr. Itzel San Roman Pineda ORCiD ID: 0000-0002-3785-8057 i.sanromanpineda@sheffield.ac.uk Postdoctoral Research Assistant Labelled as ‘Researcher 1’ throughout the dataset
Principal Investigator (corresponding dataset author): Dr. Matthew Hanchard ORCiD ID: 0000-0003-2460-8638 m.s.hanchard@sheffield.ac.uk Research Associate iHuman Institute, Social Research Institutes, Faculty of Social Science Labelled as ‘Researcher 2’ throughout the dataset
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

Purpose
As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:
· establish where agricultural researchers in the United States -- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals
· compare how much data is in institutional vs. domain-specific vs. federal platforms
· determine which repositories are recommended by top journals that require or recommend the publication of supporting data
· ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods
We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories, using variations of “agriculture”, “ag data”, and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if the institution had a repository for its unique, independent research data, if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories; results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next, we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether each repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites of the Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources and Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Data are provided for journals based on the 2012 and 2016 study of where USDA employees publish their research, ranked by number of articles, including 2015/2016 Impact Factor, author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation
We ran a series of searches on all resulting general-subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository; the type of resource searched (datasets, data, images, components, etc.); the percentage of the total database that each term comprised; any search term that comprised at least 1% and 5% of the total collection; and any search term that returned more than 100 and more than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.
Results
A summary of the major findings from our data review: Over half of the top 50 agriculture-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large and contain a significant portion of agricultural data in their collections. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that returned over 500 datasets for at least one agricultural search term, with that result comprising at least 5% of the total collection. Fewer than one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation. See the included README file for descriptions of each individual data file in this dataset.
Resources in this dataset:
- Resource Title: Journals. File Name: Journals.csv
- Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
- Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
- Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
- Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
- Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
- Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
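The threshold checks used in this evaluation (share of the collection at 1% and 5%, result counts over 100 and over 500) can be sketched as follows; the repository size and per-term hit counts are illustrative, not figures from the actual inventory:

```python
def evaluate_terms(total_datasets, term_counts):
    """Flag each search term against the thresholds used in the review:
    share of the total collection (1% and 5%) and raw result counts
    (more than 100 and more than 500 hits)."""
    report = {}
    for term, hits in term_counts.items():
        share = hits / total_datasets
        report[term] = {
            "share": round(share, 4),
            "at_least_1pct": share >= 0.01,
            "at_least_5pct": share >= 0.05,
            "over_100": hits > 100,
            "over_500": hits > 500,
        }
    return report

# Hypothetical repository of 20,000 datasets with made-up hit counts.
report = evaluate_terms(20_000, {"agriculture": 1_200, "soil": 150, "crop": 90})
```

A repository like the GBIF example in the results would pass both the over-500 and at-least-5% checks for at least one term, as "agriculture" does here.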
Due to changes in the collection and availability of data on COVID-19, this website will no longer be updated. The webpage will no longer be available as of 11 May 2023. Ongoing, reliable sources of data for COVID-19 are available via the COVID-19 dashboard and the UKHSA GLA Covid-19 Mobility Report. Since March 2020, London has seen many different levels of restrictions, including three separate lockdowns and many other tiers/levels of restrictions, as well as easing of restrictions and even measures to actively encourage people to go to work, their high streets and local restaurants. This report gathers data from a number of sources, including Google, Apple, Citymapper, Purple WiFi and OpenTable, to assess the extent to which these levels of restrictions have translated into a reduction in Londoners' movements. The data behind the charts below come from different sources. None of these data represent a direct measure of how well people are adhering to the lockdown rules, nor do they provide an exhaustive data set. Rather, they are measures of different aspects of mobility which, together, offer an overall impression of how Londoners are moving around the capital. The information is broken down by use of public transport, pedestrian activity, retail and leisure, and homeworking.
Public Transport
For the transport measures, we have included data from Google, Apple, Citymapper and Transport for London. They measure different aspects of public transport usage, depending on the data source. Each of the lines in the chart below represents a percentage of a pre-pandemic baseline.
| Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
| --- | --- | --- | --- | --- | --- | --- |
| Citymapper | Citymapper mobility index | 2021-09-05 | Compares trips planned and trips taken within its app to a baseline of the four weeks from 6 Jan 2020 | 7.9% | 28% | 19% |
| Google | Google Mobility Report | 2022-10-15 | Location data shared by users of Android smartphones; compares time and duration of visits to locations to the median values on the same day of the week in the five weeks from 3 Jan 2020 | 20.4% | 40% | 27% |
| TfL Bus | Transport for London | 2022-10-30 | Bus journey 'taps' on the TfL network compared to the same day of the week in the four weeks starting 13 Jan 2020 | - | 34% | 24% |
| TfL Tube | Transport for London | 2022-10-30 | Tube journey 'taps' on the TfL network compared to the same day of the week in the four weeks starting 13 Jan 2020 | - | 30% | 21% |

Pedestrian activity
With the data we currently have, it's harder to estimate pedestrian activity and high-street busyness. A few indicators can give us information on how people are making trips out of the house:

| Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
| --- | --- | --- | --- | --- | --- | --- |
| Walking | Apple Mobility Index | 2021-11-09 | Estimates the frequency of trips made on foot, compared to a baseline of 13 Jan '20 | 22% | 47% | 36% |
| Parks | Google Mobility Report | 2022-10-15 | Frequency of trips to parks. Changes in the weather mean this varies a lot. Compared to a baseline of the 5 weeks from 3 Jan '20 | 30% | 55% | 41% |
| Retail & Rec | Google Mobility Report | 2022-10-15 | Estimates the frequency of trips to shops/leisure locations. Compared to a baseline of the 5 weeks from 3 Jan '20 | 30% | 55% | 41% |

Retail and recreation
In this section, we focus on estimated footfall to shops, restaurants, cafes, shopping centres and so on.

| Activity | Source | Latest | Baseline | Min value in Lockdown 1 | Min value in Lockdown 2 | Min value in Lockdown 3 |
| --- | --- | --- | --- | --- | --- | --- |
| Grocery/pharmacy | Google Mobility Report | 2022-10-15 | Estimates the frequency of trips to grocery shops and pharmacies. Compared to a baseline of the 5 weeks from 3 Jan '20 | 32% | 55% | 45% |
| Retail/rec | Google Mobility Report | 2022-10-15 | Estimates the frequency of trips to shops/leisure locations. Compared to a baseline of the 5 weeks from 3 Jan '20 | 32% | 55% | 45% |
| Restaurants | OpenTable State of the Industry | 2022-02-19 | London restaurant bookings made through OpenTable | 0% | 0.17% | 0.024% |

Home Working
The Google Mobility Report estimates changes in how many people are staying at home and going to places of work compared to normal. It's difficult to translate this into exact percentages of the population, but changes back towards 'normal' can be seen to start before any lockdown restrictions were lifted. This value is a seven-day rolling (mean) average, to avoid it being distorted by weekends and bank holidays.

| Name | Source | Latest | Baseline | Min/max value in Lockdown 1 | Min/max value in Lockdown 2 | Min/max value in Lockdown 3 |
| --- | --- | --- | --- | --- | --- | --- |
| Residential | Google Mobility Report | 2022-10-15 | Estimates changes in how many people are staying at home for work. Compared to a baseline of the 5 weeks from 3 Jan '20 | 131% | 119% | 125% |
| Workplaces | Google Mobility Report | 2022-10-15 | Estimates changes in how many people are going to places of work. Compared to a baseline of the 5 weeks from 3 Jan '20 | 24% | 54% | 40% |

| Restriction | Date | End date | Average Citymapper | Average homeworking |
| --- | --- | --- | --- | --- |
| Work from home advised | 17 Mar '20 | 21 Mar '20 | 57% | 118% |
| Schools, pubs closed | 21 Mar '20 | 24 Mar '20 | 34% | 119% |
| UK enters first lockdown | 24 Mar '20 | 10 May '20 | 10% | 130% |
| Some workers encouraged to return to work | 10 May '20 | 01 Jun '20 | 15% | 125% |
| Schools open, small groups outside | 01 Jun '20 | 15 Jun '20 | 19% | 122% |
| Non-essential businesses re-open | 15 Jun '20 | 04 Jul '20 | 24% | 120% |
| Hospitality reopens | 04 Jul '20 | 03 Aug '20 | 34% | 115% |
| Eat out to help out scheme begins | 03 Aug '20 | 08 Sep '20 | 44% | 113% |
| Rule of 6 | 08 Sep '20 | 24 Sep '20 | 53% | 111% |
| 10pm Curfew | 24 Sep '20 | 15 Oct '20 | 51% | 112% |
| Tier 2 (High alert) | 15 Oct '20 | 05 Nov '20 | 49% | 113% |
| Second Lockdown | 05 Nov '20 | 02 Dec '20 | 31% | 118% |
| Tier 2 (High alert) | 02 Dec '20 | 19 Dec '20 | 45% | 115% |
| Tier 4 (Stay at home advised) | 19 Dec '20 | 05 Jan '21 | 22% | 124% |
| Third Lockdown | 05 Jan '21 | 08 Mar '21 | 22% | 122% |
| Roadmap 1 | 08 Mar '21 | 29 Mar '21 | 29% | 118% |
| Roadmap 2 | 29 Mar '21 | 12 Apr '21 | 36% | 117% |
| Roadmap 3 | 12 Apr '21 | 17 May '21 | 51% | 113% |
| Roadmap out of lockdown: Step 3 | 17 May '21 | 19 Jul '21 | 65% | 109% |
| Roadmap out of lockdown: Step 4 | 19 Jul '21 | 07 Nov '22 | 68% | 107% |
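The seven-day rolling mean used to smooth the homeworking series can be sketched as follows; the daily percent-of-baseline values are made up for illustration:

```python
def rolling_mean(values, window=7):
    """Trailing rolling (mean) average over the given window, as used to
    smooth daily series past weekend and bank-holiday distortions.
    Early entries average over however many days are available so far."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        chunk = values[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Illustrative daily residential % of baseline across a week with a weekend dip.
daily = [130, 118, 90, 95, 125, 131, 129, 128]
smooth = rolling_mean(daily)
```

The smoothed series dampens the weekend dip that the raw daily values show, which is why the report quotes rolling averages rather than single-day figures.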
OnPoint Weather is a global weather dataset for business, available for any lat/lon point and geographic area such as ZIP codes. OnPoint Weather provides a continuum of hourly and daily weather from the year 2000 to the current time, plus a forward forecast of 45 days. OnPoint Climatology provides hourly and daily weather statistics which can be used to determine 'departures from normal' and to provide climatological guidance on the expected weather for any location at any point in time. The OnPoint Climatology provides weather statistics such as means, standard deviations and frequency of occurrence. Weather has a significant impact on businesses and accounts for hundreds of billions in lost revenue annually. OnPoint Weather allows businesses to quantify weather impacts and develop strategies to optimize for weather to improve business performance. Examples of usage: quantify the impact of weather on sales across diverse locations and times of the year; understand how supply chains are impacted by weather; understand how employees' attendance and performance are impacted by weather; understand how weather influences foot traffic at malls, stores and restaurants. OnPoint Weather is available through Google Cloud Platform's Commercial Dataset Program and can be easily integrated with other Google Cloud Platform services to quickly reveal and quantify weather impacts on business. Weather Source provides a full range of support services, from answering quick questions to consulting and building custom solutions. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
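A 'departure from normal' against climatological statistics like those above can be sketched as a standardized anomaly; the temperature, mean and standard deviation here are illustrative assumptions, not OnPoint values:

```python
def departure_from_normal(observed, clim_mean, clim_std):
    """Standardized departure of an observation from the climatological
    normal: positive means above normal, in units of standard deviations."""
    return (observed - clim_mean) / clim_std

# e.g. an 86-degree afternoon against an assumed 78-degree normal with std dev 4
z = departure_from_normal(86.0, 78.0, 4.0)
```

A business analysis would typically join such anomalies to sales or foot-traffic data to quantify weather impact, rather than using raw temperatures.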
The American Community Survey (ACS) is an ongoing survey that provides vital information on a yearly basis about our nation and its people by contacting over 3.5 million households across the country. The resulting data provides incredibly detailed demographic information across the US, aggregated at various geographic levels, which helps determine how more than $675 billion in federal and state funding is distributed each year. Businesses use ACS data to inform strategic decision-making. ACS data can be used as a component of market research, provide information about concentrations of potential employees with a specific education or occupation, and identify which communities could be good places to build offices or facilities. For example, someone scouting a new location for an assisted-living center might look for an area with a large proportion of seniors and a large proportion of people employed in nursing occupations. Through the ACS, we know more about jobs and occupations, educational attainment, veterans, whether people own or rent their homes, and other topics. Public officials, planners, and entrepreneurs use this information to assess the past and plan the future. For more information, see the Census Bureau's ACS Information Guide. This public dataset is hosted in Google BigQuery as part of the Google Cloud Public Datasets Program, with Carto providing cleaning and onboarding support. It is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset for this project consists of photos of the buildings of the University of Salford, taken with a mobile phone camera from different angles and distances. Although this task sounds easy, it encountered several challenges, summarized below:
1. Obstacles.
a. Fixed or unremovable objects.
When taking several photos of a building or landscape from different angles and directions, some angles are blocked by a fixed object such as trees and plants, light poles, signs, statues, cabins, bicycle shelters, scooter stands, generators/transformers, construction barriers, construction equipment or other service equipment, so some photos unavoidably include these objects. This raises three questions:
- Will these objects confuse the model/application we intend to create? That is, will an obstacle prevent the model/application from identifying the designated building?
- Or will the photos be more representative with these objects included, giving the model/application the capability to identify these buildings even with obstacles present?
- What is the maximum detection distance? In other words, how far can the mobile device running the application be from the building before it can or cannot detect the designated building?
b. Removable and moving objects.
- Any university is crowded with staff and students, especially during the rush hours of the day, so it is hard to take some photos without a person appearing in them at certain times of day.
But, due to privacy issues and out of respect for those individuals, such photos are better excluded.
- Parked vehicles, trolleys and service equipment can be obstacles and might appear in these images; they can also block access to some areas, so that an image from a certain angle cannot be obtained.
- Animals, such as dogs, cats, birds or even squirrels, cannot be avoided in some photos, which are subject to the same questions above.
2. Weather.
In a deep learning project, more data means more accuracy and less error. At this stage of our project it was agreed to have 50 photos per building; the number of photos could be increased for more accurate results, but due to the time limitation of this project it was agreed to use 50 per building only.
These photos were taken on cloudy days. To expand this work in the future (as future work and recommendations), photos taken on sunny, rainy, foggy, snowy and other weather-condition days could be included.
Even photos at different times of the day, such as night, dawn, and sunset, could be included, to provide the designated model with every possibility of identifying these buildings in all available circumstances.
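For a classification project with a fixed quota of photos per building, a common workflow is to hold out a share of each building's photos for validation. The sketch below assumes an 80/20 per-building split; the function name, file names and split fraction are illustrative, not the project's actual pipeline:

```python
import random

def split_per_building(photos_by_building, val_fraction=0.2, seed=0):
    """Hold out a validation share of each building's photos so every
    building is represented in both the training and validation sets."""
    rng = random.Random(seed)
    train, val = {}, {}
    for building, photos in photos_by_building.items():
        shuffled = photos[:]
        rng.shuffle(shuffled)
        k = max(1, int(len(shuffled) * val_fraction))
        val[building] = shuffled[:k]
        train[building] = shuffled[k:]
    return train, val

# 50 photos per building, as agreed for this stage of the project.
photos = {"Newton": [f"newton_{i:02d}.jpg" for i in range(50)]}
train, val = split_per_building(photos)
```

Splitting per building (rather than pooling all photos) keeps the 50-photo quota balanced across classes, which matters when the dataset is this small.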
University House: 60 images. Peel Building is an important landmark of the University of Salford due to its distinct and striking exterior design, but unfortunately it was excluded from the selection because of maintenance activities at the time the photos for this project were collected: it was partially covered with scaffolding, with a lot of movement of personnel and equipment. If the supervisor suggests that this would be another challenge to include in the project, then its photos must be collected. There are many other buildings at the University of Salford and, again, to expand the project in the future, we could include all of them. The full list of buildings of the university can be reviewed via the interactive map at: www.salford.ac.uk/find-us
Expand Further. This project can be improved further with many capabilities; again, due to the time limitation of this project, these improvements can be implemented later as future work. In simple words, this project is to create an application that can display a building's name when a mobile device with a camera is pointed at that building. Future features to be added: a. Address/location: this will require collecting additional data, namely the longitude and latitude of each building included, or the postcode (which may be shared between buildings, given how close they appear on interactive map applications such as Google Maps, Google Earth or iMaps). b. Description of the building: what is the building for, which school occupies it, and what facilities does it include? c. Interior images: all the photos at this stage were taken of the exteriors of the buildings. Will interior photos make an impact on the model/application? For example, if the user is inside Newton or Chapman and opens the application, will the building be identified, especially given that the interiors of these buildings have a high level of similarity across corridors, rooms, halls, and labs? Will furniture and assets act as obstacles or as identification marks? d. Directions to a specific area/floor inside the building: if the interior images succeed with the model/application, it would be a good idea to add a search option so the application can guide the user to a specific area, showing directions to it; for example, if the user is inside Newton building and searches for Lab 141, it will direct them to the first floor of the building with an interactive arrow that changes as the user approaches the destination.
Or, if the application can identify the building from its interior, a drop-down list will be activated with each floor of the building. For example, if the model/application identifies Newton building, the drop-down list will be activated; pressing on it will present interactive tabs for each floor, and selecting a floor tab will display the facilities on that floor (for example, pressing the Floor 1 tab opens another screen showing which facilities are on that floor). Furthermore, if the model/application identifies another building, it should present a different number of floors, as buildings differ in their number of floors. This feature could be improved with a voice assistant that directs the user after a search (similar to the voice assistant in Google Maps, but applied to the interiors of the university's buildings). e. Top view: if a drone with a camera can be afforded, it could provide aerial images and top views of the buildings to add to the model/application, but these images may face the same situation as the interior images: the buildings can look similar to each other from the top, with other obstacles included, such as water tanks and AC units.
Other Questions:
Will the model/application be reproducible? The presumed answer to this question should be yes, if the model/application is fed with the proper data (images), such as images of restaurants, schools, supermarkets, hospitals, government facilities, etc.
This dataset contains records of syringes collected, including how many, whether they were collected from the ground or a kiosk, and when they were collected. The Syringe Summary dataset contains records for syringes collected by Parks staff as well as collection efforts by the Washington Heights Corner Project (https://www.nycservice.org/organizations/index.php?org_id=1462) and New York Harm Reduction Educators (https://nyhre.org/) nonprofit groups. This is part of the NYC Parks Syringe Litter Data Collection: https://data.cityofnewyork.us/browse?Data-Collection_Data-Collection=NYC+Parks+Syringe+Litter&q= You can find the complete data dictionary at https://docs.google.com/spreadsheets/d/1VSUqd1peSc-4D2XnBZNiLdxa0Jg4z62D/edit?usp=sharing&ouid=100172030647057411202&rtpof=true&sd=true
Mountain Home Irrigated Lands 2004 was created for use in water budget studies in Mountain Home. The area of interest was determined by Hydrology Section staff at IDWR, and a study boundary was given to GIS staff and used to clip the model output. The random forest (RF) model [1] is a type of supervised machine learning algorithm requiring GIS staff to provide manually labeled training data. GIS staff also provide the RF model with several input features, typically raster datasets that help distinguish characteristics of irrigated lands. Mountain Home Irrigated Lands 2004 used the following input features:
• Landsat 5 [2] and Landsat 7 [3] averaged surface reflectance imagery (bands: SWIR 2, NIR, Blue, and calculated NDVI)
• 10-meter digital elevation model [4]
• Height Above Nearest Drainage (HAND) [5]
• PRISM Climate Dataset [6]
• Topographic Wetness Index, derived from the digital elevation model [4]
For additional information on the interpolation process for Landsat imagery, please see below. Additional datasets used only for labeling training data include IDWR-provided Active Water Rights Place of Use and National Agriculture Imagery Program (NAIP) aerial imagery for 2004 [7]. The accuracy of the Mountain Home Irrigated Lands 2004 dataset was verified by several methods. First, a validation test was done by withholding a subset of the training data to evaluate how well the model classified unseen information. Second, GIS staff ran several iterations of the model with variations of the training data, with the goal of improving classification for areas consistently misclassified. This process requires GIS staff knowledge, aided by supplementary datasets, to review the area and make decisions. Once a model iteration was determined to be 'final', a manual mask was created to correct any remaining misclassification in the dataset.
Misclassification within the Mountain Home Irrigated Lands 2004 dataset was minimal, occurring primarily in the southern areas near the Snake River, as well as around reservoirs and stream channels. GIS staff manually reviewed potential misclassifications by examining Landsat 5 and Landsat 7 imagery, NAIP aerial imagery, and IDWR Active Irrigation Water Rights.
References:
[1] https://developers.google.com/earth-engine/apidocs/ee-classifier-smilerandomforest
[2] https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC05_C02_T1_L2
[3] https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LE07_C02_T1_L2
[4] https://developers.google.com/earth-engine/datasets/catalog/USGS_3DEP_10m
[5] Donchyts, G., Winsemius, H., Schellekens, J., Erickson, T., Gao, H., Savenije, H., & van de Giesen, N. (2016). Global 30m height above the nearest drainage (HAND). Geophysical Research Abstracts, 18, EGU2016-17445-3. EGU General Assembly 2016.
[6] Daly, C., Halbleib, M., Smith, J.I., Gibson, W.P., Doggett, M.K., Taylor, G.H., Curtis, J. & Pasteris, P.A. (2008). Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. International Journal of Climatology, 28, 2031-2064. doi:10.1002/joc.1688
[7] U.S. Department of Agriculture, Farm Service Agency. (2004). National Agriculture Imagery Program (NAIP) imagery [Digital image]. U.S. Department of Agriculture. https://www.fsa.usda.gov/programs-and-services/aerial-photography/imagery-programs/naip-imagery/
Information on interpolated imagery: GIS staff prepared averaged Landsat images to reduce missing data from cloud cover. Images were averaged across four periods: March 1–May 1, May 1–July 1, July 1–September 1, and September 1–November 1. These same periods were also used to average PRISM climate data. The temporal extent of other input features was filtered to March 1–November 30, 2004, where applicable.
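The manual-mask correction step described above can be sketched as a simple grid overwrite; the nested lists stand in for rasters, and the values are illustrative (1 = irrigated, 0 = not irrigated):

```python
def apply_manual_mask(classified, mask, corrected_value=0):
    """Overwrite the model's classification wherever the manually drawn
    mask flags a misclassification, leaving all other cells untouched."""
    return [
        [corrected_value if m else c for c, m in zip(crow, mrow)]
        for crow, mrow in zip(classified, mask)
    ]

classified = [[1, 1],
              [0, 1]]
mask = [[0, 1],   # flag the false-positive "irrigated" cell in the top right
        [0, 0]]
fixed = apply_manual_mask(classified, mask)
```

In the real workflow the same idea is applied to raster layers in a GIS rather than Python lists, but the logic (mask cells win, model cells pass through) is the same.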
Author: Unknown
Source: PMLB - Supposedly originates from UCI, but can't find it there anymore.
Please cite:
XD6 Dataset. Used by Buntine and Niblett (1992). Composed of 10 features, one of which is irrelevant. The target is a disjunctive normal form formula over the nine other attributes, with additional classification noise.
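A generator in the spirit of this dataset can be sketched as follows; the specific DNF used here is an illustrative assumption for demonstration, not necessarily Buntine and Niblett's exact formula:

```python
import random

def make_xd6_like(n_rows, noise=0.1, seed=42):
    """Generate rows with 10 boolean features where the label is a DNF over
    the first nine (the tenth is irrelevant), then flip labels at the given
    noise rate to simulate classification noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = [rng.randint(0, 1) for _ in range(10)]  # x[9] is the irrelevant feature
        # Assumed DNF: three conjunctions of three attributes each.
        label = (x[0] and x[1] and x[2]) or (x[3] and x[4] and x[5]) or (x[6] and x[7] and x[8])
        if rng.random() < noise:  # classification noise flips the label
            label = not label
        rows.append((x, int(label)))
    return rows

data = make_xd6_like(100)
```

Because the tenth feature never enters the formula, a learner that ranks feature importance should assign it near-zero weight, which is what makes this dataset a useful benchmark.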
ESPA Irrigated Lands 2019 was created for use in water budget studies within the ESPA study boundary. The area of interest was determined by Hydrology Section staff at IDWR, and a study boundary was given to GIS staff and used to clip the model output. The random forest (RF) model [1] is a type of supervised machine learning algorithm requiring GIS staff to provide manually labeled training data. GIS staff also provide the RF model with several input features, typically raster datasets that help distinguish characteristics of irrigated lands. ESPA Irrigated Lands 2019 used the following input features:
• Landsat 8 [2] and Sentinel-2 [3] surface reflectance imagery (bands: SWIR 2, NIR, Blue, and calculated NDVI)
• 10-meter digital elevation model [4]
• PRISM Climate Dataset [5]
• Height Above Nearest Drainage (HAND) [6]
• IDWR METRIC [7] evapotranspiration dataset
• Topographic Wetness Index, derived from the digital elevation model
For additional information on processing Landsat and Sentinel-2 imagery, please see below. Additional datasets used only for labeling training data include Mapping EvapoTranspiration at high Resolution with Internalized Calibration (METRIC), IDWR-provided Active Water Rights Place of Use, the Cropland Data Layer [8] for 2019, and National Agriculture Imagery Program (NAIP) imagery [9] for Idaho 2019. The accuracy of the ESPA Irrigated Lands 2019 dataset was verified by several methods. First, a validation test was conducted by withholding a subset of the training data to evaluate how well the model classified unseen information. Second, GIS staff ran several iterations of the model with variations of the training data, with the goal of improving classification for areas consistently misclassified. This process requires GIS staff knowledge, aided by supplementary datasets, to review the area and make decisions. Once a model iteration was determined to be 'final', a manual mask was created to correct any remaining misclassification in the dataset.
Manual corrections for the ESPA Irrigated Lands 2019 dataset were focused on the area between Ashton and Lamont, where false-positive "irrigated" labels occurred on dryland-managed fields. Some areas classified as irrigated near Bellevue were masked out due to suspected wetland. A general wetland mask for the entire ESPA study boundary was also applied. Other manual corrections were made throughout the study area, specifically for pivot-irrigated fields not matching the NAIP field boundaries. Decisions made during manual masking were conservative, relying heavily on both the presence of an active water right and clear indications of artificial application of water as observed in satellite imagery.
References:
[1] https://developers.google.com/earth-engine/apidocs/ee-classifier-smilerandomforest
[2] https://developers.google.com/earth-engine/datasets/catalog/NASA_HLS_HLSL30_v002
[3] https://developers.google.com/earth-engine/datasets/catalog/NASA_HLS_HLSS30_v002
[4] https://developers.google.com/earth-engine/datasets/catalog/USGS_3DEP_10m
[5] Daly, C., Halbleib, M., Smith, J.I., Gibson, W.P., Doggett, M.K., Taylor, G.H., Curtis, J. & Pasteris, P.A. (2008). Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. International Journal of Climatology, 28, 2031-2064. doi:10.1002/joc.1688
[6] Donchyts, G., Winsemius, H., Schellekens, J., Erickson, T., Gao, H., Savenije, H., & van de Giesen, N. (2016). Global 30m height above the nearest drainage (HAND). Geophysical Research Abstracts, 18, EGU2016-17445-3. EGU General Assembly 2016.
[7] https://data-idwr.hub.arcgis.com/documents/365d91be4da4407bbe3df11f242b34c7/about
[8] https://developers.google.com/earth-engine/datasets/catalog/USDA_NASS_CDL
[9] https://developers.google.com/earth-engine/datasets/catalog/USDA_NAIP_DOQQ