6 datasets found
  1. Geographic distribution of Wikimedia traffic

    • figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Os Keyes (2023). Geographic distribution of Wikimedia traffic [Dataset]. http://doi.org/10.6084/m9.figshare.1317408.v2
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Os Keyes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the proportion of traffic to each public Wikimedia project, from each known country, with some caveats.

    This dataset represents an aggregate of 1:1000 sampled pageviews from the entirety of 2014. The pageviews definition applied was the Foundation's new pageviews definition; additionally, spiders and similar automata were filtered out with Tobie's ua-parser. Geolocation was then performed using MaxMind's geolocation products. There are no privacy implications that we could identify; the data comes from 1:1000 sampled logs, is proportionate rather than raw, and aggregates any nations with
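
    As a rough illustration of what "proportionate rather than raw" means, the sketch below turns sampled (project, country) pageview records into per-project country shares. The record layout, the function name traffic_proportions, and the sample values are assumptions made for this example; they are not taken from the dataset or from the pipeline that produced it.

```python
from collections import defaultdict


def traffic_proportions(sampled_views):
    """Return {project: {country: share}} from (project, country) records."""
    counts = defaultdict(lambda: defaultdict(int))
    for project, country in sampled_views:
        counts[project][country] += 1

    proportions = {}
    for project, by_country in counts.items():
        total = sum(by_country.values())
        # Shares within one project sum to 1, so raw volumes are not exposed.
        proportions[project] = {c: n / total for c, n in by_country.items()}
    return proportions


# Hypothetical sampled records, invented for illustration only.
sample = [
    ("en.wikipedia", "US"), ("en.wikipedia", "US"),
    ("en.wikipedia", "GB"), ("en.wikipedia", "IN"),
    ("de.wikipedia", "DE"), ("de.wikipedia", "AT"),
]
shares = traffic_proportions(sample)
```

    Because a uniform 1:1000 sample scales every count by the same factor, the shares computed from the sample estimate the shares in the full logs.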

  2. Indeed Data Science & ML Job Postings

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). Indeed Data Science & ML Job Postings [Dataset]. https://www.opendatabay.com/data/ai-ml/cc486027-ff62-4396-a1d5-b98c3aa7a223
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    This dataset offers insights into job postings, primarily focusing on roles in Data Engineering, Data Analysis, Data Science, and Machine Learning Engineering. It contains approximately 1583 records of job information, providing a snapshot of the employment landscape in these fields. The dataset is ideal for understanding market demands and trends.

    Columns

    • job_title: The specific title of the job post.
    • company: The name of the hiring company.
    • job_location: The city and state where the job is located.
    • job_summary: A detailed description outlining the purpose of the hiring.
    • post_date: The date when the job was posted on Indeed.
    • today: The date when the data was collected.
    • job_salary: The expected salary range for the position.
    • job_url: A direct link to the job posting for further details.

    Distribution

    The dataset is provided as a single CSV file, named 'job_dataset.csv'. It comprises 1583 rows and 8 columns, representing the structure of the collected job information. The data collection occurred around 26th July 2022.

    Usage

    This dataset is well-suited for various analytical tasks:

    • Cleaning and refining job data.
    • Identifying the most in-demand skills within the data and machine learning sectors.
    • Analysing the geographical distribution of jobs.
    • Conducting Natural Language Processing (NLP) and research on job descriptions.
    • Market analysis for job seekers, recruiters, and educational institutions.

    Coverage

    The dataset has a global scope, with notable concentrations of job postings in locations such as Bengaluru, Karnataka (30%) and Gurgaon, Haryana (7%). The records primarily cover job postings for data-related roles, including Data Engineer, Data Analyst, Data Scientist, and ML Engineer, with data collected around July 2022. Some postings were listed over 30 days prior to the collection date.
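
    Location concentrations like those quoted above could be recomputed with a few lines of standard-library Python. This is only a sketch: the column name job_location comes from the column list, but the function name location_shares and the sample rows are invented for illustration, and the real file's encoding and date formats are not stated in the listing.

```python
import csv
import io
from collections import Counter


def location_shares(csv_file):
    """Return {job_location: share of postings}, most common first."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row["job_location"].strip() for row in reader)
    total = sum(counts.values())
    return {loc: n / total for loc, n in counts.most_common()}


# Hypothetical rows using the eight documented columns; not real postings.
sample_csv = io.StringIO("""\
job_title,company,job_location,job_summary,post_date,today,job_salary,job_url
Data Engineer,ExampleCo,"Bengaluru, Karnataka",Builds data pipelines,2022-07-20,2022-07-26,,https://example.com/jobs/1
Data Analyst,SampleCorp,"Bengaluru, Karnataka",Reports and dashboards,2022-07-21,2022-07-26,,https://example.com/jobs/2
Data Scientist,DemoLtd,"Gurgaon, Haryana",Models and experiments,2022-07-22,2022-07-26,,https://example.com/jobs/3
ML Engineer,TestInc,"Pune, Maharashtra",Deploys models,2022-07-23,2022-07-26,,https://example.com/jobs/4
""")
shares = location_shares(sample_csv)
```

    To try it on the actual file, pass an open handle for 'job_dataset.csv' (opened with newline="") in place of the in-memory sample.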

    License

    CC0

    Who Can Use It

    This dataset is valuable for:

    • Data Scientists and Analysts: For market research, trend analysis, and skill demand assessment.
    • Machine Learning Engineers: To understand job requirements and role distributions.
    • Researchers: For academic studies on labour markets and skill development.
    • Job Seekers: To identify popular roles, required skills, and geographical opportunities.
    • Companies and Recruiters: For talent acquisition strategies and competitor analysis.

    Dataset Name Suggestions

    • Indeed Data Science & ML Job Postings
    • Global Data Roles Dataset
    • Job Market Insights: Data Careers
    • Data Analytics & AI Job Data
    • UK Data Professional Vacancies

    Attributes

    Original Data Source: Indeed job postings (data science / data analyst / ML)

  3. Mawson Escarpment Geology GIS Dataset

    • researchdata.edu.au
    • data.aad.gov.au
    Updated Nov 4, 2002
    Cite
    THOST, DOUG E; BAIN, JOHN (2002). Mawson Escarpment Geology GIS Dataset [Dataset]. http://doi.org/10.26179/5c7deb18226f9
    Dataset updated
    Nov 4, 2002
    Dataset provided by
    Australian Antarctic Division (https://www.antarctica.gov.au/)
    Australian Antarctic Data Centre
    Authors
    THOST, DOUG E; BAIN, JOHN
    Time period covered
    Apr 10, 1998 - Jun 30, 1998
    Description

    There are several ArcInfo coverages described by this metadata record - FRAME, GEOL, MAPGRID, SITES, STRLINE and STRUC (in that order). Each coverage is described below. The data is also provided as shapefiles and ArcInfo interchange files. The data was used for the Mawson Escarpment Geology map published in 1998. This map is available from a URL provided in this metadata record.

    FRAME:

    The coverage FRAME contains arcs and polygons (with labels) and forms the limits of the data sets or map coverage of the MAWSON ESCARPMENT area of the AUSTRALIAN ANTARCTIC TERRITORY.

    The purpose of this dataset is to form a cookie cutter for future data that may be acquired and require clipping to the map/data area.

    GEOL:

    The coverage GEOL is historical geological data covering the MAWSON ESCARPMENT area.

    The data were captured in ARC/INFO format and combined with geological outcrops that were accurately digitised over a March 1989 Landsat Thematic Mapper image at a scale of 1:100,000. It is not recommended that these data be used beyond this scale.

    The coverage contains arcs (lines) and polygons (polygon labels). These objects are attributed as fully as possible in their .aat file for arcs and .pat file for polygon labels, and conform with the Geoscience Australia Geoscience Data Dictionary Version 98.04.

    The purpose of the dataset is for it to become part of a greater geological database of the Australian Antarctic Territory.

    (1998-04-10 - 1998-06-30)

    MAPGRID:

    MAPGRID is a graticule that was generated as a 5 minute by 5 minute grid mainly to allow for good location/registration of source materials for digitising and adding some locational anno.mapgrat

    This coverage's other function was to be used for a proof plot.

    (1998-04-22 - 1998-06-30)
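
    A 5 minute by 5 minute graticule of the kind described for MAPGRID can be sketched by listing grid-line positions at multiples of 5 arc-minutes inside a bounding range. The function names and the latitude range below are hypothetical and chosen only for illustration; they are not the extent of the actual coverage, and this is not how the ArcInfo coverage itself was generated.

```python
def graticule_minutes(min_arcmin, max_arcmin, step_arcmin=5):
    """Return grid-line positions (arc-minutes) at multiples of step_arcmin
    that fall within [min_arcmin, max_arcmin]."""
    # Round the lower bound up to the next multiple of the step
    # (integer ceiling division, which also works for negative values).
    first = -(-min_arcmin // step_arcmin) * step_arcmin
    return list(range(first, max_arcmin + 1, step_arcmin))


def to_degrees(arcmin):
    """Convert arc-minutes to decimal degrees."""
    return arcmin / 60.0


# Hypothetical latitude range in arc-minutes (negative = south).
lat_lines = graticule_minutes(-4412, -4390)
lat_degrees = [to_degrees(m) for m in lat_lines]
```

    Working in whole arc-minutes keeps the grid positions exact; conversion to decimal degrees happens only at the end.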

    SITES:

    The purpose of this dataset is to provide the approximate location of historic sample sites in the MAWSON ESCARPMENT region of the AUSTRALIAN ANTARCTIC TERRITORY, for future expansion or more accurate positioning when improved records of location are found.

    (1998-05-11 - 1998-06-30)

    STRLINE:

    This coverage of structural lines for geology is named STRLINE.

    The purpose of the dataset is to hold the linear structural features in their own coverage, containing only structure that does not form polygon boundaries.

    (1998-05-28 - 1998-06-30)

    STRUC:

    This coverage, called STRUC for structural measurements, is a point coverage. It can be described as mesoscopic structures at a site or outcrop.

    The purpose of the dataset is to provide all the known structural point data in one coverage.

    (1998-05-28 - 1998-06-30)

  4. California Rivers Assessment Interactive Database

    • cmr.earthdata.nasa.gov
    Updated Aug 29, 2017
    Cite
    (2017). California Rivers Assessment Interactive Database [Dataset]. https://cmr.earthdata.nasa.gov/search/concepts/C1214614946-SCIOPS
    Dataset updated
    Aug 29, 2017
    Time period covered
    Jan 1, 2001 - Present
    Description

    The California Rivers Assessment (CARA) is a computer-based data management system designed to give resource managers, policy-makers, landowners, scientists and interested citizens rapid access to essential information and tools with which to make sound decisions about the conservation and use of California's rivers.

     The California Rivers Assessment has the following goals: to provide a
     computerized forum for collecting, storing, analyzing, exchanging and
     retrieving river-related resource data; to improve coordination between local,
     state and federal agencies, other organizations and the interested public; to
     develop a perspective on the demands and uses of California's river resources;
     and to establish a process for evaluating and assessing river resources on an
     ongoing basis.
    
     Although a substantial amount of information about California's rivers is now
     stored in computers, the locations and formats for this information vary, often
     making it difficult to access and use. The second phase of the California
     Rivers Assessment is design of a data management system called an Aggregated
     Information Model (AIM) that makes a wide range of river-related information
     available at a single location in a consistent format. As in Phase I, the Reach
     File system and Hydrologic Unit Codes provide a common, statewide geographic
     reference framework for integrating data from different sources. 
    
     The development of the AIM began with the acquisition and integration of
     computer-based river resource information on 13 of California's 149 river
     basins. These "demonstration basins" were chosen to reflect California's wide
     range of biological diversity. The Aggregated Information Model now
     incorporates 60 or more data sets for each of 120 river basins. These layers
     include vegetation, land ownership, dams, water quality parameters, rare and
     endangered species, native fish, National Wetlands Inventory designations,
     soils and farmlands inventories. By June 1998, all of California's 149 basins
     will have a uniform set of aggregated data, as well as other specific local
     data sets. 
    
     AIM allows users to produce custom maps from GIS layers by providing a query
     system over the World Wide Web. "ICE MAPS" (Interactive California
     Environmental Mapping, Assessment and Planning System) enables users to create
     and download their own maps by defining a region within the state and selecting
     desired data sets. Map products include a title bar, scale bar, legend, links
     to related Internet sites and tabular data where available. A new version of
     "ICE MAPS" is also available that allows users to actually query the AIM data.
    
  5. Data from: Spatially-Disaggregated Crop Production Statistics Data in Africa...

    • search.dataone.org
    • dataverse.harvard.edu
    • +3 more
    Updated Sep 25, 2024
    Cite
    International Food Policy Research Institute (IFPRI) (2024). Spatially-Disaggregated Crop Production Statistics Data in Africa South of the Sahara for 2017 [Dataset]. http://doi.org/10.7910/DVN/FSSKBW
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    International Food Policy Research Institute (IFPRI)
    Time period covered
    Jan 1, 2016 - Dec 31, 2018
    Description

    Using a variety of inputs, IFPRI's Spatial Production Allocation Model (SPAM, also known as MapSPAM) uses a cross-entropy approach to make plausible estimates of crop distribution within disaggregated units. Moving the data from coarser units, such as countries and sub-national provinces, to finer units, such as grid cells, reveals spatial patterns of crop performance, creating an Africa South of the Sahara-wide grid-scape at the confluence between geography and agricultural production systems. Improving spatial understanding of crop production systems allows policymakers and donors to better target agricultural and rural development policies and investments, increasing food security and growth with minimal environmental impacts.
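
    The idea of moving a coarse total down to grid cells can be illustrated with a deliberately simplified sketch. This is not the SPAM model: it only shows that when the sole constraint is that cell shares sum to one, the allocation with minimum cross-entropy (Kullback-Leibler) distance from a prior is the prior itself, so any other split scores worse. The cell names, prior weights, and total are hypothetical.

```python
import math


def allocate(total, prior_weights):
    """Split a coarse-unit total across cells in proportion to prior weights."""
    weight_sum = sum(prior_weights.values())
    return {cell: total * w / weight_sum for cell, w in prior_weights.items()}


def kl_divergence(shares, prior_shares):
    """Cross-entropy (KL) distance of shares from prior_shares."""
    return sum(
        s * math.log(s / prior_shares[cell])
        for cell, s in shares.items()
        if s > 0
    )


# Hypothetical prior suitability weights for three grid cells.
prior = {"cell_a": 2.0, "cell_b": 1.0, "cell_c": 1.0}
prior_shares = {c: w / sum(prior.values()) for c, w in prior.items()}

# Hypothetical province-level production total to disaggregate.
allocation = allocate(100.0, prior)
alloc_shares = {c: v / 100.0 for c, v in allocation.items()}

kl_prior = kl_divergence(alloc_shares, prior_shares)
kl_equal = kl_divergence({c: 1.0 / 3.0 for c in prior}, prior_shares)
```

    A real allocation adds further constraints (for example cropland area or reported sub-national statistics), which is what pushes the solution away from the prior.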

  6. A spatially enriched synthetic population developed by spatial...

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Abubakar, Eleojo O. (2023). A spatially enriched synthetic population developed by spatial microsimulation of 2016/2017 Multiple Indicator Cluster Survey microdata of Kogi State, Nigeria [Dataset]. http://doi.org/10.7910/DVN/LOJTUJ
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Abubakar, Eleojo O.
    Description

    This dataset is a spatially enriched synthetic individual-level population of people aged between 15 and 49 years in Kogi State, Nigeria. It was developed through the process of Spatial Microsimulation (SMS), which involves a synergy of Multiple Indicator Cluster Survey (MICS 5) microdata and an analytical small-area zoning system that is an optimized surrogate of the ward-level geography of Kogi State, Nigeria. Whereas the actual MICS 5 microdata of Kogi State is a population sample of about 1,305 people comprising 912 females and 393 males, this synthetic population contains 2,249,170 microunits comprising 1,115,283 females and 1,133,887 males, with about 425 MICS 5 attributes. The small-area zone to which each microunit/person belongs is also indicated with primary keys named 'ZoneID' or 'GRID_ID_15k'. The analytical zoning system for this synthetic population is included in this dataset as a separate shapefile. This synthetic population data is useful for Small-Area Estimation and mapping of relevant attributes. It enables robust spatial analysis of MICS 5 microdata and indicators at very fine spatial scales, as well as at individual level, in Kogi State, Nigeria. Owing to its enhanced spatial fidelity, this dataset is invaluable for supporting precise and equitable geographical targeting of Sustainable Development Goal (SDG) initiatives in developing countries like Nigeria.
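
    Small-area estimation on a synthetic population of this kind amounts to grouping microunits by their zone key and summarising an attribute per zone. The sketch below assumes each microunit is a dictionary carrying a 'ZoneID' key; the attribute name has_attribute, the function name zone_proportions, and the sample records are hypothetical and do not correspond to actual MICS 5 variables.

```python
from collections import defaultdict


def zone_proportions(microunits, attribute, zone_key="ZoneID"):
    """Return {zone: share of microunits in that zone where attribute is true}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for person in microunits:
        zone = person[zone_key]
        totals[zone] += 1
        if person[attribute]:
            positives[zone] += 1
    return {zone: positives[zone] / totals[zone] for zone in totals}


# Hypothetical synthetic microunits, invented for illustration only.
population = [
    {"ZoneID": "Z1", "has_attribute": True},
    {"ZoneID": "Z1", "has_attribute": False},
    {"ZoneID": "Z1", "has_attribute": False},
    {"ZoneID": "Z1", "has_attribute": False},
    {"ZoneID": "Z2", "has_attribute": True},
    {"ZoneID": "Z2", "has_attribute": False},
]
estimates = zone_proportions(population, "has_attribute")
```

    The per-zone estimates can then be joined to the zoning shapefile on the same key for mapping.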

