6 datasets found

Geographic distribution of Wikimedia traffic
figshare.com
txt
Updated Jun 1, 2023
Cite
Os Keyes (2023). Geographic distribution of Wikimedia traffic [Dataset]. http://doi.org/10.6084/m9.figshare.1317408.v2
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1317408.v2
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Os Keyes
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the proportion of traffic to each public Wikimedia project, from each known country, with some caveats.

This dataset represents an aggregate of 1:1000 sampled pageviews from the entirety of 2014. The pageviews definition applied was the Foundation's new pageviews definition; additionally, spiders and similar automata were filtered out with Tobie's ua-parser. Geolocation was then performed using MaxMind's geolocation products. There are no privacy implications that we could identify: the data comes from 1:1000 sampled logs, is proportionate rather than raw, and aggregates any nations with
Indeed Data Science & ML Job Postings
opendatabay.com
Updated Jul 6, 2025
Cite
Datasimple (2025). Indeed Data Science & ML Job Postings [Dataset]. https://www.opendatabay.com/data/ai-ml/cc486027-ff62-4396-a1d5-b98c3aa7a223
Dataset updated
Jul 6, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Data Science and Analytics
Description
This dataset offers insights into job postings, primarily focusing on roles in Data Engineering, Data Analysis, Data Science, and Machine Learning Engineering. It contains approximately 1583 records of job information, providing a snapshot of the employment landscape in these fields. The dataset is ideal for understanding market demands and trends.

Columns

job_title: The specific title of the job post.

company: The name of the hiring company.

job_location: The city and state where the job is located.

job_summary: A detailed description outlining the purpose of the hiring.

post_date: The date when the job was posted on Indeed.

today: The date when the data was collected.

job_salary: The expected salary range for the position.

job_url: A direct link to the job posting for further details.

Distribution

The dataset is provided as a single CSV file, named 'job_dataset.csv'. It comprises 1583 rows and 8 columns, representing the structure of the collected job information. The data collection occurred around 26th July 2022.
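A minimal sketch of reading the file and checking its structure, assuming the eight column names listed above appear verbatim as the CSV header. The sample rows, company names, URLs, and date format below are hypothetical stand-ins, not values from the dataset.

```python
import csv
import io

# Column names taken from the "Columns" list; assumed to match the CSV header verbatim.
EXPECTED_COLUMNS = [
    "job_title", "company", "job_location", "job_summary",
    "post_date", "today", "job_salary", "job_url",
]


def load_postings(handle):
    """Read postings from an open CSV handle and verify the expected header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical two-row sample standing in for job_dataset.csv.
# The date format is an assumption; the source does not specify it.
SAMPLE_CSV = (
    "job_title,company,job_location,job_summary,post_date,today,job_salary,job_url\n"
    'Data Engineer,ExampleCo,"Bengaluru, Karnataka",Build pipelines,'
    "2022-07-20,2022-07-26,,https://example.com/1\n"
    'ML Engineer,SampleCorp,"Gurgaon, Haryana",Train models,'
    "2022-06-20,2022-07-26,,https://example.com/2\n"
)

rows = load_postings(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[0]["job_location"])
```

For the real file, the same function could be called on a handle opened with open("job_dataset.csv", newline="", encoding="utf-8"); the encoding is an assumption.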

Usage

This dataset is well-suited for various analytical tasks:

* Cleaning and refining job data.
* Identifying the most in-demand skills within the data and machine learning sectors.
* Analysing the geographical distribution of jobs.
* Conducting Natural Language Processing (NLP) and research on job descriptions.
* Market analysis for job seekers, recruiters, and educational institutions.

Coverage

The dataset has a global scope, with notable concentrations of job postings in locations such as Bengaluru, Karnataka (30%) and Gurgaon, Haryana (7%). The records primarily cover job postings for data-related roles, including Data Engineer, Data Analyst, Data Scientist, and ML Engineer, with data collected around July 2022. Some postings were listed over 30 days prior to the collection date.
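Location shares like the percentages quoted above could be computed along these lines. This is a sketch only: it assumes each row is a dict with a job_location key (as csv.DictReader would produce), and the sample rows are hypothetical.

```python
from collections import Counter


def location_shares(rows):
    """Return each job_location's share of all postings, largest first."""
    counts = Counter(
        (row.get("job_location") or "").strip()
        for row in rows
        if (row.get("job_location") or "").strip()  # skip blank locations
    )
    total = sum(counts.values())
    return {location: n / total for location, n in counts.most_common()}


# Hypothetical rows; the real data would come from job_dataset.csv.
sample_rows = [
    {"job_location": "Bengaluru, Karnataka"},
    {"job_location": "Bengaluru, Karnataka"},
    {"job_location": "Gurgaon, Haryana"},
    {"job_location": "Bengaluru, Karnataka"},
    {"job_location": ""},
]

shares = location_shares(sample_rows)
for location, share in shares.items():
    print(f"{location}: {share:.0%}")
```

Rows with a blank location are excluded from the denominator here; whether the quoted percentages treat blanks the same way is not stated in the source.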

License

CC0

Who Can Use It

This dataset is valuable for:

* Data Scientists and Analysts: For market research, trend analysis, and skill demand assessment.
* Machine Learning Engineers: To understand job requirements and role distributions.
* Researchers: For academic studies on labour markets and skill development.
* Job Seekers: To identify popular roles, required skills, and geographical opportunities.
* Companies and Recruiters: For talent acquisition strategies and competitor analysis.

Dataset Name Suggestions

Indeed Data Science & ML Job Postings

Global Data Roles Dataset

Job Market Insights: Data Careers

Data Analytics & AI Job Data

UK Data Professional Vacancies

Attributes

Original Data Source: Indeed job (Data science / data analyst / ML)
Mawson Escarpment Geology GIS Dataset
researchdata.edu.au
data.aad.gov.au
Updated Nov 4, 2002
Cite
THOST, DOUG E; BAIN, JOHN (2002). Mawson Escarpment Geology GIS Dataset [Dataset]. http://doi.org/10.26179/5c7deb18226f9
Unique identifier
https://doi.org/10.26179/5c7deb18226f9
Dataset updated
Nov 4, 2002
Dataset provided by
Australian Antarctic Division (https://www.antarctica.gov.au/)
Australian Antarctic Data Centre
Authors
THOST, DOUG E; BAIN, JOHN
Time period covered
Apr 10, 1998 - Jun 30, 1998
Description
There are several ArcInfo coverages described by this metadata record - FRAME, GEOL, MAPGRID, SITES, STRLINE and STRUC (in that order). Each coverage is described below. The data is also provided as shapefiles and ArcInfo interchange files. The data was used for the Mawson Escarpment Geology map published in 1998. This map is available from a URL provided in this metadata record.

FRAME:

The coverage FRAME contains arcs and polygons (with labels) and forms the limits of the data sets or map coverage of the MAWSON ESCARPMENT area of the AUSTRALIAN ANTARCTIC TERRITORY.

The purpose of this dataset is to form a cookie cutter for future data that may be acquired and require clipping to the map/data area.

GEOL:

The coverage GEOL is historical geological data covering the MAWSON ESCARPMENT area.

The data were captured in ARC/INFO format and combined with geological outcrops that were accurately digitised over a March 1989 Landsat Thematic Mapper image at a scale of 1:100000. It is not recommended that this data be used beyond this scale.

The coverage contains arcs (lines) and polygons (polygon labels). These objects are attributed as fully as possible in their .aat file for arcs and .pat file for polygon labels, and conform with the Geoscience Australia Geoscience Data Dictionary Version 98.04.

The purpose of the dataset is to become part of a greater geological database of the Australian Antarctic Territory.

(1998-04-10 - 1998-06-30)

MAPGRID:

MAPGRID is a graticule that was generated as a 5 minute by 5 minute grid mainly to allow for good location/registration of source materials for digitising and adding some locational anno.mapgrat

This cover's other function was to be used for a proof plot.

(1998-04-22 - 1998-06-30)
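As an illustration of what a 5 minute by 5 minute graticule involves, the following sketch lists grid-line positions in decimal degrees. It is not the procedure used to build MAPGRID; the function name and the extent in the example are hypothetical.

```python
def graticule_degrees(min_deg, max_deg, step_minutes=5):
    """Return graticule line positions (decimal degrees) at step_minutes spacing."""
    # Work in whole arc-minutes so positions do not accumulate floating-point error.
    start = round(min_deg * 60)
    stop = round(max_deg * 60)
    # Round the start up to the next multiple of the step.
    first = -(-start // step_minutes) * step_minutes
    return [minutes / 60 for minutes in range(first, stop + 1, step_minutes)]


# Hypothetical latitude extent, chosen only for illustration.
latitudes = graticule_degrees(-73.5, -73.25)
print(latitudes)
```

Calling the function once for the latitude range and once for the longitude range gives the two families of lines that make up the grid.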

SITES:

The purpose of this dataset is to provide the approximate location of historic sample sites in the MAWSON ESCARPMENT region of the AUSTRALIAN ANTARCTIC TERRITORY, for future expansion or more accurate positioning when improved records of location are found.

(1998-05-11 - 1998-06-30)

STRLINE:

This coverage of structural lines for geology is named STRLINE.

The purpose of the dataset is to hold the linear structural features in their own coverage, containing only structure that does not form polygon boundaries.

(1998-05-28 - 1998-06-30)

STRUC:

This coverage, called STRUC (for structural measurements), is a point coverage. It describes mesoscopic structures at a site or outcrop.

The purpose of the dataset is to provide all the known structural point data in one coverage.

(1998-05-28 - 1998-06-30)

California Rivers Assessment Interactive Database

cmr.earthdata.nasa.gov

Updated Aug 29, 2017

Cite

(2017). California Rivers Assessment Interactive Database [Dataset]. https://cmr.earthdata.nasa.gov/search/concepts/C1214614946-SCIOPS

Dataset updated

Aug 29, 2017

Time period covered

Jan 1, 2001 - Present

Description

The California Rivers Assessment (CARA) is a computer-based data management system designed to give resource managers, policy-makers, landowners, scientists and interested citizens rapid access to essential information and tools with which to make sound decisions about the conservation and use of California's rivers.

 The California Rivers Assessment has the following goals: to provide a
 computerized forum for collecting, storing, analyzing, exchanging and
 retrieving river-related resource data; to improve coordination between local,
 state and federal agencies, other organizations and the interested public; to
 develop a perspective on the demands and uses of California's river resources;
 and to establish a process for evaluating and assessing river resources on an
 ongoing basis.

 Although a substantial amount of information about California's rivers is now
 stored in computers, the locations and formats for this information vary, often
 making it difficult to access and use. The second phase of the California
 Rivers Assessment is design of a data management system called an Aggregated
 Information Model (AIM) that makes a wide range of river-related information
 available at a single location in a consistent format. As in Phase I, the Reach
 File system and Hydrologic Unit Codes provide a common, statewide geographic
 reference framework for integrating data from different sources. 

 The development of the AIM began with the acquisition and integration of
 computer-based river resource information on 13 of California's 149 river
 basins. These "demonstration basins" were chosen to reflect California's wide
 range of biological diversity. The Aggregated Information Model now
 incorporates 60 or more data sets for each of 120 river basins. These layers
 include vegetation, land ownership, dams, water quality parameters, rare and
 endangered species, native fish, National Wetlands Inventory designations,
 soils and farmlands inventories. By June 1998, all of California's 149 basins
 will have a uniform set of aggregated data, as well as other specific local
 data sets. 

 AIM allows users to produce custom maps from GIS layers by providing a query
 system over the World Wide Web. "ICE MAPS" (Interactive California
 Environmental Mapping, Assessment and Planning System) enables users to create
 and download their own maps by defining a region within the state and selecting
 desired data sets. Map products include a title bar, scale bar, legend, links
 to related Internet sites and tabular data where available. A new version of
 "ICE MAPS" is also available that allows users to query the AIM data directly.

Data from: Spatially-Disaggregated Crop Production Statistics Data in Africa...
search.dataone.org
dataverse.harvard.edu
+3 more
Updated Sep 25, 2024
Cite
International Food Policy Research Institute (IFPRI) (2024). Spatially-Disaggregated Crop Production Statistics Data in Africa South of the Sahara for 2017 [Dataset]. http://doi.org/10.7910/DVN/FSSKBW
Unique identifier
https://doi.org/10.7910/DVN/FSSKBW
Dataset updated
Sep 25, 2024
Dataset provided by
Harvard Dataverse
Authors
International Food Policy Research Institute (IFPRI)
Time period covered
Jan 1, 2016 - Dec 31, 2018
Description
Using a variety of inputs, IFPRI's Spatial Production Allocation Model (SPAM, also known as MapSPAM) uses a cross-entropy approach to make plausible estimates of crop distribution within disaggregated units. Moving the data from coarser units, such as countries and sub-national provinces, to finer units, such as grid cells, reveals spatial patterns of crop performance, creating an Africa South of the Sahara-wide grid-scape at the confluence between geography and agricultural production systems. Improving spatial understanding of crop production systems allows policymakers and donors to better target agricultural and rural development policies and investments, increasing food security and growth with minimal environmental impacts.
A spatially enriched synthetic population developed by spatial...
search.dataone.org
Updated Nov 8, 2023
Cite
Abubakar, Eleojo O. (2023). A spatially enriched synthetic population developed by spatial microsimulation of 2016/2017 Multiple Indicator Cluster Survey microdata of Kogi State, Nigeria [Dataset]. http://doi.org/10.7910/DVN/LOJTUJ
Unique identifier
https://doi.org/10.7910/DVN/LOJTUJ
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Abubakar, Eleojo O.
Description
This dataset is a spatially enriched synthetic individual-level population of people aged between 15 and 49 years in Kogi State, Nigeria. It was developed through the process of Spatial Microsimulation (SMS), which combines Multiple Indicator Cluster Survey (MICS 5) microdata with an analytical small-area zoning system that is an optimized surrogate of the ward-level geography of Kogi State, Nigeria. Whereas the actual MICS 5 microdata of Kogi State is a population sample of about 1,305 people comprising 912 females and 393 males, this synthetic population contains 2,249,170 microunits comprising 1,115,283 females and 1,133,887 males, with about 425 MICS 5 attributes. The small-area zone to which each microunit/person belongs is also indicated with primary keys named 'ZoneID' or 'GRID_ID_15k'. The analytical zoning system for this synthetic population is included in this dataset as a separate shapefile. This synthetic population data is useful for Small-Area Estimation and mapping of relevant attributes. It enables robust spatial analysis of MICS 5 microdata and indicators at very fine spatial scales, as well as at individual level, in Kogi State, Nigeria. Owing to its enhanced spatial fidelity, this dataset is invaluable for supporting precise and equitable geographical targeting of Sustainable Development Goal (SDG) initiatives in developing countries like Nigeria.
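A small-area estimate from such a synthetic population amounts to grouping microunits by their zone key and summarising an attribute within each zone. The sketch below assumes each microunit is a dict carrying a 'ZoneID' key as described above; the attribute name has_attribute and the sample records are hypothetical, not fields from the dataset.

```python
from collections import defaultdict


def zone_proportions(records, attribute, zone_key="ZoneID"):
    """Return, per zone, the proportion of microunits where attribute is truthy."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        zone = record[zone_key]
        totals[zone] += 1
        if record[attribute]:
            hits[zone] += 1
    return {zone: hits[zone] / totals[zone] for zone in totals}


# Hypothetical microunits; "has_attribute" stands in for any binary MICS 5 indicator.
sample_population = [
    {"ZoneID": "Z1", "has_attribute": True},
    {"ZoneID": "Z1", "has_attribute": False},
    {"ZoneID": "Z2", "has_attribute": True},
]

estimates = zone_proportions(sample_population, "has_attribute")
print(estimates)
```

The resulting per-zone values could then be joined to the zoning shapefile on the same key for mapping.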