Terms of use: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
This dataset comes from the R package wbstats. The World Bank (https://www.worldbank.org/) is a tremendous source of global socio-economic data; spanning several decades and dozens of topics, it has the potential to shed light on numerous global issues. To help provide access to this rich source of information, the World Bank itself provides a well-structured RESTful API. While this API is very useful for integration into web services and other high-level applications, it quickly becomes overwhelming for researchers who have neither the time nor the expertise to develop software that interfaces with it. This leaves researchers to rely on manual bulk downloads of spreadsheets containing the data they are interested in. This too can quickly become overwhelming, as the work is manual, time-consuming, and not easily reproducible. The goal of the wbstats R package is to bridge these alternatives so that researchers can focus on their research questions rather than on the question of accessing the data. The wbstats R package allows researchers to quickly search for and download the data of interest in a programmatic and reproducible fashion; this facilitates seamless integration into their workflow and allows analyses to be quickly rerun on different areas of interest, with real-time access to the latest available data.
World Development Indicators (WDI) is the primary World Bank collection of development indicators, compiled from officially recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates. Copied from https://databank.worldbank.org/source/world-development-indicators.
Highlighted features of the wbstats R package:
* Uses version 2 of the World Bank API, which provides access to more indicators and metadata than the previous API version
* Access to all annual, quarterly, and monthly data available in the API
* Support for searching and downloading data in multiple languages
* Returns data in either wide (default) or long format
* Support for Most Recent Value queries
* Support for grep-style searching of data descriptions and names
* Ability to download data not only by country, but also by aggregates, such as High Income or South Asia
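wbstats itself is an R package; as a rough illustration of what it automates, the Python sketch below builds a World Bank API v2 request by hand and reshapes the JSON reply from long to wide format. The helper names are hypothetical and not part of wbstats, and the indicator code `EN.ATM.CO2E.PC` is only an example that should be checked against the current WDI catalogue. The sketch assumes the API's usual reply shape: a two-element list of page metadata followed by the records (which may be null when nothing matches).

```python
from urllib.parse import urlencode

# Base URL of version 2 of the World Bank API (the version wbstats uses)
BASE_URL = "https://api.worldbank.org/v2"


def build_indicator_url(indicator, countries=("all",), date=None, mrv=None, per_page=1000):
    """Build a JSON request URL for one indicator (hypothetical helper)."""
    params = {"format": "json", "per_page": per_page}
    if date is not None:
        params["date"] = date  # a year range such as "2000:2020"
    if mrv is not None:
        params["mrv"] = mrv  # "most recent value" query: the latest N observations
    country_part = ";".join(countries)  # several ISO3 codes are joined with ';'
    query = urlencode(params, safe=":")
    return f"{BASE_URL}/country/{country_part}/indicator/{indicator}?{query}"


def parse_long(payload):
    """The reply is [page metadata, records]; flatten the records to long rows."""
    _meta, records = payload
    return [
        {
            "iso3c": rec["countryiso3code"],
            "date": rec["date"],
            "indicator": rec["indicator"]["id"],
            "value": rec["value"],
        }
        for rec in records or []  # records may be null when nothing matches
    ]


def to_wide(rows):
    """Pivot long rows to wide: one row per (country, date), one column per indicator."""
    wide = {}
    for row in rows:
        key = (row["iso3c"], row["date"])
        record = wide.setdefault(key, {"iso3c": row["iso3c"], "date": row["date"]})
        record[row["indicator"]] = row["value"]
    return list(wide.values())
```

Fetching the URL (for instance with `urllib.request.urlopen`) and passing `json.load(response)` to `parse_long` would complete the round trip; paging beyond `per_page` results is not handled in this sketch.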
More information can be found at https://www.rdocumentation.org/packages/wbstats/versions/1.0.4
Note for Version 1: Version 1 was published in January 2023. Its primary focus is the featured indicator of climate change. Planned future versions will cover other featured indicators such as economy, education, energy, environment, debt, gender, health, infrastructure, poverty, and science and technology.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
I am no longer updating this dataset. Its purpose was to track changes in testing over time, and I believe better resources for this information now exist. A better-documented open dataset: https://ourworldindata.org/grapher/full-list-total-tests-for-covid-19
For data related to testing in India, you can refer to the API endpoints provided by covid19india.org: https://api.covid19india.org
I am trying to highlight the relationship between the number of tests conducted and the number of confirmed cases. Is this metric important? We will find out, either through experience or through rigorous analysis.
Number of actual cases >> Number of confirmed cases
The dataset has been updated with a concatenated file (filename: TestsConducted_AllDates_ddMMMYYYY). Thanks to @Kamil Kiljan for suggesting the update.
What's inside is more than just rows and columns. Please check the data definitions & change logs below.
Update: March 31st 2020
The original location has not seen any new updates, so I have taken the information from a different source. Added source information:
Wiki page on Covid-19 Testing
Check file: Tests_Conducted_31Mar2020.csv
Update: March 24th 2020. The data has been scraped from the following web page: Coronavirus Testing Data
The copyright for the splash image belongs to Jim Huylebroek for The New York Times (NYTimes: "Can't get tested? Maybe you are in the wrong country").
The kernel used for extracting the information is provided: Notebook for web-scraping & extracting information
Notebook illustrating insights that can be derived from the dataset - Test, Test and Test
This data can be used in conjunction with the following:
1. Health expenditure per capita and number of hospital beds per 1,000 people
2. Intervention measures employed by individual governments
Also, please read Nate Silver's critique of how the number of positive cases doesn't mean anything unless we know how many tests were conducted and what the testing strategy was.
Date: 9th June 2020. Updated.
Date: 1st June 2020. Updated.
Date: 23rd May 2020. Updated.
Date: 11th May 2020
Concatenated all older datasets into a single file: TestsConducted_AllDates_ddbbbYYYY.csv
Notebook used for concatenating the datasets: Kernel Link
The April 15th file didn't have the 'Tests' column populated, so it was calculated in the updated file. If you are not comfortable using the calculated values, drop those rows with the following code:
df = df.drop(df[df['FileDate']=='15April2020'].index)
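The concatenated file tags each row with a `FileDate`. A pure-Python sketch of building such a file from the per-date snapshots, and of the same 15 April filter, might look like this. It assumes the snapshots follow the `Tests_Conducted_<date>.csv` naming seen in the change log and that `FileDate` is simply the part after that prefix (which matches the '15April2020' value used above); the author's kernel may do this differently. Because column names changed between snapshots, rows from different files can carry different keys.

```python
import csv
import glob
import os

PREFIX = "Tests_Conducted_"


def file_date(path):
    """'Tests_Conducted_15April2020.csv' -> '15April2020' (assumed naming scheme)."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem[len(PREFIX):] if stem.startswith(PREFIX) else stem


def concat_snapshots(pattern="Tests_Conducted_*.csv"):
    """Stack every per-date snapshot into one list of rows tagged with FileDate."""
    rows = []
    # sorted() orders paths alphabetically, not chronologically
    for path in sorted(glob.glob(pattern)):
        if file_date(path) == "DEPRECEATED":  # skip the pre-31 March file (a choice made here)
            continue
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                row["FileDate"] = file_date(path)
                rows.append(row)
    return rows


def drop_file_date(rows, file_date_value="15April2020"):
    """Pure-Python equivalent of the pandas filter above: remove one snapshot's rows."""
    return [row for row in rows if row.get("FileDate") != file_date_value]
```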
Date: 8th May 2020. Updated.
Date: 5th May 2020
Updated. No change in data structure.
Replaced the Excel file with a CSV. This covers data before 31st March: Tests_Conducted_DEPRECEATED.csv
Date: 1st May 2020
Updated to date. Minor changes in column names:
Tests -> Tested
Tests /millionpeople -> Tested /millionpeople
New column % added
Date: 26th April 2020
This was long delayed!
Date: April 15th 2020
Latest file: Tests_Conducted_15April2020.csv
Note that column names have changed in this file because they were changed in the source file.
Positive / thousand -> Positive /millionpeople
New columns added:
Tests /millionpeople
Date
TODO: normalize the column names & data with previous version.
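One way to tackle the TODO is to map each older row onto the newest header names recorded in the change log. The header spellings below are copied from the change log and may not match the files byte for byte; the conversion of the old rate column assumes it counted positives per 1,000 people, which the change log does not state.

```python
# Header renames recorded in the change log; exact spellings are assumptions
RENAMES = {
    "Tests": "Tested",
    "Tests /millionpeople": "Tested /millionpeople",
}
OLD_RATE = "Positive / thousand"
NEW_RATE = "Positive /millionpeople"


def normalize_row(row, rate_factor=1000.0):
    """Map an older row onto the newest column names (hypothetical helper).

    Assumes 'Positive / thousand' counts positives per 1,000 people, so one unit
    equals 1,000 per million. Values are assumed to be plain numbers without
    thousands separators.
    """
    out = {RENAMES.get(key, key): value for key, value in row.items()}
    if OLD_RATE in out:
        value = out.pop(OLD_RATE)
        if value not in (None, ""):
            out[NEW_RATE] = float(value) * rate_factor
    return out
```

If the old column turns out to be per 1,000 tests rather than per 1,000 people, it is not convertible this way and should be kept under its own name.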
Date: April 7th 2020
Latest file: Tests_Conducted_07April2020.csv
Date: April 5th 2020
Latest file: Tests_Conducted_05April2020.csv
Please note that older files are not being removed. This should give an indication of the change in the number of tests conducted over time.
Date: March 31st 2020
Latest file: Tests_Conducted_31Mar2020.csv
Success.ai’s LinkedIn Data Solutions offer unparalleled access to a vast dataset of 700 million public LinkedIn profiles and 70 million LinkedIn company records, making it one of the most comprehensive and reliable LinkedIn datasets available on the market today. Our employee data and LinkedIn data are ideal for businesses looking to streamline recruitment efforts, build highly targeted lead lists, or develop personalized B2B marketing campaigns.
Whether you’re looking for recruiting data, conducting investment research, or seeking to enrich your CRM systems with accurate and up-to-date LinkedIn profile data, Success.ai provides everything you need with pinpoint precision. By tapping into LinkedIn company data, you’ll have access to over 40 critical data points per profile, including education, professional history, and skills.
Key Benefits of Success.ai’s LinkedIn Data: Our LinkedIn data solution offers more than just a dataset. With GDPR-compliant data, AI-enhanced accuracy, and a price match guarantee, Success.ai ensures you receive the highest-quality data at the best price in the market. Our datasets are delivered in Parquet format for easy integration into your systems, and with millions of profiles updated daily, you can trust that you’re always working with fresh, relevant data.
API Integration: Our datasets are easily accessible via API, allowing for seamless integration into your existing systems. This ensures that you can automate data retrieval and update processes, maintaining the flow of fresh, accurate information directly into your applications.
Global Reach and Industry Coverage: Our LinkedIn data covers professionals across all industries and sectors, providing you with detailed insights into businesses around the world. Our geographic coverage spans 259M profiles in the United States, 22M in the United Kingdom, 27M in India, and thousands of profiles in regions such as Europe, Latin America, and Asia Pacific. With LinkedIn company data, you can access profiles of top companies from the United States (6M+), United Kingdom (2M+), and beyond, helping you scale your outreach globally.
Why Choose Success.ai’s LinkedIn Data: Success.ai stands out for its tailored approach and white-glove service, making it easy for businesses to receive exactly the data they need without managing complex data platforms. Our dedicated Success Managers will curate and deliver your dataset based on your specific requirements, so you can focus on what matters most—reaching the right audience. Whether you’re sourcing employee data, LinkedIn profile data, or recruiting data, our service ensures a seamless experience with 99% data accuracy.
Key Use Cases:
Our service uses our own unique network of locations across the world to verify a company's identity, all via one easy API integration.
From a status check all the way through to a Group Subsidiary and Ultimate Business Owner Report, Worldbox can be your single-source partner that covers the world.
Our data is structured in over 300 fields, including financial data, enabling users to take advantage of a single mapping integration to access hundreds of data points globally.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
More details about each file are in the individual file descriptions.
This is a dataset hosted by the World Bank. The organization has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore the World Bank using Kaggle and all of the data sources available through the World Bank organization page!
This dataset is maintained using the World Bank's APIs and Kaggle's API.
Cover photo by Alto Crew on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Success.ai proudly offers our exclusive LinkedIn Data product, targeting C-level executives from around the globe. This premium dataset is meticulously curated to empower your business development, recruitment strategies, and market research efforts with direct access to top-tier professionals.
Global Reach and Detailed Insights: Our LinkedIn Data encompasses profiles of C-level executives worldwide, offering detailed insights that include professional histories, current and past affiliations, as well as direct contact information such as verified work emails and phone numbers. This data spans across industries such as finance, technology, healthcare, manufacturing, and more, ensuring you have comprehensive coverage no matter your sector focus.
Accuracy and Compliance: Accuracy is paramount in executive-level data. Each profile within our dataset undergoes rigorous verification processes, using advanced AI algorithms to ensure data accuracy and reliability. Our datasets are also compliant with global data privacy laws such as GDPR, CCPA, and others, providing you with data you can trust and use with confidence.
Empower Your Business Strategies: Leverage our LinkedIn Data to enhance various business functions:
- Sales and Marketing: Directly reach decision-makers, reducing sales cycles and increasing conversion rates.
- Recruitment and Talent Acquisition: Identify and engage with potential candidates for executive roles within your organization.
- Market Research and Competitive Analysis: Gain insights into competitor leadership and strategic moves by analyzing executive backgrounds and professional networks.
Robust Data Points Include:
- Full Names and Titles: Gain access to the full names and current positions of C-level executives.
- Professional Emails and Phone Numbers: Direct communication channels to ensure your messages reach the intended audience.
- Company Information: Understand the organizational context with details about the company size, industry, and role within the corporation.
- Professional History: Detailed career trajectories, highlighting roles, responsibilities, and achievements.
- Education and Certifications: Educational backgrounds and certifications that enrich the professional profiles of these executives.
Flexible Delivery and Integration: Our LinkedIn Data is available in multiple formats, including CSV, Excel, and via API, allowing easy integration into your CRM systems or other sales platforms. We provide continuous updates to our datasets, ensuring you always have access to the most current information available.
Competitive Pricing with Best Price Guarantee: Success.ai offers this valuable data at the most competitive rates in the industry, backed by our best price guarantee. We are committed to providing you with the highest quality data at prices that fit your budget, ensuring excellent return on investment.
Sample Data and Custom Solutions: To demonstrate the quality and depth of our LinkedIn Data, we offer a sample dataset for initial evaluation. For specific needs, our team is skilled at creating customized datasets tailored to your exact business requirements.
Client Success Stories: Our clients, from startups to Fortune 500 companies, have successfully leveraged our LinkedIn Data to drive growth and strategic initiatives. We provide case studies and testimonials that showcase the effectiveness of our data in real-world applications.
Engage with Success.ai Today: Connect with us to explore how our LinkedIn Data can transform your strategic initiatives. Our data experts are ready to assist you in leveraging the full potential of this dataset to meet your business goals.
Reach out to Success.ai to access the world of C-level executives and propel your business to new heights with strategic data insights that drive success.
Data are aggregated in real time from 38 of the world's largest sanction lists:
- EU: Common Foreign and Security Policy (CFSP) of the European Union (Sanctions EU)
- EU: Financial Sanctions Files (FSF)
- EU: EU Sanctions Map European Union
- UN: Consolidated United Nations Security Council Sanctions List (UN Sanctions List)
- UK: HM Treasury (HMT) Financial sanctions: Consolidated List of Targets (UK)
- UK: Current List of designated persons, terrorism and terrorist financing
- UK: UK Insolvency Disqualified Directors
- UK: UK OFSI Consolidated List of Targets
- USA: OFAC Consolidated (non-SDN) List
- USA: OFAC Specially Designated Nationals (SDN) List (U.S. Treasury)
- USA: OFAC Foreign Sanctions Evaders (FSE) List (U.S. Treasury)
- USA: Sectoral Sanctions Identifications (SSI) List
- USA: Palestinian Legislative Council (NS-PLC) list
- USA: US BIS Denied Persons List
- USA: US Trade Consolidated Screening List (CSL)
- USA: The List of Foreign Financial Institutions Subject to Part 561 (the Part 561 List)
- USA: Non-SDN Iranian Sanctions Act (NS-ISA) List
- USA: List of Persons Identified as Blocked Solely Pursuant to Executive Order 13599 (the 13599 List)
- AR: Argentine RePET
- AUS: The Sanctions Consolidated List
- BL: Consolidated List of the National Belgian List and of the List of European Sanctions
- BL: Belgian Financial Sanctions
- CAN: Canadian Listed Terrorist Entities
- CAN: Canadian Special Economic Measures Act Sanctions
- CAN: Consolidated Canadian Autonomous Sanctions List
- CH: Swiss SECO Sanctions/Embargoes
- FR: French Freezing of Assets
- IL: Israel Terrorists Organizations and Unauthorized Associations lists
- JP: Japan Economic sanctions and list of eligible people
- KG: Kyrgyz Nation List
- KZ: Kazakh Terror Financing list
- PL: Polish list of persons and entities subject to sanctions
- RUS: Rosfinmonitoring WMD-related entities
- SIN: Singapore Targeted Financial Sanctions
- UA: Ukraine National Security Sanctions
- UA: Ukraine SFMS Blacklist
- UA: Ukraine NABC Sanctions Tracker
- ZA: South African Targeted Financial Sanctions
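Aggregating many lists usually means indexing names from every source under a common normalised key, so that one lookup reports every list a name appears on. The sketch below is a deliberate simplification, using exact matching after stripping accents, case and extra whitespace, with made-up example names; real screening would also need aliases, transliteration and fuzzy matching.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Strip accents, case-fold and collapse whitespace: 'José  Pérez' -> 'jose perez'."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def build_index(lists):
    """Map each normalised name to the set of list labels it appears on."""
    index = defaultdict(set)
    for source, names in lists.items():
        for name in names:
            index[normalize_name(name)].add(source)
    return index


def screen(index, name):
    """Sorted labels of every list containing `name` (exact match after normalising)."""
    return sorted(index.get(normalize_name(name), ()))
```

Usage with hypothetical entries: `screen(build_index({"EU: FSF": ["José Pérez"], "USA: OFAC SDN": ["JOSE  PEREZ"]}), "jose perez")` returns both list labels.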
Under the law of the sea, an exclusive economic zone (EEZ) is a sea zone over which a state has special rights regarding the exploration and use of marine resources. It stretches from the seaward edge of the state's territorial sea out to 200 nautical miles from its coast. The dataset has been derived from the World Maritime Boundaries v5.0 dataset from the Flanders Marine Institute (VLIZ) and integrated with the datasets "Communes 2010 – European Commission, Eurostat/GISCO", "Countries 2010, European Commission - Eurostat/GISCO", and "Coastlines 2010, European Commission - Eurostat/GISCO". The dataset (100K - 60M) is available to the EEA because the EEA holds a valid EBM v5.0 licence.
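For a sense of scale, the 200-nautical-mile limit converts to kilometres using the international nautical mile of exactly 1,852 m:

```latex
200\ \text{nmi} \times 1.852\ \frac{\text{km}}{\text{nmi}} = 370.4\ \text{km}
```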
These metadata are derived from the original metadata records available at Inspire@EC.
This layer presents detectable thermal activity from MODIS satellites for the last 7 days. MODIS Global Fires is a product of NASA’s Earth Observing System Data and Information System (EOSDIS), part of NASA's Earth Science Data. EOSDIS integrates remote sensing and GIS technologies to deliver global MODIS hotspot/fire locations to natural resource managers and other stakeholders around the world.
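To reproduce the 7-day window from a downloaded file of detections, the records can be filtered by acquisition date. This sketch assumes records shaped like rows of a FIRMS MODIS CSV, with an ISO-formatted `acq_date` and a numeric 0 to 100 `confidence`; treat those field names and the 50% confidence cut-off as assumptions.

```python
from datetime import date, timedelta


def recent_hotspots(records, today, days=7, min_confidence=50):
    """Keep detections from the last `days` days (today inclusive) above a confidence cut-off."""
    cutoff = today - timedelta(days=days)  # detections must be strictly after this date
    kept = []
    for rec in records:
        acquired = date.fromisoformat(rec["acq_date"])  # assumed 'YYYY-MM-DD' field
        if cutoff < acquired <= today and int(rec["confidence"]) >= min_confidence:
            kept.append(rec)
    return kept
```

With `days=7` the window covers today and the six preceding days.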
Consumption Best Practices:
The data gateway of the Food Security Portal contains over 12,000 datasets related to excessive price variability, COVID-19 food price monitoring, media analysis, high-frequency commodity prices, food security indicators, and others. Much of this data is available for 50 countries and goes back over 50 years. We draw on public, authoritative data sources such as the World Bank, FAO, and UNICEF, as well as IFPRI's own data. To make the data on the site as useful as possible, it is freely available to download as a text file for humans or as a JSON API for machines. Visitors to the site are welcome to download, aggregate, mash up, and share this information as they like. For more information on the data license and how to use this data, please visit each dataset page. If you have any questions about the data portal, our data collection techniques, or other related issues, please feel free to contact us by email (ifpri-fsp@cgiar.org).
The Government is releasing public data to become more transparent and foster innovation. Some of this data was available before, but data.gov.uk brings it together in one searchable website. Making this data easily available means it will be easier for people to make decisions and suggestions about government policies based on detailed information. Hear more about the Government's Transparency agenda from the Prime Minister in this video.
There are datasets available from all central government departments and a number of other public sector bodies and local authorities.
Is data just public information? Not really. From data.gov.uk, you can access the raw data driving government forward. This can then be used by people to build useful applications that help society, or to investigate how effective policy changes have been over time. General public information, such as how to find out if you are entitled to tax credits or how to tax your car, can be found at gov.uk.
You can use the data in all sorts of ways. This may be simply to analyse trends over time in one policy area, or to compare how different parts of government go about their work. Technical users will be able to create useful applications out of the raw data files, which can then be used by everyone.
data.gov.uk provides a mini-site of guidance for publishers, including a step-by-step process for including your data on data.gov.uk. Please see:
Data.gov.uk is a key part of the Government's work on Transparency and Data. The data.gov.uk implementation is being led by the Data team in the Cabinet Office, working across government departments to ensure that data is released in a timely and accessible way. This work is being supported by Sir Tim Berners-Lee and Professor Nigel Shadbolt.
There are a number of technical partners involved in the project to date. These include CKAN, which runs the catalogue at data.gov.uk/data as well as a growing number of open data registries around the world. CKAN is a project originally created by the Open Knowledge Foundation to make it easy to find, share and reuse open content and data. The CKAN software provides a web interface, a programmer's API, feeds notifying of changes, and a browsable history of all changes. The API is documented here: http://data.gov.uk/datametadata-api-docs.
There are a number of ways of getting involved in the project, depending on your background or interest. For example, if you wish to get involved in working with data, you can check this very brief primer, and you can also check out organisations such as the Open Data Institute and the Open Knowledge Foundation. To find out technical details about the setup of data.gov.uk, go here.
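CKAN's Action API exposes dataset search as `package_search`. A minimal sketch of building such a request and summarising the reply follows. The base URL is a parameter because whether data.gov.uk still serves the API at this path is an assumption to check against its current documentation, and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """URL for CKAN's package_search action (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize(response):
    """Return (title, [resource formats]) for each dataset in a package_search reply."""
    if not response.get("success"):
        raise RuntimeError(f"CKAN request failed: {response.get('error')}")
    return [
        (pkg.get("title"), [res.get("format") for res in pkg.get("resources", [])])
        for pkg in response["result"]["results"]
    ]
```

Passing the decoded JSON returned for `package_search_url("https://data.gov.uk", "road traffic")` to `summarize` would list matching dataset titles with the formats of their resources.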
License: https://datacatalog.worldbank.org/public-licenses?fragment=cc
The Climate Change Knowledge Portal (CCKP) is the World Bank's designated climate data service. CCKP offers a comprehensive suite of climate data and products derived from the latest generation of climate data archives. CCKP implements a systematic way of pre-processing the raw observed and model-based projection data to enable inter-comparable use across a broad range of applications. Data is available across an expansive range of climate variables and can be extracted per individual spatial unit, variable, selected timeframe, and climate projection scenario, across ensembles or for individual models. Data is available as global gridded data or spatially aggregated to national, subnational, watershed, and Exclusive Economic Zone scales.
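CCKP's exact aggregation method is not described here, but a common way to turn a regular latitude/longitude grid (such as the 0.25-degree ERA5 product) into a national or watershed average is to weight each cell by the cosine of its latitude, since cell area shrinks towards the poles. This sketch assumes the cells inside the region have already been selected.

```python
import math


def area_weighted_mean(cells):
    """Cosine-latitude weighted mean of (lat_centre_deg, value) grid cells.

    On a regular lat/lon grid the area of a cell is proportional to cos(latitude),
    so each value is weighted by that factor. Cells with value None are skipped.
    """
    weight_sum = 0.0
    weighted_total = 0.0
    for lat, value in cells:
        if value is None:
            continue
        weight = math.cos(math.radians(lat))
        weight_sum += weight
        weighted_total += weight * value
    if weight_sum == 0.0:
        raise ValueError("no valid cells in region")
    return weighted_total / weight_sum
```

Cells that straddle a region's border would additionally need a coverage fraction in their weight; that refinement is left out of this sketch.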
The Observed Climate Data, ERA5 0.25-degree dataset: ERA5 is the fifth-generation ECMWF atmospheric reanalysis of the global climate, covering the period from January 1950 to 2022. ERA5 is a satellite-derived dataset, originally produced by the Copernicus Climate Change Service (C3S) at ECMWF on an original grid of 0.50 degrees. ERA5 products derived by CCKP are downscaled and available at 0.25 degrees for 1950-2022.
Global gridded NetCDF files can be accessed via https://registry.opendata.aws/wbg-cckp/
Pre-computed statistics for spatially aggregated data are available as API or xls via
Florida DOT (FDOT) installed Vehicle Awareness Devices (VADs) on a set of Lynx transit buses as part of a demonstration for the ITS World Congress held in Orlando in October 2011. These VADs recorded vehicle data during the World Congress and continue to operate after it. Periodically, the VADs are removed from the vehicles and the data files are retrieved. FHWA has confirmed that the data do not contain identification of individual transit operators or any other form of Personally Identifiable Information (PII). This legacy dataset was created before data.transportation.gov and is currently available only via the attached file(s). Please contact the dataset owner if you need to work with this data using the data.transportation.gov analysis features (online viewing, API, graphing, etc.); USDOT will consider modifying the dataset to fully integrate it into data.transportation.gov.
Twelve Data is a technology-driven company that provides financial market data, financial tools, and dedicated solutions. Large audiences, from individuals to financial institutions, use our products to stay ahead of the competition and succeed.
At Twelve Data, we feel responsible for where the markets are going and how people are able to explore them. Coming from different technological backgrounds, we see that the world lacks a unique and simple place where financial data can be accessed by anyone, at any time. This is what distinguishes us from others: we do not only supply financial data; we want you to benefit from it through convenient formats, tools, and special solutions.
We believe that the human factor is still a very important aspect of our work and therefore our ethics guides us on how to treat people, with convenient and understandable resources. This includes world-class documentation, human support, and dedicated solutions.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Open Energy Information (OpenEI) is a knowledge-sharing online community dedicated to connecting people with the latest information and data on energy resources from around the world. Created in partnership with the United States Department of Energy and federal laboratories across the nation, OpenEI offers access to real-time data and unique visualizations that will help you find the answers you need to make better, more informed decisions with structured linked open data and information in widely used formats such as API, CSV, XML, and XLS. OpenEI is making a profound impact on the world’s energy transformation by providing data access, generative data use, key knowledge derivation tools, and synthetic datasets that will help inform policy, purchase, build, and business decisions. This community-based platform is a core competency for the U.S. Department of Energy and its laboratories, providing a high degree of value for building knowledge and datasets, connecting and structuring data via linked open data standards, and serving as the place for the world to contribute and utilize energy data, APIs and web services.
OpenEI is the backbone to the DOE Data Catalog and federates all DOE-sponsored data upwards to Data.gov in order to enable data transparency and access.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Wide Web is a complex interconnected digital ecosystem, where information and attention flow between platforms and communities throughout the globe. These interactions co-construct how we understand the world, reflecting and shaping public discourse. Unfortunately, researchers often struggle to understand how information circulates and evolves across the web because platform-specific data is often siloed and restricted by linguistic barriers. To address this gap, we present a comprehensive, multilingual dataset capturing all Wikipedia links shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW subreddits. Each linked Wikipedia article is enriched with revision history, page view data, article ID, redirects, and Wikidata identifiers. Through a research agreement with Reddit, our dataset ensures user privacy while providing a query and ID mechanism that integrates with the Reddit and Wikipedia APIs. This enables extended analyses for researchers studying how information flows across platforms. For example, Reddit discussions use Wikipedia for deliberation and fact-checking which subsequently influences Wikipedia content, by driving traffic to articles or inspiring edits. By analyzing the relationship between information shared and discussed on these platforms, our dataset provides a foundation for examining the interplay between social media discourse and collaborative knowledge consumption and production.
The motivations for this dataset stem from the challenges researchers face in studying the flow of information across the web. While the World Wide Web enables global communication and collaboration, data silos, linguistic barriers, and platform-specific restrictions hinder our ability to understand how information circulates, evolves, and impacts public discourse. Wikipedia and Reddit, as major hubs of knowledge sharing and discussion, offer an invaluable lens into these processes. However, without comprehensive data capturing their interactions, researchers are unable to fully examine how platforms co-construct knowledge. This dataset bridges this gap, providing the tools needed to study the interconnectedness of social media and collaborative knowledge systems.
WikiReddit is a comprehensive dataset capturing all Wikipedia mentions (including links) shared in posts and comments on Reddit from 2020 to 2023, excluding those from private and NSFW (not safe for work) subreddits. The SQL database comprises 336K total posts, 10.2M comments, 1.95M unique links, and 1.26M unique articles spanning 59 languages on Reddit and 276 Wikipedia language subdomains. Each linked Wikipedia article is enriched with its revision history and page view data within a ±10-day window of its posting, as well as article ID, redirects, and Wikidata identifiers. Supplementary anonymous metadata from Reddit posts and comments further contextualizes the links, offering a robust resource for analysing cross-platform information flows, collective attention dynamics, and the role of Wikipedia in online discourse.
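The ±10-day window around each posting can be expressed as a simple date filter. This sketch assumes the window is measured in whole calendar days around the post's date and that page views are held as a date-to-count mapping; the dataset may define the window differently (for example in hours).

```python
from datetime import date, datetime


def in_window(post_time: datetime, day: date, days: int = 10) -> bool:
    """True if `day` is within ±`days` calendar days of the post's date."""
    return abs((day - post_time.date()).days) <= days


def window_slice(daily_views, post_time, days=10):
    """Keep only the (date -> views) entries inside the window, in date order."""
    return {
        day: views
        for day, views in sorted(daily_views.items())
        if in_window(post_time, day, days)
    }
```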
Data was collected from the Reddit4Researchers and Wikipedia APIs. No personally identifiable information is published in the dataset. Data from Reddit to Wikipedia is linked via the hyperlink and article titles appearing in Reddit posts.
Extensive processing with tools such as regex was applied to the Reddit post/comment text to extract the Wikipedia URLs. Redirects for Wikipedia URLs and article titles were found through the API and mapped to the collected data. Reddit IDs are hashed with SHA-256 for post/comment/user/subreddit anonymity.
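The two processing steps named above can be sketched as follows: a regular expression that pulls language subdomains and article titles out of Wikipedia URLs, and SHA-256 hashing of Reddit IDs with Python's `hashlib`. The pattern is a simplified assumption rather than the authors' expression, and the source does not say whether the hashes were salted, so no salt is used here.

```python
import hashlib
import re
from urllib.parse import unquote

# Simplified pattern: https://<lang>[.m].wikipedia.org/wiki/<title>
WIKI_LINK = re.compile(
    r"https?://([a-z0-9-]+)\.(?:m\.)?wikipedia\.org/wiki/([^\s)\]>#?]+)",
    re.IGNORECASE,
)


def extract_wikipedia_links(text):
    """Return (language subdomain, article title) pairs found in a post or comment."""
    return [
        (lang.lower(), unquote(title).replace("_", " "))
        for lang, title in WIKI_LINK.findall(text)
    ]


def hash_id(reddit_id):
    """One-way SHA-256 digest of a Reddit ID, hex-encoded (64 characters)."""
    return hashlib.sha256(reddit_id.encode("utf-8")).hexdigest()
```

For example, `extract_wikipedia_links("See https://en.wikipedia.org/wiki/Open_data")` yields `[("en", "Open data")]`; percent-encoded titles such as `K%C3%B6ln` are decoded to `Köln`.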
We foresee several applications of this dataset and preview four here. First, Reddit linking data can be used to understand how attention is driven from one platform to another. Second, Reddit linking data can shed light on how Wikipedia's archive of knowledge is used in the larger social web. Third, our dataset could provide insights into how external attention is topically distributed across Wikipedia, and help extend such analysis to the disparities in the types of external communities in which Wikipedia is used, and how it is used. Fourth, relatedly, a topic analysis of our dataset could reveal how Wikipedia usage on Reddit contributes to societal benefits and harms. Our dataset could help examine whether homogeneity within the Reddit and Wikipedia audiences shapes topic patterns and assess whether these relationships mitigate or amplify problematic engagement online.
The dataset is publicly shared with a Creative Commons Attribution 4.0 International license. The article describing this dataset should be cited: https://doi.org/10.48550/arXiv.2502.04942
Patrick Gildersleve will maintain this dataset, and add further years of content as and when available.
posts

| Column Name | Type | Description |
|---|---|---|
| subreddit_id | TEXT | The unique identifier for the subreddit. |
| crosspost_parent_id | TEXT | The ID of the original Reddit post if this post is a crosspost. |
| post_id | TEXT | Unique identifier for the Reddit post. |
| created_at | TIMESTAMP | The timestamp when the post was created. |
| updated_at | TIMESTAMP | The timestamp when the post was last updated. |
| language_code | TEXT | The language code of the post. |
| score | INTEGER | The score (upvotes minus downvotes) of the post. |
| upvote_ratio | REAL | The ratio of upvotes to total votes. |
| gildings | INTEGER | Number of awards (gildings) received by the post. |
| num_comments | INTEGER | Number of comments on the post. |

comments

| Column Name | Type | Description |
|---|---|---|
| subreddit_id | TEXT | The unique identifier for the subreddit. |
| post_id | TEXT | The ID of the Reddit post the comment belongs to. |
| parent_id | TEXT | The ID of the parent comment (if a reply). |
| comment_id | TEXT | Unique identifier for the comment. |
| created_at | TIMESTAMP | The timestamp when the comment was created. |
| last_modified_at | TIMESTAMP | The timestamp when the comment was last modified. |
| score | INTEGER | The score (upvotes minus downvotes) of the comment. |
| upvote_ratio | REAL | The ratio of upvotes to total votes for the comment. |
| gilded | INTEGER | Number of awards (gildings) received by the comment. |

postlinks

| Column Name | Type | Description |
|---|---|---|
| post_id | TEXT | Unique identifier for the Reddit post. |
| end_processed_valid | INTEGER | Whether the extracted URL from the post resolves to a valid URL. |
| end_processed_url | TEXT | The extracted URL from the Reddit post. |
| final_valid | INTEGER | Whether the final URL from the post resolves to a valid URL after redirections. |
| final_status | INTEGER | HTTP status code of the final URL. |
| final_url | TEXT | The final URL after redirections. |
| redirected | INTEGER | Indicator of whether the posted URL was redirected (1) or not (0). |
| in_title | INTEGER | Indicator of whether the link appears in the post title (1) or post body (0). |

commentlinks

| Column Name | Type | Description |
|---|---|---|
| comment_id | TEXT | Unique identifier for the Reddit comment. |
| end_processed_valid | INTEGER | Whether the extracted URL from the comment resolves to a valid URL. |
| end_processed_url | TEXT | The extracted URL from the comment. |
| final_valid | INTEGER | Whether the final URL from the comment resolves to a valid URL after redirections. |
| final_status | INTEGER | HTTP status code of the final URL. |
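As an illustration of how the `posts` and `postlinks` tables relate, the sketch below joins them on `post_id` and counts, per final (post-redirect) URL, the distinct posts whose link resolved to a valid URL, plus how many of those link occurrences sit in post titles. It assumes the database can be queried with SQLite; only a subset of the documented columns is recreated, and the rows are made-up toy values, not WikiReddit data.

```python
import sqlite3

# Subset of the documented posts/postlinks columns (assumption: SQLite-compatible).
SCHEMA = """
CREATE TABLE posts (
    post_id      TEXT PRIMARY KEY,
    subreddit_id TEXT,
    created_at   TIMESTAMP,
    score        INTEGER
);
CREATE TABLE postlinks (
    post_id     TEXT REFERENCES posts(post_id),
    final_valid INTEGER,
    final_url   TEXT,
    redirected  INTEGER,
    in_title    INTEGER
);
"""

# Per final URL: number of distinct posts, and how many link occurrences are in titles.
QUERY = """
SELECT l.final_url,
       COUNT(DISTINCT p.post_id) AS n_posts,
       SUM(l.in_title)           AS n_in_title
FROM postlinks AS l
JOIN posts AS p ON p.post_id = l.post_id
WHERE l.final_valid = 1
GROUP BY l.final_url
ORDER BY n_posts DESC, l.final_url;
"""

# Made-up toy rows for demonstration only.
TOY_POSTS = [
    ("p1", "s1", "2021-01-01 00:00:00", 10),
    ("p2", "s1", "2021-01-02 00:00:00", 5),
    ("p3", "s2", "2021-01-03 00:00:00", 1),
]
TOY_LINKS = [
    ("p1", 1, "https://en.wikipedia.org/wiki/Example", 0, 1),
    ("p2", 1, "https://en.wikipedia.org/wiki/Example", 1, 0),
    ("p3", 0, "https://en.wikipedia.org/wiki/Broken", 0, 0),
    ("p3", 1, "https://de.wikipedia.org/wiki/Beispiel", 0, 0),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", TOY_POSTS)
    conn.executemany("INSERT INTO postlinks VALUES (?, ?, ?, ?, ?)", TOY_LINKS)
    return conn


def links_per_url(conn: sqlite3.Connection) -> list[tuple[str, int, int]]:
    return conn.execute(QUERY).fetchall()


for row in links_per_url(build_demo_db()):
    print(row)
# ('https://en.wikipedia.org/wiki/Example', 2, 1)
# ('https://de.wikipedia.org/wiki/Beispiel', 1, 0)
```

Against the real data, one would connect to the distributed database file instead of `:memory:`; the same join pattern applies to `comments` and `commentlinks` via `comment_id`.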
The Light Of The World Travel And Tours Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Meet Earth Engine
Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface.
Satellite imagery + your algorithms + real-world applications.
Global-scale insight: Explore our interactive timelapse viewer to travel back in time and see how the world has changed over the past twenty-nine years. Timelapse is one example of how Earth Engine can help gain insight into petabyte-scale datasets.
Ready-to-use datasets: The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily. It contains over twenty petabytes of geospatial data instantly available for analysis.
Simple, yet powerful API: The Earth Engine API is available in Python and JavaScript, making it easy to harness the power of Google’s cloud for your own geospatial analysis.
"Google Earth Engine has made it possible for the first time in history to rapidly and accurately process vast amounts of satellite imagery, identifying where and when tree cover change has occurred at high resolution. Global Forest Watch would not exist without it. For those who care about the future of the planet, Google Earth Engine is a great blessing!" (Dr. Andrew Steer, President and CEO of the World Resources Institute)
Convenient tools: Use our web-based code editor for fast, interactive algorithm development with instant access to petabytes of data.
Scientific and humanitarian impact: Scientists and non-profits use Earth Engine for remote sensing research, predicting disease outbreaks, natural resource management, and more.
The World In The Crowd Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
This dataset contains annual means of the geomagnetic field vector components from observatories around the world, from 1840 to the present day. At present there are about 160 observatories. These data are useful for tracking changes in the magnetic field generated inside the Earth. The data are produced by a number of organisations around the world, including BGS, and are available in plain text from www.geomag.bgs.ac.uk. This dataset is connected to other geomagnetic datasets but can be used without reference to them.
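Tracking changes in the field from annual means typically starts from year-to-year differences (secular variation) and from derived elements such as declination and inclination. The sketch below assumes the conventional X (north), Y (east), and Z (vertically down) components in nT; it does not assume any particular layout of the BGS plain-text files, so parsing is left out, and the values in the example are made up.

```python
import math


def derived_elements(x: float, y: float, z: float) -> dict[str, float]:
    """Horizontal (H) and total (F) intensity in nT; declination (D) and inclination (I) in degrees."""
    h = math.hypot(x, y)
    return {
        "H": h,
        "F": math.hypot(h, z),
        "D": math.degrees(math.atan2(y, x)),  # positive east of geographic north
        "I": math.degrees(math.atan2(z, h)),  # positive when the field points downward
    }


def secular_variation(annual_means: dict[int, float]) -> dict[int, float]:
    """Average change per year between consecutive available years, keyed by the later year."""
    years = sorted(annual_means)
    return {
        later: (annual_means[later] - annual_means[earlier]) / (later - earlier)
        for earlier, later in zip(years, years[1:])
    }


print(secular_variation({2000: 100.0, 2001: 110.0, 2003: 90.0}))  # {2001: 10.0, 2003: -10.0}
```

Dividing by the gap between years keeps the result a per-year rate even when an observatory's record has missing years.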
https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
This dataset comes from the R package wbstats. The World Bank (https://www.worldbank.org/) is a tremendous source of global socio-economic data; spanning several decades and dozens of topics, it has the potential to shed light on numerous global issues. To help provide access to this rich source of information, the World Bank itself provides a well-structured RESTful API. While this API is very useful for integration into web services and other high-level applications, it quickly becomes overwhelming for researchers who have neither the time nor the expertise to develop software that interfaces with it. This leaves researchers relying on manual bulk downloads of spreadsheets containing the data they are interested in. This too can quickly become overwhelming, as the work is manual, time-consuming, and not easily reproducible. The goal of the wbstats R package is to bridge these alternatives and allow researchers to focus on their research questions rather than on the question of accessing the data. wbstats allows researchers to quickly search for and download the data of interest in a programmatic and reproducible fashion; this facilitates seamless integration into their workflow and allows analyses to be quickly rerun on different areas of interest, with real-time access to the latest available data.
World Development Indicators (WDI) is the primary World Bank collection of development indicators, compiled from officially recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates. Copied from https://databank.worldbank.org/source/world-development-indicators.
Highlighted features of the wbstats R-package:
* Uses version 2 of the World Bank API, which provides access to more indicators and metadata than the previous API version
* Access to all annual, quarterly, and monthly data available in the API
* Support for searching and downloading data in multiple languages
* Returns data in either wide (default) or long format
* Support for Most Recent Value queries
* Support for grep-style searching of data descriptions and names
* Ability to download data not only by country but also by aggregates, such as High Income or South Asia
More information can be found at https://www.rdocumentation.org/packages/wbstats/versions/1.0.4
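wbstats is an R package, so the following is not its code. It is a Python sketch, under stated assumptions, of the kind of World Bank API v2 request and reshaping that the package automates: the `/v2/country/{codes}/indicator/{id}` path with `format=json`, semicolon-separated country codes, the optional `mrv` (most recent values) and `date` parameters, and a JSON response shaped as `[page metadata, records]`. The helper names are hypothetical, and the payload in the example is a made-up toy (note the fictitious country code `AAA`), not real World Bank data.

```python
import json
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "https://api.worldbank.org/v2"


def indicator_url(countries: list[str], indicator: str, *, mrv: Optional[int] = None,
                  date: Optional[str] = None, per_page: int = 1000) -> str:
    """Build a v2 indicator query URL (assumption: codes joined with ';')."""
    params: dict[str, object] = {"format": "json", "per_page": per_page}
    if mrv is not None:
        params["mrv"] = mrv    # most recent values
    if date is not None:
        params["date"] = date  # e.g. "2000:2020"
    return f"{API_ROOT}/country/{';'.join(countries)}/indicator/{indicator}?{urlencode(params)}"


def to_long(body: str) -> list[dict]:
    """Flatten a JSON response body into long format: one row per country, date, indicator."""
    payload = json.loads(body)
    if len(payload) < 2:
        # Assumption: error responses carry only a single message element.
        raise ValueError(f"unexpected response: {payload!r}")
    records = payload[1] or []  # the record list may be null when nothing matches
    return [
        {"iso3c": r["countryiso3code"], "date": r["date"],
         "indicator": r["indicator"]["id"], "value": r["value"]}
        for r in records
    ]


def to_wide(rows: list[dict]) -> list[dict]:
    """Pivot long rows to one row per (country, date) with a column per indicator."""
    wide: dict[tuple[str, str], dict] = {}
    for row in rows:
        key = (row["iso3c"], row["date"])
        wide.setdefault(key, {"iso3c": row["iso3c"], "date": row["date"]})[row["indicator"]] = row["value"]
    return list(wide.values())


toy_body = json.dumps([
    {"page": 1, "pages": 1, "per_page": 1000, "total": 2},
    [
        {"indicator": {"id": "SP.POP.TOTL"}, "countryiso3code": "AAA", "date": "2020", "value": 1},
        {"indicator": {"id": "NY.GDP.MKTP.CD"}, "countryiso3code": "AAA", "date": "2020", "value": 2},
    ],
])
print(indicator_url(["USA", "GBR"], "SP.POP.TOTL", mrv=5))
# https://api.worldbank.org/v2/country/USA;GBR/indicator/SP.POP.TOTL?format=json&per_page=1000&mrv=5
print(to_wide(to_long(toy_body)))
```

A real request would fetch the URL (for example with `urllib.request.urlopen`) and follow the `pages` field of the metadata element; the sketch stops at parsing so that it runs offline.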
Note for Version 1: Version 1 was published in January 2023. Its primary focus is the featured indicator of climate change. Other planned versions will cover other featured indicators, such as economy, education, energy, environment, debt, gender, health, infrastructure, poverty, and science and technology.