This is a large dataset which contains the labour market statistics data series published in the monthly Labour Market Statistics Statistical Bulletin. The dataset is overwritten every month and it therefore always contains the latest published data. The Time Series dataset facility is primarily designed for users who wish to customise their own datasets. For example, users can create a single spreadsheet including series for unemployment, claimant count, employment and workforce jobs, rather than extracting the required data from several separate spreadsheets published on the website.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1st Dec 2024. This version of the dataset has been superseded and is now restricted. Please refer to the most recent release.
Pollution of online social spaces caused by rampant dis/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent social media data, thus hindering the advancement of computational social science as a whole. To address this pressing issue, we present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.
Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.
Here is a description of the dataset files.
If used for research purposes, please cite the following paper describing the dataset details:
Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year Worth of Social Data". PLOS ONE (2024). https://doi.org/10.1371/journal.pone.0310330
Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.
Users included in the Bluesky dataset have the right to opt out and request the removal of their data, in accordance with GDPR provisions (Article 17). It should be noted, however, that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides derogations (Article 17(3)(d) and Article 89).
We emphasize that, in compliance with GDPR (Article 4(5)), the released data has been thoroughly pseudonymized. Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to further protect individual privacy.
If you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with subject "Removal request: [username]").
We will process your request within a reasonable timeframe.
This work is supported by:
The data asset is relational. There are four different data files. One represents customer information. A second contains address information. A third contains demographic data, and a fourth includes customer cancellation information. All of the data sets have linking ids, either ADDRESS_ID or CUSTOMER_ID. The ADDRESS_ID is specific to a postal service address. The CUSTOMER_ID is unique to a particular individual. Note that there can be multiple customers assigned to the same address. Also, note that not all customers have a match in the demographic table. The latitude-longitude information generally refers to the Dallas-Fort Worth Metroplex in North Texas and is mappable at a high level. Just be aware that if you drill down too far, some people may live in the middle of Jerry World, DFW Airport, or Lake Grapevine. Any lat/long pointing to a specific residence, business, or physical site is coincidental. The physical addresses are fake and are unrelated to the lat/long.
In the termination table, you can derive a binary target (churn/did not churn) from the ACCT_SUSPD_DATE field. The data set is modelable. That is, you can use the other data in the dataset to predict who did and did not churn. The underlying logic behind the prediction should be consistent with predicting auto insurance churn in the real world.
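The following is a minimal sketch of that workflow in pandas: deriving the churn label from ACCT_SUSPD_DATE and joining the four files into a single modeling table. The file names are hypothetical, and it assumes the cancellation table is keyed by CUSTOMER_ID; only the ID and date field names come from the description above.

```python
import pandas as pd

# Hypothetical file names for the four relational files described above.
customers = pd.read_csv("customer.csv")
addresses = pd.read_csv("address.csv")
demographic = pd.read_csv("demographic.csv")
termination = pd.read_csv("termination.csv")

# A populated ACCT_SUSPD_DATE marks an account suspension, i.e. churn.
termination["churn"] = termination["ACCT_SUSPD_DATE"].notna().astype(int)

model_df = (
    customers
    .merge(addresses, on="ADDRESS_ID", how="left")      # multiple customers can share an address
    .merge(demographic, on="CUSTOMER_ID", how="left")   # not every customer has a demographic match
    .merge(termination[["CUSTOMER_ID", "churn"]], on="CUSTOMER_ID", how="left")
)

# Customers with no termination record are treated as "did not churn".
model_df["churn"] = model_df["churn"].fillna(0).astype(int)
```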
Terms and Conditions: Unless otherwise stated, the data on this site is free. It can be duplicated and used as you wish, but we'd appreciate it if you cite it as coming from us.
SUMMARY:
Vumonic provides its clients with email receipt datasets on weekly, monthly, or quarterly subscriptions, for any online consumer vertical. We gain consent-based access to our users' email inboxes through our own proprietary apps, from which we gather and extract all the email receipts and put them into a structured format for consumption by our clients. We currently have over 1M users in our India panel.
If you are not familiar with email receipt data, it provides item- and user-level transaction information (all PII-wiped), which allows for deep, granular analysis of things like market share, growth, competitive intelligence, and more.
VERTICALS:
PRICING/QUOTE:
Our email receipt data is priced at market rate based on the requirement. To give a quote, all we need to know is:
Send us this info and we can answer any questions you have, provide a sample, and more.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains detailed information about a wide range of books available for purchase on an online retailer's website. It includes data such as book titles, authors, categories, prices, stock status, number of copies left, book length in pages, edition details, publication information, and customer engagement metrics such as wished-user counts and discount offers. This dataset is ideal for data analysis projects focusing on book sales trends, customer preferences, and market insights within the online retail book industry. Whether you're exploring pricing strategies, customer behavior, or genre popularity, this dataset provides a rich resource for data-driven exploration and analysis in the domain of online book retailing.
Content:
Book Title: Title of the book.
Author: Author(s) of the book.
Category: Category or genre of the book.
Price (TK): Price of the book in TK (local currency).
Stock Status: Availability status of the book (In Stock/Out of Stock).
Copies Left: Number of copies currently available.
Book Length (Pages): Number of pages in the book.
Edition: Edition details of the book.
Publication: Publisher or publication details.
Wished Users: Number of users who have added this book to their wish list.
Discount Offer: Any available discount or promotional offer on the book.
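As a starting point, here is a minimal sketch of loading the data and summarising price and wish-list engagement by category. The CSV file name is hypothetical; the column names follow the field list above.

```python
import pandas as pd

# Hypothetical export of the book catalogue; adjust the path to your copy.
books = pd.read_csv("books.csv")

summary = (
    books.groupby("Category")
         .agg(
             avg_price=("Price (TK)", "mean"),          # average list price per category
             total_wished=("Wished Users", "sum"),      # total wish-list adds per category
             in_stock=("Stock Status", lambda s: (s == "In Stock").sum()),
         )
         .sort_values("total_wished", ascending=False)
)
print(summary.head(10))
```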
The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases. Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter or that it had this same value there. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user should always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list. This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC .
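Below is a minimal sketch of the kind of cone search and component-catalog breakdown the XRAY master table is suited for. It is not the HEASARC interface itself: it assumes you have exported query results to a local CSV, and the file name and the "ra"/"dec" column names are assumptions; only the 'database_table' parameter comes from the description above.

```python
import pandas as pd
from astropy.coordinates import SkyCoord
import astropy.units as u

# Hypothetical local export of XRAY master-table rows with positions in degrees.
xray = pd.read_csv("xray_master_export.csv")

target = SkyCoord(ra=83.633 * u.deg, dec=22.014 * u.deg)  # example position (Crab Nebula)
sources = SkyCoord(ra=xray["ra"].values * u.deg, dec=xray["dec"].values * u.deg)

# Keep entries within a 10-arcminute cone of the target position.
cone = xray[sources.separation(target) < 10 * u.arcmin]

# 'database_table' records which component catalog each entry was copied from,
# so you know which original table to consult for the full parameter set.
print(cone["database_table"].value_counts())
```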
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
by Vizonix
This dataset differentiates between 4 similar object classes: 4 types of canned goods. We built this dataset with cans of olives, beans, stewed tomatoes, and refried beans.
The dataset is pre-augmented. That is to say, all required augmentations are applied to the actual native dataset up front, before training. We have found that augmenting this way provides our users with maximum visibility and flexibility in tuning their dataset (and classifier) to achieve their specific use-case goals. Augmentations are present and visible in the native dataset before it reaches the classifier, so it's never a mystery which augmentation tweaks produce a more positive or negative outcome during training. It also eliminates the risk of downsizing affecting annotations.
The training images in this dataset were created in our studio in Florida from actual physical objects to the following specifications:
The training images in this dataset were composited / augmented in this way:
1,600 (+) different images were uploaded for each class (out of the 25,000 total images created for each class).
Understanding our Dataset Insights File
As users train their classifiers, they often wish to enhance accuracy by experimenting with or tweaking their dataset. With our Dataset Insights documents, they can easily determine which images possess which augmentations. Dataset Insights allow users to easily add or remove images with specific augmentations as they wish. This also provides a detailed profile and inventory of each file in the dataset.
The Dataset Insights document enables the user to see exactly which source image, angle, augmentation(s), etc. were used to create each image in the dataset.
Dataset Insight Files:
About Vizonix
Vizonix (vizonix.com) builds from-scratch datasets from 100% in-house photography. Our images and backgrounds are produced in our Florida studio. We typically image smaller items, deliver in 72 hours, and specialize in Manufacturer Quality Assurance (MQA) datasets.
SPEN’s Digital View of Distribution Connections provides greater transparency of major connections pipelines (1MW and above) at SPM and SPD Grid Supply Points (GSP) for existing and new customers looking to obtain a generation connection. Please see disclaimers for appropriate use and liability associated with this.
Disclaimer: Digital View of Distribution Connections has been developed to provide customers with a snapshot of some of the key known connection constraints and the connections pipeline at each SPEN GSP. However, please note this is not ordered by queue position. No express or implied condition, warranty, term or representation is given by SPEN regarding the quality, accuracy or completeness of the information contained within or produced as part of the reports or any related information. SPEN shall have no liability to any user for any loss or damage of any kind incurred as a result of the use of Digital View of Distribution Connections, or reliance by the user on any information provided within it.
If you wish to provide feedback at a dataset or row level, please click on the “Feedback” tab above.
Data Triage: As part of our commitment to enhancing the transparency, and accessibility of the data we share, we publish the results of our Data Triage process. Our Data Triage documentation includes our Risk Assessments; detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the Single Digital View dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information page.
Download dataset metadata (JSON)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
WARNING: This is a pre-release dataset and its field names and data structures are subject to change. It should be considered pre-release until the end of 2024. Expected changes: metadata is missing or incomplete for some layers at this time and will be continuously improved; we expect to update this layer roughly in line with CDTFA at some point, but will increase the update cadence over time as we are able to automate the final pieces of the process. This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.
Purpose: County and incorporated place (city) boundaries along with third-party identifiers used to join in external data. Boundaries are from the authoritative source, the California Department of Tax and Fee Administration (CDTFA), altered to show the counties as one polygon. This layer displays the city polygons on top of the county polygons so the area isn't interrupted. The GEOID attribute information is added from the US Census. GEOID is based on merged State and County FIPS codes for the counties. Abbreviations for counties and cities were added from Caltrans Division of Local Assistance (DLA) data. Place Type was populated with information extracted from the Census. Names and IDs from the US Board on Geographic Names (BGN), the authoritative source of place names as published in the Geographic Name Information System (GNIS), are attached as well. Finally, coastal buffers are removed, leaving the land-based portions of jurisdictions. This feature layer is for public use.
Related Layers: This dataset is part of a grouping of many datasets:
Cities: only the city boundaries and attributes, without any unincorporated areas (with coastal buffers / without coastal buffers)
Counties: full county boundaries and attributes, including all cities within as a single polygon (with coastal buffers / without coastal buffers)
Cities and Full Counties: a merge of the other two layers, so polygons overlap within city boundaries; some customers require this behavior, so we provide it as a separate service (with coastal buffers / without coastal buffers - this dataset)
Place Abbreviations
Unincorporated Areas (coming soon)
Census Designated Places (coming soon)
Cartographic Coastline: polygon and line source (coming soon)
Working with Coastal Buffers: The dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the authoritative source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating whether it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except COASTAL, Area_SqMi, Shape_Area, and Shape_Length to get a version with the correct identifiers.
Point of Contact: California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov
Field and Abbreviation Definitions:
COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system
Place Name: CDTFA incorporated (city) or county name
County: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
Legal Place Name: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System
GNIS_ID: the numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier
GEOID: numeric geographic identifiers from the US Census Bureau
Place Type: Board on Geographic Names authorized nomenclature for boundary type published in the Geographic Name Information System
Place Abbr: CalTrans Division of Local Assistance abbreviations of incorporated area names
CNTY Abbr: CalTrans Division of Local Assistance abbreviations of county names
Area_SqMi: the area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers
COASTAL: indicates if the polygon is a coastal buffer; null for land polygons. Additional values include "ocean" and "bay".
GlobalID: while all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.
Accuracy: CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates. The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations. COUNTY = county name; CITY = city name or unincorporated territory; COPRI = county number followed by the 3-digit city primary number used in the California State Board of Equalization's 6-digit tax rate area numbering system (for the purpose of this map, unincorporated areas are assigned 000 to indicate that the area is not within a city).
Boundary Processing: These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.
Slivers: In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, in addition to others. More information on this algorithm will be provided soon.
Coastline Caveats: Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into San Diego Bay, which our shoreline encloses. If you have feedback on the exclusion of these items, or others, from the shoreline cuts, please reach out using the contact information above.
Offline Use: This service is fully enabled for sync and export using Esri Field Maps or other similar tools. Importantly, the GlobalID field exists only to support that use case and should not be used for any other purpose (see note in field descriptions).
Updates and Date of Processing: Concurrent with CDTFA updates, approximately every two weeks. Last Processed: 12/17/2024 by Nick Santos using code path at https://github.com/CDT-ODS-DevSecOps/cdt-ods-gis-city-county/ at commit 0bf269d24464c14c9cf4f7dea876aa562984db63. It incorporates updates from CDTFA as of 12/12/2024. Future updates will include improvements to metadata and update frequency.
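Here is a minimal sketch of the Dissolve step described under "Working with Coastal Buffers", done in geopandas rather than ArcGIS. It assumes the buffer-inclusive layer has been exported to a local GeoPackage; the file names are hypothetical, and the excluded field names come from the description above.

```python
import geopandas as gpd

# Hypothetical local export of the version that includes coastal buffers.
gdf = gpd.read_file("cities_with_buffers.gpkg")

# Dissolve on every attribute except the fields that differ between the land
# polygon and its coastal-buffer polygon(s).
drop_cols = {"COASTAL", "Area_SqMi", "Shape_Area", "Shape_Length"}
key_cols = [c for c in gdf.columns if c not in drop_cols and c != gdf.geometry.name]

dissolved = gdf.dissolve(by=key_cols, as_index=False)
dissolved.to_file("cities_single_polygon.gpkg", driver="GPKG")
```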
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users over the year of 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.
The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7534952 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok
To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or "TikToks"). We then compiled comments associated with these videos. All of the data captured is publicly available information, and contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments, from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using the web-scraping PyTok library, developed by the author: https://github.com/networkdynamics/pytok.
Due to the scraping duration, this is just a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Due to the fuzzy search functionality of TikTok, the dataset contains videos with a range of relatedness to the invasion.
We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.
To build this dataset from the IDs here:
Go to https://github.com/networkdynamics/pytok and clone the repo locally
Run pip install -e . in the pytok directory
Run pip install pandas tqdm to install these libraries if not already installed
Run get_videos.py to get the video data
Run video_comments.py to get the comment data
Run user_tiktoks.py to get the video history of the users
Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv
If you get an error about the wrong chrome version, use the command line argument get_videos.py --chrome-version YOUR_CHROME_VERSION Please note pulling data from TikTok takes a while! We recommend leaving the scripts running on a server for a while for them to finish downloading everything. Feel free to play around with the delay constants to either speed up the process or avoid TikTok rate limiting.
Please do not hesitate to make an issue in this repo to get our help with this!
The videos.csv will contain the following columns:
video_id: Unique video ID
createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
desc: The full video description from the author
hashtags: A list of hashtags used in the video description
share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty
share_video_user_id: If the video is sharing another video, this is the user ID of the author of that video, else empty
share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty
share_type: If the video is sharing another video, this is the type of the share (stitch, duet, etc.)
mentions: A list of users mentioned in the video description, if any
The comments.csv will contain the following columns:
comment_id: Unique comment ID
createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format
author_name: Unique author name
author_id: Unique author ID
text: Text of the comment
mentions: A list of users that are tagged in the comment
video_id: The ID of the video the comment is on
comment_language: The language of the comment, as predicted by the TikTok API
reply_comment_id: If the comment is replying to another comment, this is the ID of that comment
The data can be compiled into a user interaction network to facilitate study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
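As a simple illustration of that idea, the sketch below turns the compiled CSVs into a directed interaction network, with an edge from each commenter to the author of the video they commented on. The column names follow the descriptions above; this is a simplified stand-in, not a copy of the polar-seeds code.

```python
import pandas as pd
import networkx as nx

videos = pd.read_csv("videos.csv")
comments = pd.read_csv("comments.csv")

# Attach the video author's ID to every comment via the shared video_id column.
edges = comments.merge(
    videos[["video_id", "author_id"]].rename(columns={"author_id": "video_author_id"}),
    on="video_id",
    how="inner",
)

# Directed edge: commenter -> author of the commented-on video.
G = nx.DiGraph()
G.add_edges_from(zip(edges["author_id"], edges["video_author_id"]))
print(G.number_of_nodes(), "users,", G.number_of_edges(), "comment edges")
```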
The Patents, Designs and Trade Marks, 2010: Secure Access dataset includes details on applications to the Intellectual Property Office (IPO) for patents, designs and trade marks by businesses or individuals.
The patent file holds the main information for all patents in Great Britain attained through the Department of Business, Innovation and Skills' Optics extract in June/July 2010. The file includes patent applications filed between 1978 and 2009. There should be no multiple observations due to the uniqueness and single occurrence of an application number within these datasets. A patent can have more than one IPC classification, depending on how many purposes it fulfils.
The trade mark analysable dataset was created using the trade mark licensee data, which relates to trademarks applied for with the IPO. The data were extracted in July 2010 and include applications filed between 1876 and 2010. The licensee data is that provided to the IPO's external customers and holds the same information as contained within the IPO website. This dataset represents one of four relating to trade marks; this covers all trade marks applied for within the UK - there are three other datasets relating to trade marks applied for via the Madrid UK/Madrid EP agreements and Office of Harmonization for the Internal Market.
The UK design data represent all designs applied for with the IPO between 1974 and 2010. The data were extracted in May 2010.
The patent, design and trade mark data provided are readily available from online sources. The data are provided for Secure Access so that users can link the data to Secure Access business surveys using Inter-Departmental Business Register (IDBR) reference numbers, which are anonymous but unique reference numbers assigned to business organisations. Other Secure Access business surveys with which users may wish to combine the IPO data include the Annual Respondents Database (SN 6644), the UK Innovation Survey (SN 6699), and the Business Expenditure on Research and Development survey (SN 6690).
In preparing the patent, design and trade mark data for release, certain variables that can lead to the identification of businesses or individuals on the IPO website have been anonymised. These variables include applicant numbers, application numbers and design numbers.
The patent data include postcodes for a proportion of the applicants. The trade mark data include the country of the proprietor. The design data include no spatial units.
For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
TfL statement: We've committed to making our open data freely available to third parties and to engaging developers to deliver new products, apps and services for our customers. Over 11,000 developers have registered for our open data, consisting of our unified API (Application Programming Interface) that powers over 600 travel apps in the UK, with over 46% of Londoners using apps powered by our data. This enables millions of journeys in London each day, giving customers the right information at the right time through their channel of choice.
Why are we committing to open data?
Public data - As a public body, our data is publicly owned.
Reach - Our goal is to ensure any person needing travel information about London can get it wherever and whenever they wish, in any way they wish.
Economic benefit - Open data facilitates the development of technology enterprises, small and medium businesses, generating employment and wealth for London and beyond.
Innovation - By having thousands of developers working on designing and building applications, services and tools with our data and APIs, we are effectively crowdsourcing innovation.
How is our open data presented?
Data is presented in three main ways:
Static data files - Data files which rarely change
Feeds - Data files refreshed at regular intervals
API (Application Programming Interface) - Enabling a query from an application to receive a bespoke response, depending on the parameters supplied. Find out more about our unified API.
Data is presented as XML wherever possible.
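For illustration, here is a minimal sketch of querying the unified API from Python. The endpoint, query parameter, and response fields shown are assumptions based on the public API at api.tfl.gov.uk; consult the unified API documentation for authoritative details, including registration and app_key usage for higher request volumes.

```python
import requests

# Assumed public endpoint: current status of all tube lines.
resp = requests.get(
    "https://api.tfl.gov.uk/Line/Mode/tube/Status",
    params={"detail": "false"},
    timeout=30,
)
resp.raise_for_status()

for line in resp.json():
    statuses = ", ".join(
        s.get("statusSeverityDescription", "?") for s in line.get("lineStatuses", [])
    )
    print(f"{line.get('name')}: {statuses}")
```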
This dataset consists of all starter packs and all following network data available on Bluesky in January and February 2025. Starter packs can be created by any Bluesky user. They are lists of users and curated feeds with a minimum of 6 and a maximum of 150 users, curated by the starter pack creator. The creator typically names them and provides a description. Other users can use a single click to follow all users in the starter pack, or they can scroll through a specific starter pack to decide who to follow within that starter pack. In our dataset, all DIDs (persistent, unique identifiers) are anonymized with a non-reversible hash function; users in the network, as well as users who created starter packs, or appear in starter packs, are identified by their hashed DIDs. Similarly, starter packs themselves are identified by their hashed identifiers.
First, we include the Bluesky following network as it appeared in late January/early February 2025. This shows all available directed following relationships on Bluesky. We also include a network dataset of starter packs with information on creators and starter pack members. This is intended for users who wish to undertake a computational analysis of the networks created by starter packs or starter packs’ influences on networks.
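Below is a minimal sketch of loading the following network into networkx for that kind of analysis. The file name and column names are assumptions (the release identifies users only by hashed DIDs); adjust them to match the actual schema of the released files.

```python
import pandas as pd
import networkx as nx

# Hypothetical edge-list export of the following network; one row per follow.
edges = pd.read_csv("bluesky_following_edges.csv")

G = nx.from_pandas_edgelist(
    edges,
    source="follower_did_hash",   # hashed DID of the following user (assumed column name)
    target="followed_did_hash",   # hashed DID of the followed user (assumed column name)
    create_using=nx.DiGraph,      # following relationships are directed
)
print(G.number_of_nodes(), "users,", G.number_of_edges(), "follow edges")
```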
The "LV Historic Faults" dataset provides all unplanned occurrences / interruptions of 3 minutes or longer on the SPEN network from 1st April 2014. Any occurrence on the SPEN network which: a) affects the distribution system or other connected electricity supply system, which involves a physical break in the circuit upstream of the customers interrupted (or circuit affected), for three minutes or longer, due to automatic or manual operation of switchgear or fusegear, or due to any other open circuit condition, which: results in an interruption of supply to customer(s) for three minutes or longer; orprevents a circuit or item of equipment from carrying normal load current or being able to withstand “through fault-current” for three minutes or longer.b) causes the un-programmed isolation of any circuit or item of equipment, energised at power system voltage, which has not been classified as a pre-arranged incident;c) causes failures of non-system equipment (e.g. pilot cables, oil and gas alarms, voltage control equipment etc.) which result in the disconnection of equipment energised at power system voltage; incorrect operations of protection equipment which result in the interruption of a circuit energised at power system voltage;d) causes failures by protection equipment to operate. This includes incidents where the main protection fails to operate and a fault clearance is initiated by back-up protection or protection at another point on the network;e) causes any interruption to supply to customers caused by incidents on other connected systems owned by the National Grid/Transmission Companies (in Scotland), other distribution businesses, embedded generators, that arises from loss of supply to these systems.Disclaimer: Data for previous reporting years has been submitted and will not change. Data within the current reporting year is subject to change.If you wish to provide feedback at a dataset or row level, please click on the “Feedback” tab above.Data TriageAs part of our commitment to enhancing the transparency, and accessibility of the data we share, we publish the results of our Data Triage process.Our Data Triage documentation includes our Risk Assessments; detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the Historic Faults dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information.Download dataset metadata (JSON)
This dataset includes a suite of post-seismic 2m resolution DEMs post-dating the 2013 Mw7.7 Baluchistan earthquake. The DEMs were constructed using the open-source software package SETSM (https://mjremotesensing.wordpress.com/setsm/) from DigitalGlobe base imagery (©DigitalGlobe 2018). DEMs were mosaicked and vertically registered using the Ames StereoPipeline (https://ti.arc.nasa.gov/tech/asr/groups/intelligent-robotics/ngt/stereo/). The base imagery included 0.5m and 0.3m resolution panchromatic imagery from QuickBird, GEOEYE, WorldView1, WorldView2, and WorldView3 (©DigitalGlobe 2018). The dataset includes DEMs generated from in-track stereo imagery, as well as DEMs constructed from mixed pairs of non-in-track stereo images. The post-event DEMs are not vertically registered to a pre-existing DEM in order to avoid removal of relative co-seismic offsets between the pre- and post-event pairs. The generation of this dataset was funded by NASA in cooperation with the U.S. Geological Survey. A complete description of the generation of this dataset and the images that were used to construct the DEMs can be found in the associated manuscript: Barnhart WD, Gold RD, Shea HN, Peterson KE, Briggs RW, Harbor DJ (2019) Vertical coseismic offsets derived from high-resolution stereogrammetric DSM differencing: The 2013 Baluchistan, Pakistan earthquake, JGR-Solid Earth. DOI:10.1029/2018JB017107 The naming convention of individual DEMs is detailed in the metadata. Note: The source data for this project are the individual 2 meter DEMs that were constructed with the SETSM open-source software (described above). However, in order to utilize the OpenTopography webmap interface, these DEMs were mosaiced into a single seamless mosaic of post-earthquake topography. Details on how this single mosaic was created are in the metadata. Users are cautioned that files created using the webmap interface will use the averaged, mosaic data. For certain applications, users may wish to utilize the source datasets by downloading the original DEMs via the "Source" directory under the "Bulk Download" section of the OpenTopography website.
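The sketch below illustrates the kind of DSM differencing described above: subtracting a user-supplied pre-event DEM from one of the post-event DEMs to examine vertical offsets. The file names are hypothetical, and it assumes both rasters are already on the same grid (same CRS, extent, and 2 m cell size); otherwise one must be resampled to match first.

```python
import numpy as np
import rasterio

# Hypothetical file names; the pre-event DEM is not part of this dataset.
with rasterio.open("post_event_dem_2m.tif") as post, rasterio.open("pre_event_dem_2m.tif") as pre:
    post_z = post.read(1, masked=True)   # band 1, nodata masked out
    pre_z = pre.read(1, masked=True)
    dz = post_z - pre_z                  # positive = uplift relative to the pre-event surface

print("median vertical offset (m):", float(np.ma.median(dz)))
```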
The "Network Flow: Power, Current and Embedded Generation" dataset details historically measured average and reactive power flows, current and provides indicative demand/generation output for each Grid and Primary network group for our SP Manweb (SPM) and SP Distribution (SPD) licence areas, for each half-hourly period.Disclaimer:This data has been triaged to remove information pertaining to individual customers or where the dataset contains sensitive information. This dataset is updated on a weekly basis.Whilst all reasonable care has been taken in the preparation of this data, SP Energy Networks is not responsible for any loss that may be attributed to the use of this information.Download dataset metadata (JSON) If you wish to provide feedback at a dataset or row level, please click on the “Feedback” tab above.Data TriageAs part of our commitment to enhancing the transparency, and accessibility of the data we share, we publish the results of our Data Triage process.Our Data Triage documentation includes our Risk Assessments; detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the Network Flow: Power, Current and Embedded Generation dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information.
The "Voltage" dataset details historically measured average voltage for each Grid and Primary circuit for each half-hourly period. For additional information on column definitions, please click on the Dataset schema link below. Disclaimer: This data has been triaged to remove information pertaining to individual customers or where the dataset contains sensitive information. This dataset is updated on a weekly basis. Whilst all reasonable care has been taken in the preparation of this data, SP Energy Networks is not responsible for any loss that may be attributed to the use of this information.Download dataset metadata (JSON)If you wish to provide feedback at a dataset or row level, please click on the “Feedback” tab above.Data TriageAs part of our commitment to enhancing the transparency, and accessibility of the data we share, we publish the results of our Data Triage process.Our Data Triage documentation includes our Risk Assessments; detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the Voltage dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information.
The "Operational Forecasting" dataset provides a forecast view of demand and generation for each Grid and Primary network group for our SP Distribution (SPD) and SP Manweb (SPM) licence areas, for each half-hourly period.Disclaimer:This data has been triaged to remove information pertaining to individual customers or where the dataset contains sensitive information. This dataset is updated on a daily basis.Download dataset metadata (JSON)If you wish to provide feedback at a dataset or row level, please click on the “Feedback” tab above.Data TriageAs part of our commitment to enhancing the transparency, and accessibility of the data we share, we publish the results of our Data Triage process.Our Data Triage documentation includes our Risk Assessments; detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the Operational Forecasting dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information.
The "Curtailment: Generator Type" dataset details the total measured curtailment aggregated by technology type.At this time, only curtailment events measured and recorded by our Active Network Management (ANM) system are captured.For additional information on column definitions, please click on the Dataset schema link below.Disclaimer: It should be noted that this dataset does not align with the Curtailment Limit guidance as required for a Curtailable Connection (as per DCUSA Schedule 2D) as it includes both curtailment under fault conditions and curtailment driven by Transmission network constraints. This dataset is updated on a quarterly basis.Some values are left blank in the dataset given that from February 2021 a number of customers connected under the Dunbar ANM system moved to an unconstrained connection and therefore no data is published beyond that point.Download dataset metadata (JSON)If you wish to provide feedback at a dataset or row level, please click on the “Feedback” tab above.Data TriageAs part of our commitment to enhancing the transparency, and accessibility of the data we share, we publish the results of our Data Triage process.Our Data Triage documentation includes our Risk Assessments; detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the Curtailment dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information.