This is a large dataset containing the labour market statistics data series published in the monthly Labour Market Statistics Statistical Bulletin. The dataset is overwritten every month, so it always contains the latest published data. The Time Series dataset facility is primarily designed for users who wish to customise their own datasets. For example, users can create a single spreadsheet including series for unemployment, claimant count, employment and workforce jobs, rather than extracting the required data from several separate spreadsheets published on the website.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
WARNING: This is a pre-release dataset and its field names and data structures are subject to change. It should be considered pre-release until the end of March 2025. The schema changed in February 2025 - please see below. We will post a roadmap of upcoming changes, but service URLs and schema are now stable. For deployment status of new services in February 2025, see https://gis.data.ca.gov/pages/city-and-county-boundary-data-status.

This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.

Purpose

City boundaries along with third-party identifiers used to join in external data. Boundaries are from the California Department of Tax and Fee Administration (CDTFA). These boundaries are the best available statewide data source in that CDTFA receives changes in incorporation and boundary lines from the Board of Equalization, who receives them from local jurisdictions for tax purposes. Boundary accuracy is not guaranteed, and though CDTFA works to align boundaries based on historical records and local changes, errors will exist. If you require a legal assessment of boundary location, contact a licensed surveyor.

This dataset joins in multiple attributes and identifiers from the US Census Bureau and Board on Geographic Names to facilitate adding additional third-party data sources. In addition, we attach attributes of our own to ease and reduce common processing needs and questions. Finally, coastal buffers are separated into separate polygons, leaving the land-based portions of jurisdictions and coastal buffers in adjacent polygons.
This feature layer is for public use.

Related Layers

This dataset is part of a grouping of many datasets:

- Cities: Only the city boundaries and attributes, without any unincorporated areas
  - With Coastal Buffers (this dataset)
  - Without Coastal Buffers
- Counties: Full county boundaries and attributes, including all cities within as a single polygon
  - With Coastal Buffers
  - Without Coastal Buffers
- Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.
  - With Coastal Buffers
  - Without Coastal Buffers
- City and County Abbreviations
- Unincorporated Areas (Coming Soon)
- Census Designated Places
- Cartographic Coastline
  - Polygon
  - Line source (Coming Soon)

Working with Coastal Buffers

The dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the authoritative source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating whether it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers, dissolving on all the fields except OFFSHORE and AREA_SQMI, to get a version with the correct identifiers.

Point of Contact

California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov

Field and Abbreviation Definitions

- CDTFA_CITY: CDTFA incorporated city name
- CDTFA_COUNTY: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
- CDTFA_COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system.
  The boundary data originate with CDTFA's teams managing tax rate information, so this field is preserved and flows into this dataset.
- CENSUS_GEOID: numeric geographic identifiers from the US Census Bureau
- CENSUS_PLACE_TYPE: City, County, or Town, stripped off the census name for identification purposes.
- GNIS_PLACE_NAME: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System
- GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.
- CDT_CITY_ABBR: Abbreviations of incorporated area names - originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 4 characters. Not present in the county-specific layers.
- CDT_COUNTY_ABBR: Abbreviations of county names - originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 3 characters.
- CDT_NAME_SHORT: The name of the jurisdiction (city or county) with the word "City" or "County" stripped off the end. Some changes may come to how we process this value to make it more consistent.
- AREA_SQMI: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.
- OFFSHORE: Indicates if the polygon is a coastal buffer. Null for land polygons. Additional values include "ocean" and "bay".
- PRIMARY_DOMAIN: Currently empty/null for all records. Placeholder field for the official URL of the city or county.
- CENSUS_POPULATION: Currently null for all records. In the future, it will include the most recent US Census population estimate for the jurisdiction.
- GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update.
  Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

Accuracy

CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates. The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations. COUNTY = county name; CITY = city name or unincorporated territory; COPRI = county number followed by the 3-digit city primary number used in the California State Board of Equalization's 6-digit tax rate area numbering system (for the purpose of this map, unincorporated areas are assigned 000 to indicate that the area is not within a city).

Boundary Processing

These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay.
As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

Slivers

In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, among others. More information on this algorithm will be provided soon.

Coastline Caveats

Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into San Diego Bay, which our shoreline encloses.
If you have feedback on the exclusion of these items, or others, from the shoreline cuts, please reach out using the contact information above.

Offline Use

This service is fully enabled for sync and export using Esri Field Maps or other similar tools. Importantly, the GlobalID field exists only to support that use case and should not be used for any other purpose (see the note in the field descriptions).

Updates and Date of Processing

Concurrent with CDTFA updates, approximately every two weeks. Last Processed: 02/24/2025 by Nick Santos using code path at
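The Dissolve step described above (grouping on every field except OFFSHORE and AREA_SQMI to merge a jurisdiction's land polygon with its coastal buffer) can be sketched in plain Python. The records below are hypothetical stand-ins for the layer's rows; in practice you would run Dissolve in your GIS, or `geopandas.GeoDataFrame.dissolve`, on the actual feature layer.

```python
# Hypothetical rows: a land polygon and its coastal buffer for one city.
records = [
    {"CDTFA_CITY": "Pacifica", "CDTFA_COUNTY": "San Mateo",
     "OFFSHORE": None, "AREA_SQMI": 12.7, "geometry": "land_polygon"},
    {"CDTFA_CITY": "Pacifica", "CDTFA_COUNTY": "San Mateo",
     "OFFSHORE": "ocean", "AREA_SQMI": 3.1, "geometry": "buffer_polygon"},
]

# Fields excluded from the dissolve key, per the description above
# (geometry is merged rather than compared).
EXCLUDED = {"OFFSHORE", "AREA_SQMI", "geometry"}

def dissolve(rows):
    """Group rows on all fields except the excluded ones, summing areas
    and collecting geometry parts for each jurisdiction."""
    merged = {}
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k not in EXCLUDED))
        out = merged.setdefault(
            key, {k: v for k, v in row.items() if k not in EXCLUDED})
        out["AREA_SQMI"] = out.get("AREA_SQMI", 0.0) + row["AREA_SQMI"]
        out.setdefault("geometry_parts", []).append(row["geometry"])
    return list(merged.values())

dissolved = dissolve(records)
```

Both rows share every identifier, so they collapse into a single record whose area is the sum of the land and buffer areas.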
Residence time distribution (RTD) is a critically important characteristic of groundwater flow systems; however, it cannot be measured directly. RTD can be inferred from tracer data with analytical models (few parameters) or with numerical models (many parameters). The second approach permits more variation in system properties but is used less frequently than the first because large-scale numerical models can be resource intensive. With the data and computer codes in this data release, users can (1) reconstruct and run 115 General Simulation Models (GSMs) of groundwater flow, (2) calculate groundwater age metrics at selected GSM cells, (3) train a boosted regression tree model using the provided data, (4) predict three-dimensional continuous groundwater age metrics across the Glacial Principal Aquifer, and (5) predict tritium concentrations at wells for comparison with measured tritium concentrations. The computer codes in this data release are in the form of Python scripts and Jupyter Notebooks. Users will need to have these Python resources installed on their computers to run the codes. Instructions for creating the Python environment can be found in the file Creating the Python environment.txt. Users who would rather not run the scripts but who wish to obtain the final data sets can do so by downloading the file Output--Predictions.7z. Users who wish to reproduce the data sets in this release can do so by downloading, unzipping, and running the data workflow in Starn_GW_Residence_Time_Data_and_Scripts.7z. The codes in this file use relative pathnames, so the directory structure within this file should not be changed. The ".7z" file extension indicates 7-Zip files (http://www.7-zip.org).

Executables--MODFLOW and MODPATH executable files provided for convenience. These are Windows 64-bit versions.
Step 1--Create General Simulation Models--Codes to create 115 GSMs
Step 2--Data preparation--Calculate residence time distributions at selected GSM cells
Step 3--Metamodel training--Train a boosted regression tree metamodel (XGBoost)
Step 4--Metamodel prediction--Predict age metrics throughout the Glacial Aquifer
Step 5--Tritium simulation--Calculate tritium concentration at selected wells
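The idea behind Step 5 can be illustrated with a minimal sketch: the concentration of a decaying tracer at a well is the tracer input history convolved with the well's residence-time distribution, with each age fraction decayed accordingly. All numbers and function names below are made up for illustration; consult the release's actual scripts for the real computation.

```python
import math

# Tritium half-life in years (a physical constant, not from this release).
HALF_LIFE_YEARS = 12.32
DECAY = math.log(2) / HALF_LIFE_YEARS

def well_concentration(input_history, rtd):
    """Convolve a tracer input history with a residence-time distribution.

    input_history[a]: tracer input a years before sampling.
    rtd[a]: fraction of water at the well with age a years (sums to 1).
    Each age fraction is decayed over its residence time.
    """
    return sum(rtd[a] * input_history[a] * math.exp(-DECAY * a)
               for a in range(len(rtd)))

# If all water at the well is exactly 1 year old, the concentration is
# the 1-year-old input, reduced by one year of decay.
conc = well_concentration([10.0, 8.0, 6.0], [0.0, 1.0, 0.0])
```

This is the forward calculation that lets simulated RTDs be tested against measured tritium concentrations at wells.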
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset comprises cloud masks for 513 1022-by-1022 pixel subscenes, at 20m resolution, sampled randomly from the 2018 Level-1C Sentinel-2 archive. The design of this dataset follows from some observations about cloud masking: (i) performance over an entire product is highly correlated, thus subscenes provide more value per pixel than full scenes, (ii) current cloud masking datasets often focus on specific regions, or hand-select the products used, which introduces a bias into the dataset that is not representative of the real-world data, (iii) cloud mask performance appears to be highly correlated with surface type and cloud structure, so testing should include analysis of failure modes in relation to these variables.
The data were annotated semi-automatically using the IRIS toolkit, which allows users to dynamically train a Random Forest (implemented using LightGBM), speeding up annotation by iteratively improving its predictions while preserving the annotator's ability to make final manual changes when needed. This hybrid approach allowed us to process many more masks than would have been possible manually, which we felt was vital in creating a large enough dataset to approximate the statistics of the whole Sentinel-2 archive.
In addition to the pixel-wise, 3-class (CLEAR, CLOUD, CLOUD_SHADOW) segmentation masks, we also provide users with binary classification "tags" for each subscene that can be used in testing to determine performance in specific circumstances. These include:
Wherever practical, cloud shadows were also annotated; however, this was sometimes not possible due to high-relief terrain or large ambiguities. In total, 424 subscenes were marked with shadows (if present), and 89 have shadows that were not annotatable due to very ambiguous shadow boundaries or terrain that cast significant shadows. If users wish to train an algorithm specifically for cloud shadow masks, we advise them to remove the 89 images for which shadow annotation was not possible; however, bear in mind that this will systematically reduce the difficulty of the shadow class compared to real-world use, as these contain the most difficult shadow examples.
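The filtering step suggested above can be sketched as a simple metadata filter. The record layout and the field name `shadow_annotatable` are illustrative assumptions, not the dataset's actual schema; in practice you would derive the flag from the per-subscene tags described in the README.

```python
# Hypothetical per-subscene metadata; in the real dataset, 89 of the 513
# subscenes would carry the "not annotatable" flag.
subscenes = [
    {"id": "subscene_0001", "shadow_annotatable": True},
    {"id": "subscene_0002", "shadow_annotatable": False},  # ambiguous terrain shadow
    {"id": "subscene_0003", "shadow_annotatable": True},
]

# Keep only subscenes whose shadows were annotatable before training a
# shadow-specific model, per the advice above.
shadow_training_set = [s for s in subscenes if s["shadow_annotatable"]]
```

Keeping the excluded 89 images aside as a separate hard-case evaluation set would preserve a realistic estimate of shadow-class difficulty.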
In addition to the 20m sampled subscenes and masks, we also provide users with shapefiles that define the boundary of the mask on the original Sentinel-2 scene. If users wish to retrieve the L1C bands at their original resolutions, they can use these to do so.
Please see the README for further details on the dataset structure and more.
Contributions & Acknowledgements
The data were collected, annotated, checked, formatted and published by Alistair Francis and John Mrziglod.
Support and advice was provided by Prof. Jan-Peter Muller and Dr. Panagiotis Sidiropoulos, for which we are grateful.
We would like to extend our thanks to Dr. Pierre-Philippe Mathieu and the rest of the team at ESA PhiLab, who provided the environment in which this project was conceived, and continued to give technical support throughout.
Finally, we thank the ESA Network of Resources for sponsoring this project by providing ICT resources.
https://bsky.social/about/support/tos
Pollution of online social spaces caused by rampant dis- and misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue.
The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.
Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.
Here is a description of the dataset files.
If used for research purposes, please cite the following paper describing the dataset details:
Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year's Worth of Social Data." PlosOne (2024) https://doi.org/10.1371/journal.pone.0310330
Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.
Users included in the Bluesky Social dataset have the right to opt-out and request the removal of their data, per GDPR provisions (Article 17).
We emphasize that the released data has been thoroughly pseudonymized in compliance with GDPR (Article 4(5)). Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to protect individual privacy further and minimize reidentification risk. Moreover, it should be noted that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides opt-out derogations (Article 17(3)(d) and Article 89).
Nonetheless, if you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with the subject "Removal request: [username]"). We will process your request within a reasonable timeframe - updates will occur monthly, if necessary, and access to previous versions will be restricted.
This work is supported by:
This dataset includes a suite of post-seismic 2m resolution DEMs post-dating the 2013 Mw7.7 Baluchistan earthquake. The DEMs were constructed using the open-source software package SETSM (https://mjremotesensing.wordpress.com/setsm/) from DigitalGlobe base imagery (©DigitalGlobe 2018). DEMs were mosaicked and vertically registered using the Ames Stereo Pipeline (https://ti.arc.nasa.gov/tech/asr/groups/intelligent-robotics/ngt/stereo/). The base imagery included 0.5m and 0.3m resolution panchromatic imagery from QuickBird, GEOEYE, WorldView1, WorldView2, and WorldView3 (©DigitalGlobe 2018). The dataset includes DEMs generated from in-track stereo imagery, as well as DEMs constructed from mixed pairs of non-in-track stereo images. The post-event DEMs are not vertically registered to a pre-existing DEM, in order to avoid removal of relative co-seismic offsets between the pre- and post-event pairs. The generation of this dataset was funded by NASA in cooperation with the U.S. Geological Survey. A complete description of the generation of this dataset and the images that were used to construct the DEMs can be found in the associated manuscript: Barnhart WD, Gold RD, Shea HN, Peterson KE, Briggs RW, Harbor DJ (2019) Vertical coseismic offsets derived from high-resolution stereogrammetric DSM differencing: The 2013 Baluchistan, Pakistan earthquake, JGR-Solid Earth. DOI:10.1029/2018JB017107. The naming convention of individual DEMs is detailed in the metadata. Note: The source data for this project are the individual 2 meter DEMs that were constructed with the SETSM open-source software (described above). However, in order to utilize the OpenTopography webmap interface, these DEMs were mosaicked into a single seamless mosaic of post-earthquake topography. Details on how this single mosaic was created are in the metadata. Users are cautioned that files created using the webmap interface will use the averaged, mosaic data.
For certain applications, users may wish to utilize the source datasets by downloading the original DEMs via the "Source" directory under the "Bulk Download" section of the OpenTopography website.