35 datasets found
  1. California Overlapping Cities and Counties and Identifiers with Coastal Buffers

    • catalog.data.gov
    • data.ca.gov
    • +2more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Department of Technology (2025). California Overlapping Cities and Counties and Identifiers with Coastal Buffers [Dataset]. https://catalog.data.gov/dataset/california-overlapping-cities-and-counties-and-identifiers-with-coastal-buffers
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Technology
    Description

    WARNING: This is a pre-release dataset, and its field names and data structures are subject to change. It should be considered pre-release until the end of 2024. Expected changes:

    • Metadata is missing or incomplete for some layers at this time and will be continuously improved.
    • We expect to update this layer roughly in line with CDTFA at some point, but will increase the update cadence over time as we are able to automate the final pieces of the process.

    This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.

    Purpose

    County and incorporated place (city) boundaries, along with third-party identifiers used to join in external data. Boundaries are from the authoritative source, the California Department of Tax and Fee Administration (CDTFA), altered to show the counties as one polygon. This layer displays the city polygons on top of the county polygons so the area isn't interrupted. The GEOID attribute information is added from the US Census. GEOID is based on merged state and county FIPS codes for the counties. Abbreviations for counties and cities were added from Caltrans Division of Local Assistance (DLA) data. Place Type was populated with information extracted from the Census. Names and IDs from the US Board on Geographic Names (BGN), the authoritative source of place names as published in the Geographic Name Information System (GNIS), are attached as well. Finally, the coastline is used to separate coastal buffers from the land-based portions of jurisdictions.
    This feature layer is for public use.

    Related Layers

    This dataset is part of a grouping of many datasets:

    • Cities: Only the city boundaries and attributes, without any unincorporated areas
      • With Coastal Buffers
      • Without Coastal Buffers
    • Counties: Full county boundaries and attributes, including all cities within as a single polygon
      • With Coastal Buffers
      • Without Coastal Buffers
    • Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.
      • With Coastal Buffers (this dataset)
      • Without Coastal Buffers
    • Place Abbreviations
    • Unincorporated Areas (Coming Soon)
    • Census Designated Places (Coming Soon)
    • Cartographic Coastline
      • Polygon
      • Line source (Coming Soon)

    Working with Coastal Buffers

    The dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the authoritative source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating whether it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers, on all the fields except COASTAL, Area_SqMi, Shape_Area, and Shape_Length, to get a version with the correct identifiers.

    Point of Contact

    California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov

    Field and Abbreviation Definitions

    • COPRI: County number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system.
    • Place Name: CDTFA incorporated (city) or county name.
    • County: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
    • Legal Place Name: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System.
    • GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.
    • GEOID: Numeric geographic identifiers from the US Census Bureau.
    • Place Type: Board on Geographic Names authorized nomenclature for boundary type published in the Geographic Name Information System.
    • Place Abbr: CalTrans Division of Local Assistance abbreviations of incorporated area names.
    • CNTY Abbr: CalTrans Division of Local Assistance abbreviations of county names.
    • Area_SqMi: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.
    • COASTAL: Indicates if the polygon is a coastal buffer. Null for land polygons; additional values include "ocean" and "bay".
    • GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

    Accuracy

    CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates.
    The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations. COUNTY = county name; CITY = city name or unincorporated territory; COPRI = county number followed by the 3-digit city primary number used in the California State Board of Equalization's 6-digit tax rate area numbering system (for the purpose of this map, unincorporated areas are assigned 000 to indicate that the area is not within a city).

    Boundary Processing

    These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

    Slivers

    In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm: when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, among others. More information on this algorithm will be provided soon.

    Coastline Caveats

    Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into San Diego Bay, which our shoreline encloses. If you have feedback on the exclusion of these items, or others, from the shoreline cuts, please reach out using the contact information above.

    Offline Use

    This service is fully enabled for sync and export using Esri Field Maps or other similar tools. Importantly, the GlobalID field exists only to support that use case and should not be used for any other purpose (see the note in the field descriptions).

    Updates and Date of Processing

    Updated concurrent with CDTFA updates, approximately every two weeks. Last processed: 12/17/2024 by Nick Santos, using the code at https://github.com/CDT-ODS-DevSecOps/cdt-ods-gis-city-county/ at commit 0bf269d24464c14c9cf4f7dea876aa562984db63. It incorporates updates from CDTFA as of 12/12/2024.
Future updates will include improvements to metadata and update frequency.
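    The attribute handling behind the Dissolve suggested above can be sketched in plain Python: group on every field except COASTAL and the area/geometry fields, so each jurisdiction's land and buffer polygons collapse into one record. This models only the attribute side; a real workflow would run a GIS Dissolve tool on the geometries. The sample rows and identifier values are invented for illustration.

    ```python
    # Sketch of the Dissolve's attribute handling: group on all fields except
    # COASTAL and the area/geometry fields, summing the areas. Field names
    # follow the metadata above; the rows and IDs are invented.
    from collections import defaultdict

    rows = [
        {"GNIS_ID": "1111", "Place_Name": "San Francisco", "COASTAL": None, "Area_SqMi": 46.9},
        {"GNIS_ID": "1111", "Place_Name": "San Francisco", "COASTAL": "ocean", "Area_SqMi": 5.2},
        {"GNIS_ID": "2222", "Place_Name": "Sacramento", "COASTAL": None, "Area_SqMi": 97.9},
    ]

    def dissolve(rows, drop=("COASTAL", "Area_SqMi", "Shape_Area", "Shape_Length")):
        areas = defaultdict(float)
        attrs = {}
        for r in rows:
            key = tuple(sorted((k, v) for k, v in r.items() if k not in drop))
            areas[key] += r["Area_SqMi"]  # land + buffer areas sum per jurisdiction
            attrs[key] = {k: v for k, v in r.items() if k not in drop}
        return [dict(attrs[k], Area_SqMi=round(a, 1)) for k, a in areas.items()]

    merged = dissolve(rows)
    # San Francisco's land and ocean-buffer polygons collapse to one record
    assert len(merged) == 2
    ```

    The same grouping works for the city-only layers by swapping in their OFFSHORE and AREA_SQMI field names.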

  2. California City Boundaries and Identifiers

    • catalog.data.gov
    • data.ca.gov
    • +1more
    Updated Jul 24, 2025
    + more versions
    Cite
    California Department of Technology (2025). California City Boundaries and Identifiers [Dataset]. https://catalog.data.gov/dataset/california-city-boundaries-and-identifiers
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Technology
    Area covered
    California City
    Description

    WARNING: This is a pre-release dataset, and its field names and data structures are subject to change. It should be considered pre-release until the end of March 2025. The schema changed in February 2025 - please see below. We will post a roadmap of upcoming changes, but service URLs and schema are now stable. For deployment status of new services in February 2025, see https://gis.data.ca.gov/pages/city-and-county-boundary-data-status. Additional roadmap and status links are at the bottom of this metadata.

    This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.

    Purpose

    City boundaries, along with third-party identifiers used to join in external data. Boundaries are from the California Department of Tax and Fee Administration (CDTFA). These boundaries are the best available statewide data source, in that CDTFA receives changes in incorporation and boundary lines from the Board of Equalization, which receives them from local jurisdictions for tax purposes. Boundary accuracy is not guaranteed, and though CDTFA works to align boundaries based on historical records and local changes, errors will exist. If you require a legal assessment of boundary location, contact a licensed surveyor.

    This dataset joins in multiple attributes and identifiers from the US Census Bureau and the Board on Geographic Names to facilitate adding additional third-party data sources. In addition, we attach attributes of our own to ease and reduce common processing needs and questions. Finally, coastal buffers are separated into separate polygons, leaving the land-based portions of jurisdictions and coastal buffers in adjacent polygons.
    This feature layer is for public use.

    Related Layers

    This dataset is part of a grouping of many datasets:

    • Cities: Only the city boundaries and attributes, without any unincorporated areas
      • With Coastal Buffers
      • Without Coastal Buffers (this dataset)
    • Counties: Full county boundaries and attributes, including all cities within as a single polygon
      • With Coastal Buffers
      • Without Coastal Buffers
    • Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.
      • With Coastal Buffers
      • Without Coastal Buffers
    • City and County Abbreviations
    • Unincorporated Areas (Coming Soon)
    • Census Designated Places
    • Cartographic Coastline
      • Polygon
      • Line source (Coming Soon)

    Working with Coastal Buffers

    The dataset you are currently viewing excludes the coastal buffers for cities and counties that have them in the source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating whether it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers, on all the fields except OFFSHORE and AREA_SQMI, to get a version with the correct identifiers.

    Point of Contact

    California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov

    Field and Abbreviation Definitions

    • CDTFA_CITY: CDTFA incorporated city name.
    • CDTFA_COUNTY: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
    • CDTFA_COPRI: County number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system. The boundary data originate with CDTFA's teams managing tax rate information, so this field is preserved and flows into this dataset.
    • CENSUS_GEOID: Numeric geographic identifiers from the US Census Bureau.
    • CENSUS_PLACE_TYPE: City, County, or Town, stripped off the census name for identification purposes.
    • GNIS_PLACE_NAME: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System.
    • GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.
    • CDT_CITY_ABBR: Abbreviations of incorporated area names, originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 4 characters. Not present in the county-specific layers.
    • CDT_COUNTY_ABBR: Abbreviations of county names, originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 3 characters.
    • CDT_NAME_SHORT: The name of the jurisdiction (city or county) with the word "City" or "County" stripped off the end. Some changes may come to how we process this value to make it more consistent.
    • AREA_SQMI: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.
    • OFFSHORE: Indicates if the polygon is a coastal buffer. Null for land polygons; additional values include "ocean" and "bay".
    • PRIMARY_DOMAIN: Currently empty/null for all records. Placeholder field for the official URL of the city or county.
    • CENSUS_POPULATION: Currently null for all records. In the future, it will include the most recent US Census population estimate for the jurisdiction.
    • GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

    Boundary Accuracy

    County boundaries were originally derived from a 1:24,000-accuracy dataset, with improvements made in some places to boundary alignments based on research into historical records and boundary changes as CDTFA learns of them. City boundary data are derived from pre-GIS tax maps, digitized at BOE and CDTFA, with adjustments made directly in GIS for new annexations, detachments, and corrections. Boundary accuracy within the dataset varies. While CDTFA strives to correctly include or exclude parcels from jurisdictions for accurate tax assessment, this dataset does not guarantee that a parcel is placed in the correct jurisdiction. When a parcel is in the correct jurisdiction, this dataset cannot guarantee accurate placement of boundary lines within or between parcels or rights of way. This dataset also provides no information on parcel boundaries. For exact jurisdictional or parcel boundary locations, please consult the county assessor's office and a licensed surveyor.

    CDTFA's data is used as the best available source because BOE and CDTFA receive information about changes in jurisdictions that would otherwise need to be collected independently by an agency or company to compile into usable map boundaries. CDTFA maintains the best available statewide boundary information.

    CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates.
    The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations.

    Boundary Processing

    These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

    Slivers

    In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm: when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, among others. More information on this algorithm will be provided soon.

  3. Auto Insurance Churn Data

    • kaggle.com
    Updated Mar 15, 2023
    Cite
    J Shad Griffin (2023). Auto Insurance Churn Data [Dataset]. https://www.kaggle.com/datasets/jshadgriffin/auto-insurance-churn-data/data
    Available download formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    J Shad Griffin
    Description

    The data asset is relational. There are four different data files. One represents customer information. A second contains address information. A third contains demographic data, and a fourth includes customer cancellation information. All of the data sets have linking IDs, either ADDRESS_ID or CUSTOMER_ID. The ADDRESS_ID is specific to a postal service address. The CUSTOMER_ID is unique to a particular individual. Note that there can be multiple customers assigned to the same address. Also, note that not all customers have a match in the demographic table. The latitude-longitude information generally refers to the Dallas-Fort Worth Metroplex in North Texas and is mappable at a high level. Just be aware that if you drill down too far, some people may live in the middle of Jerry World, DFW Airport, or Lake Grapevine. Any lat/long pointing to a specific residence, business, or physical site is coincidental. The physical addresses are fake and are unrelated to the lat/long.

    In the termination table, you can derive a binary label (churned/did not churn) from the ACCT_SUSPD_DATE field. The data set is modelable; that is, you can use the other data in the dataset to predict who did and did not churn. The underlying logic behind the prediction should be consistent with predicting auto insurance churn in the real world.
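    A minimal sketch of that label derivation, assuming a termination lookup keyed by CUSTOMER_ID: a customer is marked as churned when ACCT_SUSPD_DATE is populated. Field names follow the description; the rows and values are invented.

    ```python
    # Derive the binary churn label from ACCT_SUSPD_DATE, joining the
    # termination table to customers on CUSTOMER_ID. Sample data is invented.
    customers = [
        {"CUSTOMER_ID": 101, "ADDRESS_ID": 9001},
        {"CUSTOMER_ID": 102, "ADDRESS_ID": 9001},  # two customers, same address
        {"CUSTOMER_ID": 103, "ADDRESS_ID": 9002},
    ]
    # CUSTOMER_ID -> ACCT_SUSPD_DATE (None means the account was never suspended)
    terminations = {101: "2022-06-30", 103: None}

    labeled = [
        dict(c, CHURN=int(terminations.get(c["CUSTOMER_ID"]) is not None))
        for c in customers
    ]
    assert [r["CHURN"] for r in labeled] == [1, 0, 0]
    ```

    Customers absent from the termination table get the same label as those with a null suspension date, which matches the "did not churn" reading above.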

    Terms and Conditions: Unless otherwise stated, the data on this site is free. It can be duplicated and used as you wish, but we'd appreciate it if you cite it as coming from us.

  4. India Email Receipt Panel Dataset (Direct from Data Originator) *No PII*

    • datarade.ai
    .csv, .xls
    Cite
    Vumonic, India Email Receipt Panel Dataset (Direct from Data Originator) *No PII* [Dataset]. https://datarade.ai/data-products/india-email-receipt-panel-dataset-direct-from-data-originato-vumonic
    Available download formats
    .csv, .xls
    Dataset authored and provided by
    Vumonic
    Area covered
    India
    Description

    SUMMARY:

    Vumonic provides its clients email receipt datasets on weekly, monthly, or quarterly subscriptions, for any online consumer vertical. We gain consent-based access to our users' email inboxes through our own proprietary apps, gather and extract all the email receipts, and put them into a structured format for our clients to consume. We currently have over 1M users in our India panel.

    If you are not familiar with email receipt data: it provides item- and user-level transaction information (all PII-wiped), which allows for deep, granular analysis of things like market share, growth, competitive intelligence, and more.

    VERTICALS:

    • Ecommerce (Amazon, Flipkart, Myntra, Nykaa)
    • Taxi (Uber, Ola)
    • Food Delivery (Swiggy, Zomato)
    • OTT (Netflix, Amazon Prime Video, Disney+)
    • Appstore (Apple App Store and Google Playstore)
    • OTA (Expedia, Booking.com, GoIbibo)
    • E-wallets (PhonePe, PayTM)
    • Education (Byju's, Unacademy)

    PRICING/QUOTE:

    Our email receipt data is priced at market rate, based on the requirement. To give a quote, all we need to know is:

    • which vertical you are interested in,
    • how often you wish to receive the data, and
    • whether you want any backdata (e.g. from 2019 onwards).

    Send us this info and we can answer any questions you have, provide a sample, and more.

  5. Labour Market Statistics Statistical Bulletin time series dataset

    • data.wu.ac.at
    • data.europa.eu
    Updated Feb 10, 2016
    + more versions
    Cite
    Office for National Statistics (2016). Labour Market Statistics Statistical Bulletin time series dataset [Dataset]. https://data.wu.ac.at/odso/data_gov_uk/MGZhNDQyYTUtMjYyYi00MWE3LTk2ZDUtMWQ4ZTljOWY4Njk5
    Dataset updated
    Feb 10, 2016
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    Description

    This is a large dataset which contains the labour market statistics data series published in the monthly Labour Market Statistics Statistical Bulletin. The dataset is overwritten every month and it therefore always contains the latest published data. The Time Series dataset facility is primarily designed for users who wish to customise their own datasets. For example, users can create a single spreadsheet including series for unemployment, claimant count, employment and workforce jobs, rather than extracting the required data from several separate spreadsheets published on the website.

  6. Bookstore Inventory: Best Sellers and New Releases

    • kaggle.com
    Updated Jul 3, 2024
    Cite
    Fahmida (2024). Bookstore Inventory: Best Sellers and New Releases [Dataset]. https://www.kaggle.com/datasets/fahmidachowdhury/bookstore-inventory-best-sellers-and-new-releases/data
    Available download formats
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 3, 2024
    Dataset provided by
    Kaggle
    Authors
    Fahmida
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains detailed information about a wide range of books available for purchase on an online retailer's website. It includes data such as book titles, authors, categories, prices, stock status, number of copies left, book length in pages, edition details, publication information, and customer engagement metrics like wished-user counts and discount offers. This dataset is ideal for data analysis projects focusing on book sales trends, customer preferences, and market insights within the online retail book industry. Whether you're exploring pricing strategies, customer behavior, or genre popularity, this dataset provides a rich resource for data-driven exploration and analysis in the domain of online book retailing. Content:

    • Book Title: Title of the book.
    • Author: Author(s) of the book.
    • Category: Category or genre of the book.
    • Price (TK): Price of the book in TK (local currency).
    • Stock Status: Availability status of the book (In Stock/Out of Stock).
    • Copies Left: Number of copies currently available.
    • Book Length (Pages): Number of pages in the book.
    • Edition: Edition details of the book.
    • Publication: Publisher or publication details.
    • Wished Users: Number of users who have added this book to their wish list.
    • Discount Offer: Any available discount or promotional offer on the book.
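    As a minimal sketch of the kind of analysis these fields support, the snippet below filters in-stock titles and ranks them by wish-list demand. The records are invented; only the column names come from the documentation above.

    ```python
    # Rank in-stock books by wish-list demand. Column names follow the dataset
    # description; the sample records are invented for illustration.
    books = [
        {"Book Title": "A", "Category": "Fiction", "Price (TK)": 350,
         "Stock Status": "In Stock", "Wished Users": 120},
        {"Book Title": "B", "Category": "Fiction", "Price (TK)": 500,
         "Stock Status": "Out of Stock", "Wished Users": 300},
        {"Book Title": "C", "Category": "Science", "Price (TK)": 700,
         "Stock Status": "In Stock", "Wished Users": 80},
    ]

    in_stock = [b for b in books if b["Stock Status"] == "In Stock"]
    most_wished = max(in_stock, key=lambda b: b["Wished Users"])
    assert most_wished["Book Title"] == "A"  # B is more wished, but out of stock
    ```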

  7. Master X-Ray Catalog - Dataset - NASA Open Data Portal

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Apr 1, 2025
    Cite
    nasa.gov (2025). Master X-Ray Catalog - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/master-x-ray-catalog
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The XRAY database table contains selected parameters from almost all HEASARC X-ray catalogs that have source positions located to better than a few arcminutes. The XRAY database table was created by copying all of the entries and common parameters from the tables listed in the Component Tables section. The XRAY database table has many entries but relatively few parameters; it provides users with general information about X-ray sources, obtained from a variety of catalogs. XRAY is especially suitable for cone searches and cross-correlations with other databases. Each entry in XRAY has a parameter called 'database_table' which indicates from which original database the entry was copied; users can browse that original table should they wish to examine all of the parameter fields for a particular entry. For some entries in XRAY, some of the parameter fields may be blank (or have zero values); this indicates that the original database table did not contain that particular parameter, or that the value was blank or zero there as well. The HEASARC in certain instances has included X-ray sources for which the quoted value for the specified band is an upper limit rather than a detection. The HEASARC recommends that the user always check the original tables to get the complete information about the properties of the sources listed in the XRAY master source list. This master catalog is updated periodically whenever one of the component database tables is modified or a new component database table is added. This is a service provided by NASA HEASARC.
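    To illustrate the cone searches the table is suited for, here is a plain-Python angular-separation filter using the haversine formula on RA/Dec in degrees. The coordinates below are invented, not drawn from the XRAY table, and real queries would go through the HEASARC's own search tools.

    ```python
    # A cone search keeps sources within an angular radius of a target position.
    # Haversine-based great-circle separation on (RA, Dec) in degrees.
    from math import radians, degrees, sin, cos, asin, sqrt

    def angular_sep_deg(ra1, dec1, ra2, dec2):
        """Great-circle separation between two sky positions, in degrees."""
        ra1, dec1, ra2, dec2 = map(radians, (ra1, dec1, ra2, dec2))
        h = sin((dec2 - dec1) / 2) ** 2 \
            + cos(dec1) * cos(dec2) * sin((ra2 - ra1) / 2) ** 2
        return degrees(2 * asin(sqrt(h)))

    sources = [(83.63, 22.01), (83.80, 22.05), (150.0, 2.2)]  # invented (RA, Dec)
    target, radius = (83.63, 22.01), 0.5  # 0.5-degree cone

    cone = [s for s in sources if angular_sep_deg(*target, *s) <= radius]
    assert cone == [(83.63, 22.01), (83.80, 22.05)]  # far-away source excluded
    ```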

  8. Canned Goods Dataset

    • universe.roboflow.com
    zip
    Updated Jan 22, 2023
    Cite
    Vizonix (2023). Canned Goods Dataset [Dataset]. https://universe.roboflow.com/vizonix/canned-goods
    Available download formats
    zip
    Dataset updated
    Jan 22, 2023
    Dataset authored and provided by
    Vizonix
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Canned Goods Bounding Boxes
    Description

    The Canned Goods Dataset

    by Vizonix

    This dataset differentiates between 4 similar object classes: 4 types of canned goods. We built this dataset with cans of olives, beans, stewed tomatoes, and refried beans.

    The dataset is pre-augmented. That is to say, all required augmentations are applied to the actual native dataset prior to inference. We have found that augmenting this way provides our users maximum visibility and flexibility in tuning their dataset (and classifier) to achieve their specific use-case goals. Augmentations are present and visible in the native dataset prior to the classifier - so it's never a mystery what augmentation tweaks produce a more positive or negative outcome during training. It also eliminates the risk of downsizing affecting annotations.

    The training images in this dataset were created in our studio in Florida from actual physical objects to the following specifications:

    • Each item was imaged using a 360-degree horizontal rotation - imaged every 9 degrees at 0 degrees of elevation.
    • Each item was imaged (per above) 3 times - using physical left lighting, right lighting, and frontal lighting.
    • Backgrounds in this dataset are completely random - they do not factor into the classifier's decision-making (nor do we ever want them to). We used 100% random backgrounds generated in-house. This eliminates background bias in the dataset. Our use of random backgrounds is a newly released feature in our datasets.

    The training images in this dataset were composited / augmented in this way:

    • Imaged objects were randomly rotated in frame from 15 to 340 degrees.
    • Imaged objects were randomly positioned in frame.
    • Imaged objects were randomly sized from .33 to 1.0 of original.
    • Image contrast was randomly adjusted from .7 to 1.25.
    • Gaussian blur was randomly introduced at a factor from 2 to 5.
    • Color channels were dropped randomly (R,G,B).
    • Grayscale images were introduced randomly.
    • Soft occlusions (noise, and others) in random transparencies were randomly introduced.
    • Hard occlusions (noise and others) in solid transparencies were randomly introduced.
    • Brightness was randomly adjusted.
    • Sharpness was randomly adjusted.
    • Color balance was randomly adjusted.
    • Images were resized to 640x640 for Roboflow's platform.
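    Because the augmentations are fixed offline rather than applied on the fly, each image's recipe can be drawn once and recorded. A minimal stdlib-only sketch of sampling recipes within the stated ranges (the grayscale probability is an assumption, and actually rendering the pixels would use an imaging library):

    ```python
    import random

    # Ranges taken from the augmentation list above; the step that applies them
    # to pixels (e.g. via an imaging library) is left out, so this stays stdlib-only.
    AUG_RANGES = {
        "rotation_deg": (15, 340),
        "scale": (0.33, 1.0),
        "contrast": (0.7, 1.25),
        "gaussian_blur": (2, 5),
    }

    def sample_augmentation(rng: random.Random) -> dict:
        """Draw one augmentation recipe matching the dataset's stated ranges."""
        recipe = {k: rng.uniform(lo, hi) for k, (lo, hi) in AUG_RANGES.items()}
        recipe["drop_channel"] = rng.choice([None, "R", "G", "B"])
        recipe["grayscale"] = rng.random() < 0.1  # assumed probability
        return recipe

    # Recording each recipe alongside its image is what makes a
    # "Dataset Insights"-style inventory possible.
    rng = random.Random(0)
    recipes = [sample_augmentation(rng) for _ in range(1000)]
    ```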

    1,600 (+) different images were uploaded for each class (out of the 25,000 total images created for each class).

    Understanding our Dataset Insights File

    As users train their classifiers, they often wish to enhance accuracy by experimenting with or tweaking their dataset. With our Dataset Insights documents, they can easily determine which images possess which augmentations. Dataset Insights allow users to easily add or remove images with specific augmentations as they wish. This also provides a detailed profile and inventory of each file in the dataset.

    The Dataset Insights document enables the user to see exactly which source image, angle, augmentation(s), etc. were used to create each image in the dataset.

    Dataset Insight Files:

    About Vizonix

    Vizonix (vizonix.com) creates from-scratch datasets built from 100% in-house photography. Our images and backgrounds are generated in our Florida studio. We typically image smaller items, deliver in 72 hours, and specialize in Manufacturer Quality Assurance (MQA) datasets.

  9. Bluesky Social Dataset

    • zenodo.org
    application/gzip, csv
    Updated Jan 16, 2025
    Cite
    Andrea Failla; Andrea Failla; Giulio Rossetti; Giulio Rossetti (2025). Bluesky Social Dataset [Dataset]. http://doi.org/10.5281/zenodo.14669616
    Explore at:
    Available download formats: application/gzip, csv
    Dataset updated
    Jan 16, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrea Failla; Andrea Failla; Giulio Rossetti; Giulio Rossetti
    License

    https://bsky.social/about/support/tos

    Description

    Bluesky Social Dataset

    Pollution of online social spaces caused by rampaging d/misinformation is a growing societal concern. However, recent decisions to reduce access to social media APIs are causing a shortage of publicly available, recent, social media data, thus hindering the advancement of computational social science as a whole. We present a large, high-coverage dataset of social interactions and user-generated content from Bluesky Social to address this pressing issue.

    The dataset contains the complete post history of over 4M users (81% of all registered accounts), totaling 235M posts. We also make available social data covering follow, comment, repost, and quote interactions.

    Since Bluesky allows users to create and bookmark feed generators (i.e., content recommendation algorithms), we also release the full output of several popular algorithms available on the platform, along with their “like” interactions and time of bookmarking.

    Dataset

    Here is a description of the dataset files.

    • followers.csv.gz. This compressed file contains the anonymized follower edge list. Once decompressed, each row consists of two comma-separated integers representing a directed following relation (i.e., user u follows user v).
    • user_posts.tar.gz. This compressed folder contains data on the individual posts collected. Decompressing this file results in a collection of files, each containing the post of an anonymized user. Each post is stored as a JSON-formatted line.
    • interactions.csv.gz. This compressed file contains the anonymized interactions edge list. Once decompressed, each row consists of six comma-separated integers representing a comment, repost, or quote interaction. These integers correspond to the following fields, in this order: user_id, replied_author, thread_root_author, reposted_author, quoted_author, and date.
    • graphs.tar.gz. This compressed folder contains edge list files for the graphs emerging from reposts, quotes, and replies. Each interaction is timestamped. The folder also contains timestamped higher-order interactions emerging from discussion threads, each containing all users participating in a thread.
    • feed_posts.tar.gz. This compressed folder contains posts that appear in 11 thematic feeds. Decompressing this folder results in 11 files containing posts from one feed each. Each post is stored as a JSON-formatted line. Fields correspond to those in user_posts.tar.gz, except for those related to sentiment analysis (sent_label, sent_score), and reposts (repost_from, reposted_author);
    • feed_bookmarks.csv. This file contains users who bookmarked any of the collected feeds. Each record contains three comma-separated values: the feed name, user id, and timestamp.
    • feed_post_likes.tar.gz. This compressed folder contains data on likes to posts appearing in the feeds, one file per feed. Each record in the files contains the following information, in this order: the id of the 'liker', the id of the post's author, the id of the liked post, and the like timestamp;
    • scripts.tar.gz. A collection of Python scripts, including the ones originally used to crawl the data, and to perform experiments. These scripts are detailed in a document released within the folder.
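    A minimal stdlib sketch of reading the two edge-list files described above (no third-party libraries; the files are headerless per the description, and the interaction field order follows the list above):

    ```python
    import csv
    import gzip

    # Field order as documented for interactions.csv.gz.
    INTERACTION_FIELDS = ["user_id", "replied_author", "thread_root_author",
                          "reposted_author", "quoted_author", "date"]

    def read_edges(path):
        """Yield (follower, followee) integer pairs from followers.csv.gz."""
        with gzip.open(path, "rt", newline="") as fh:
            for row in csv.reader(fh):
                yield int(row[0]), int(row[1])

    def read_interactions(path):
        """Yield dicts keyed by the six documented interaction fields."""
        with gzip.open(path, "rt", newline="") as fh:
            for row in csv.reader(fh):
                yield dict(zip(INTERACTION_FIELDS, (int(v) for v in row)))
    ```

    Streaming with generators avoids loading the full edge list into memory, which matters at the scale of 4M users.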

    Citation

    If used for research purposes, please cite the following paper describing the dataset details:

    Andrea Failla and Giulio Rossetti. "I'm in the Bluesky Tonight: Insights from a Year's Worth of Social Data." PLOS ONE (2024). https://doi.org/10.1371/journal.pone.0310330

    Right to Erasure (Right to be forgotten)

    Note: If your account was created after March 21st, 2024, or if you did not post on Bluesky before that date, no data about your account exists in the dataset. Before sending a data removal request, please make sure that you were active and posting on Bluesky before March 21st, 2024.

    Users included in the Bluesky Social dataset have the right to opt-out and request the removal of their data, per GDPR provisions (Article 17).

    We emphasize that the released data has been thoroughly pseudonymized in compliance with GDPR (Article 4(5)). Specifically, usernames and object identifiers (e.g., URIs) have been removed, and object timestamps have been coarsened to protect individual privacy further and minimize reidentification risk. Moreover, it should be noted that the dataset was created for scientific research purposes, thereby falling under the scenarios for which GDPR provides opt-out derogations (Article 17(3)(d) and Article 89).

    Nonetheless, if you wish to have your activities excluded from this dataset, please submit your request to blueskydatasetmoderation@gmail.com (with the subject "Removal request: [username]"). We will process your request within a reasonable timeframe - updates will occur monthly, if necessary, and access to previous versions will be restricted.

    Acknowledgments:

    This work is supported by :

    • the European Union – Horizon 2020 Program under the scheme “INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities”,
      Grant Agreement n.871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics” (http://www.sobigdata.eu);
    • SoBigData.it which receives funding from the European Union – NextGenerationEU – National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) – Project: “SoBigData.it – Strengthening the Italian RI for Social Mining and Big Data Analytics” – Prot. IR0000013 – Avviso n. 3264 del 28/12/2021;
    • EU NextGenerationEU programme under the funding schemes PNRR-PE-AI FAIR (Future Artificial Intelligence Research).
  10. California City Boundaries and Identifiers with Coastal Buffers

    • gis.data.ca.gov
    • data.ca.gov
    • +1more
    Updated Oct 24, 2024
    + more versions
    Cite
    California Department of Technology (2024). California City Boundaries and Identifiers with Coastal Buffers [Dataset]. https://gis.data.ca.gov/datasets/California::california-city-boundaries-and-identifiers-with-coastal-buffers
    Explore at:
    Dataset updated
    Oct 24, 2024
    Dataset authored and provided by
    California Department of Technology
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Note: The schema changed in February 2025 - please see below. We will post a roadmap of upcoming changes, but service URLs and schema are now stable. For deployment status of new services beginning in February 2025, see https://gis.data.ca.gov/pages/city-and-county-boundary-data-status. Additional roadmap and status links are at the bottom of this metadata.

    This dataset is regularly updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.

    Purpose

    City boundaries along with third party identifiers used to join in external data. Boundaries are from the California Department of Tax and Fee Administration (CDTFA). These boundaries are the best available statewide data source in that CDTFA receives changes in incorporation and boundary lines from the Board of Equalization, who receives them from local jurisdictions for tax purposes. Boundary accuracy is not guaranteed, and though CDTFA works to align boundaries based on historical records and local changes, errors will exist. If you require a legal assessment of boundary location, contact a licensed surveyor. This dataset joins in multiple attributes and identifiers from the US Census Bureau and Board on Geographic Names to facilitate adding additional third party data sources. In addition, we attach attributes of our own to ease and reduce common processing needs and questions. Finally, coastal buffers are separated into separate polygons, leaving the land-based portions of jurisdictions and coastal buffers in adjacent polygons. This feature layer is for public use.

    Related Layers

    This dataset is part of a grouping of many datasets:

    • Cities: Only the city boundaries and attributes, without any unincorporated areas
      • With Coastal Buffers
      • Without Coastal Buffers (this dataset)
    • Counties: Full county boundaries and attributes, including all cities within as a single polygon
      • With Coastal Buffers
      • Without Coastal Buffers
    • Cities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.
      • With Coastal Buffers
      • Without Coastal Buffers
    • City and County Abbreviations
    • Unincorporated Areas (Coming Soon)
    • Census Designated Places
    • Cartographic Coastline
      • Polygon
      • Line source (Coming Soon)

    Working with Coastal Buffers

    The dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating whether it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except OFFSHORE and AREA_SQMI to get a version with the correct identifiers.

    Point of Contact

    California Department of Technology, Office of Digital Services, odsdataservices@state.ca.gov

    Field and Abbreviation Definitions

    • CDTFA_CITY: CDTFA incorporated city name
    • CDTFA_COUNTY: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.
    • CDTFA_COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system. The boundary data originate with CDTFA's teams managing tax rate information, so this field is preserved and flows into this dataset.
    • CENSUS_GEOID: numeric geographic identifiers from the US Census Bureau
    • CENSUS_PLACE_TYPE: City, County, or Town, stripped off the census name for identification purposes.
    • GNIS_PLACE_NAME: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information System
    • GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.
    • CDT_CITY_ABBR: Abbreviations of incorporated area names - originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 4 characters. Not present in the county-specific layers.
    • CDT_COUNTY_ABBR: Abbreviations of county names - originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 3 characters.
    • CDT_NAME_SHORT: The name of the jurisdiction (city or county) with the word "City" or "County" stripped off the end. Some changes may come to how we process this value to make it more consistent.
    • AREA_SQMI: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.
    • OFFSHORE: Indicates if the polygon is a coastal buffer. Null for land polygons. Additional values include "ocean" and "bay".
    • PRIMARY_DOMAIN: Currently empty/null for all records. Placeholder field for the official URL of the city or county.
    • CENSUS_POPULATION: Currently null for all records. In the future, it will include the most recent US Census population estimate for the jurisdiction.
    • GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

    Boundary Accuracy

    County boundaries were originally derived from a 1:24,000 accuracy dataset, with improvements made in some places to boundary alignments based on research into historical records and boundary changes as CDTFA learns of them. City boundary data are derived from pre-GIS tax maps, digitized at BOE and CDTFA, with adjustments made directly in GIS for new annexations, detachments, and corrections. Boundary accuracy within the dataset varies. While CDTFA strives to correctly include or exclude parcels from jurisdictions for accurate tax assessment, this dataset does not guarantee that a parcel is placed in the correct jurisdiction. Even when a parcel is in the correct jurisdiction, this dataset cannot guarantee accurate placement of boundary lines within or between parcels or rights of way. This dataset also provides no information on parcel boundaries. For exact jurisdictional or parcel boundary locations, please consult the county assessor's office and a licensed surveyor.

    CDTFA's data is used as the best available source because BOE and CDTFA receive information about changes in jurisdictions which would otherwise need to be collected independently by an agency or company to compile into usable map boundaries. CDTFA maintains the best available statewide boundary information.

    CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates. The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations.

    Boundary Processing

    These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties. In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

    Slivers

    In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, in addition to others. More information on this algorithm will be provided soon.

    Coastline Caveats

    Some cities have buffers extending into water bodies that we do not cut at the shoreline.
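    The Dissolve described above can be sketched with GeoPandas. This is a sketch under the assumption that the layer has already been loaded as a GeoDataFrame; the square-mile constant converts from EPSG:3310 metres:

    ```python
    import geopandas as gpd

    SQ_M_PER_SQ_MILE = 2.589988e6

    def dissolve_coastal_buffers(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
        """Merge each jurisdiction's land polygon with its coastal buffer.

        Dissolves on every attribute except OFFSHORE and AREA_SQMI, per the
        guidance above, then recomputes AREA_SQMI in EPSG:3310 (Teale Albers).
        """
        key_cols = [c for c in gdf.columns
                    if c not in ("OFFSHORE", "AREA_SQMI", "geometry")]
        merged = gdf.dissolve(by=key_cols, as_index=False)
        merged["AREA_SQMI"] = (
            merged.geometry.to_crs(epsg=3310).area / SQ_M_PER_SQ_MILE
        )
        return merged
    ```

    Dissolving on the persistent identifier fields (rather than GlobalID) keeps the merged polygons joinable to external data.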

  11. Single Digital View Project Pipeline - Dataset - Datopian CKAN instance

    • demo.dev.datopian.com
    Updated May 27, 2025
    + more versions
    Cite
    (2025). Single Digital View Project Pipeline - Dataset - Datopian CKAN instance [Dataset]. https://demo.dev.datopian.com/dataset/sp-energy-networks--gsp-queue-position
    Explore at:
    Dataset updated
    May 27, 2025
    Description

    SPEN's Digital View of Distribution Connections provides greater transparency of major connections pipelines (1MW and above) at SPM and SPD Grid Supply Points (GSP) for existing and new customers looking to obtain a generation connection. Please see the disclaimers for appropriate use and liability associated with this.

    Disclaimer: Digital View of Distribution Connections has been developed to provide customers with a snapshot of some of the key known connection constraints and the connections pipeline at each SPEN GSP. However, please note this is not ordered by queue position. No express or implied condition, warranty, term or representation is given by SPEN regarding the quality, accuracy or completeness of the information contained within or produced as part of the reports or any related information. SPEN shall have no liability to any user for any loss or damage of any kind incurred as a result of the use of Digital View of Distribution Connections, or reliance by the user on any information provided within it. If you wish to provide feedback at a dataset or row level, please click on the "Feedback" tab above.

    Data Triage

    As part of our commitment to enhancing the transparency and accessibility of the data we share, we publish the results of our Data Triage process. Our Data Triage documentation includes our Risk Assessments, detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the Single Digital View dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information page.

    Download dataset metadata (JSON)

  12. Patents, Designs and Trade Marks, 2010: Secure Access - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Oct 23, 2023
    + more versions
    Cite
    (2023). Patents, Designs and Trade Marks, 2010: Secure Access - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/88ce7976-d861-51c4-a6a9-8dcbdea26629
    Explore at:
    Dataset updated
    Oct 23, 2023
    Description

    Abstract copyright UK Data Service and data collection copyright owner.

    The Patents, Designs and Trade Marks, 2010: Secure Access dataset includes details on applications to the Intellectual Property Office (IPO) for patents, designs and trade marks by businesses or individuals.

    The patent file holds the main information for all patents in Great Britain attained through the Department of Business, Innovation and Skills' Optics extract in June/July 2010. The file includes patent applications filed between 1978 and 2009. There should be no multiple observations due to the uniqueness and single occurrence of an application number within these datasets. A patent can have more than one IPC classification, depending on how many purposes it fulfils.

    The trade mark analysable dataset was created using the trade mark licensee data, which relates to trade marks applied for with the IPO. The data were extracted in July 2010 and include applications filed between 1876 and 2010. The licensee data is that provided to the IPO's external customers and holds the same information as contained within the IPO website. This dataset represents one of four relating to trade marks; this covers all trade marks applied for within the UK - there are three other datasets relating to trade marks applied for via the Madrid UK/Madrid EP agreements and the Office of Harmonization for the Internal Market.

    The UK design data represent all designs applied for with the IPO between 1974 and 2010. The data were extracted in May 2010.

    The patent, design and trade mark data provided are readily available from online sources. The data are provided for Secure Access so that users can link them to Secure Access business surveys using Inter-Departmental Business Register (IDBR) reference numbers, which are anonymous but unique reference numbers assigned to business organisations. Other Secure Access business surveys with which users may wish to combine the IPO data include the Annual Respondents Database (SN 6644), the UK Innovation Survey (SN 6699), and the Business Expenditure on Research and Development survey (SN 6690).

    In preparing the patent, design and trade mark data for release, certain variables that can lead to the identification of businesses or individuals on the IPO website have been anonymised. These variables include applicant numbers, application numbers and design numbers. The patent data include postcodes for a proportion of the applicants. The trade mark data include the country of the proprietor. The design data include no spatial units. For Secure Lab projects applying for access to this study as well as to SN 6697 Business Structure Database and/or SN 7683 Business Structure Database Longitudinal, only postcode-free versions of the data will be made available.
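    The IDBR-based linkage can be sketched as an in-memory join. The field name 'idbr_ref' is an assumption for illustration; the real variable name will differ per survey:

    ```python
    def link_by_idbr(ipo_rows, survey_rows):
        """Inner-join IPO records to a business survey on the anonymised
        IDBR reference number (field name 'idbr_ref' assumed)."""
        survey_by_ref = {row["idbr_ref"]: row for row in survey_rows}
        for row in ipo_rows:
            match = survey_by_ref.get(row["idbr_ref"])
            if match is not None:
                # Merge the two records; survey fields win on name clashes.
                yield {**row, **match}
    ```

    In practice, this kind of join runs inside the Secure Lab environment, since the IDBR numbers are only available there.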

  13. Sentinel-2 Cloud Mask Catalogue

    • zenodo.org
    csv, pdf, zip
    Updated Jul 19, 2024
    Cite
    Alistair Francis; Alistair Francis; John Mrziglod; Panagiotis Sidiropoulos; Panagiotis Sidiropoulos; Jan-Peter Muller; Jan-Peter Muller; John Mrziglod (2024). Sentinel-2 Cloud Mask Catalogue [Dataset]. http://doi.org/10.5281/zenodo.4172871
    Explore at:
    Available download formats: pdf, zip, csv
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alistair Francis; Alistair Francis; John Mrziglod; Panagiotis Sidiropoulos; Panagiotis Sidiropoulos; Jan-Peter Muller; Jan-Peter Muller; John Mrziglod
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset comprises cloud masks for 513 1022-by-1022 pixel subscenes, at 20m resolution, sampled randomly from the 2018 Level-1C Sentinel-2 archive. The design of this dataset follows from some observations about cloud masking: (i) performance over an entire product is highly correlated, thus subscenes provide more value per-pixel than full scenes, (ii) current cloud masking datasets often focus on specific regions, or hand-select the products used, which introduces a bias into the dataset that is not representative of the real-world data, (iii) cloud mask performance appears to be highly correlated to surface type and cloud structure, so testing should include analysis of failure modes in relation to these variables.

    The data were annotated semi-automatically, using the IRIS toolkit, which allows users to dynamically train a Random Forest (implemented using LightGBM), speeding up annotation by iteratively improving its predictions, while preserving the annotator's ability to make final manual changes when needed. This hybrid approach allowed us to process many more masks than would have been possible manually, which we felt was vital in creating a large enough dataset to approximate the statistics of the whole Sentinel-2 archive.

    In addition to the pixel-wise, 3-class (CLEAR, CLOUD, CLOUD_SHADOW) segmentation masks, we also provide users with binary classification "tags" for each subscene that can be used in testing to determine performance in specific circumstances. These include:

    • SURFACE TYPE: 11 categories
    • CLOUD TYPE: 7 categories
    • CLOUD HEIGHT: low, high
    • CLOUD THICKNESS: thin, thick
    • CLOUD EXTENT: isolated, extended

    Wherever practical, cloud shadows were also annotated; however, this was sometimes not possible due to high-relief terrain or large ambiguities. In total, 424 subscenes were marked with shadows (if present), and 89 have shadows that were not annotatable due to very ambiguous shadow boundaries, or terrain that cast significant shadows. If users wish to train an algorithm specifically for cloud shadow masks, we advise them to remove those 89 images for which shadow annotation was not possible; however, bear in mind that this will systematically reduce the difficulty of the shadow class compared to real-world use, as these contain the most difficult shadow examples.
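    Excluding those 89 subscenes could be scripted along these lines. This is a sketch: the tags file name and column values are assumptions, so check the README for the real schema:

    ```python
    import csv

    def shadow_trainable_ids(tags_csv_path):
        """IDs of subscenes safe for shadow training.

        Drops subscenes flagged as not annotatable for shadow. The column
        names and flag values here are assumptions, not the dataset's actual
        schema -- consult the README for the real field names.
        """
        keep = []
        with open(tags_csv_path, newline="") as fh:
            for row in csv.DictReader(fh):
                if row["shadow_annotation"] != "not_annotatable":
                    keep.append(row["subscene_id"])
        return keep
    ```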

    In addition to the 20m sampled subscenes and masks, we also provide users with shapefiles that define the boundary of the mask on the original Sentinel-2 scene. If users wish to retrieve the L1C bands at their original resolutions, they can use these to do so.

    Please see the README for further details on the dataset structure and more.

    Contributions & Acknowledgements

    The data were collected, annotated, checked, formatted and published by Alistair Francis and John Mrziglod.

    Support and advice was provided by Prof. Jan-Peter Muller and Dr. Panagiotis Sidiropoulos, for which we are grateful.

    We would like to extend our thanks to Dr. Pierre-Philippe Mathieu and the rest of the team at ESA PhiLab, who provided the environment in which this project was conceived, and continued to give technical support throughout.

    Finally, we thank the ESA Network of Resources for sponsoring this project by providing ICT resources.

  14. The Invasion of Ukraine Viewed through TikTok: A Dataset

    • zenodo.org
    bin, csv +1
    Updated May 13, 2023
    + more versions
    Cite
    Benjamin Steel; Sara Parker; Derek Ruths; Benjamin Steel; Sara Parker; Derek Ruths (2023). The Invasion of Ukraine Viewed through TikTok: A Dataset [Dataset]. http://doi.org/10.5281/zenodo.7926959
    Explore at:
    Available download formats: text/x-python, bin, csv
    Dataset updated
    May 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Benjamin Steel; Sara Parker; Derek Ruths; Benjamin Steel; Sara Parker; Derek Ruths
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ukraine
    Description

    This is a dataset of videos and comments related to the invasion of Ukraine, published on TikTok by a number of users over the year of 2022. It was compiled by Benjamin Steel, Sara Parker and Derek Ruths at the Network Dynamics Lab, McGill University. We created this dataset to facilitate the study of TikTok, and the nature of social interaction on the platform relevant to a major political event.

    The dataset has been released here on Zenodo: https://doi.org/10.5281/zenodo.7926959 as well as on Github: https://github.com/networkdynamics/data-and-code/tree/master/ukraine_tiktok

    To create the dataset, we identified hashtags and keywords explicitly related to the conflict to collect a core set of videos (or "TikToks"). We then compiled comments associated with these videos. All of the data captured is publicly available information, and contains personally identifiable information. In total we collected approximately 16 thousand videos and 12 million comments, from approximately 6 million users. There are approximately 1.9 comments on average per user captured, and 1.5 videos per user who posted a video. The author personally collected this data using PyTok, a web-scraping library developed by the author: https://github.com/networkdynamics/pytok.

    Due to the scraping duration, this is just a sample of the publicly available discourse concerning the invasion of Ukraine on TikTok. Because of TikTok's fuzzy search functionality, the dataset contains videos with a range of relatedness to the invasion.

    We release here the unique video IDs of the dataset in a CSV format. The data was collected without the specific consent of the content creators, so we have released only the data required to re-create it, to allow users to delete content from TikTok and be removed from the dataset if they wish. Contained in this repository are scripts that will automatically pull the full dataset, which will take the form of JSON files organised into a folder for each video. The JSON files are the entirety of the data returned by the TikTok API. We include a script to parse the JSON files into CSV files with the most commonly used data. We plan to further expand this dataset as collection processes progress and the war continues. We will version the dataset to ensure reproducibility.

    To build this dataset from the IDs here:

    1. Go to https://github.com/networkdynamics/pytok and clone the repo locally
    2. Run pip install -e . in the pytok directory
    3. Run pip install pandas tqdm to install these libraries if not already installed
    4. Run get_videos.py to get the video data
    5. Run video_comments.py to get the comment data
    6. Run user_tiktoks.py to get the video history of the users
    7. Run hashtag_tiktoks.py or search_tiktoks.py to get more videos from other hashtags and search terms
    8. Run load_json_to_csv.py to compile the JSON files into two CSV files, comments.csv and videos.csv

    If you get an error about the wrong Chrome version, use the command line argument: get_videos.py --chrome-version YOUR_CHROME_VERSION. Please note that pulling data from TikTok takes a while! We recommend leaving the scripts running on a server until they finish downloading everything. Feel free to play around with the delay constants to either speed up the process or avoid TikTok rate limiting.

    Please do not hesitate to make an issue in this repo to get our help with this!

    The videos.csv will contain the following columns:

    video_id: Unique video ID

    createtime: UTC datetime of video creation time in YYYY-MM-DD HH:MM:SS format

    author_name: Unique author name

    author_id: Unique author ID

    desc: The full video description from the author

    hashtags: A list of hashtags used in the video description

    share_video_id: If the video is sharing another video, this is the video ID of that original video, else empty

    share_video_user_id: If the video is sharing another video, this the user ID of the author of that video, else empty

    share_video_user_name: If the video is sharing another video, this is the user name of the author of that video, else empty

    share_type: If the video is sharing another video, this is the type of the share (e.g. stitch or duet)

    mentions: A list of users mentioned in the video description, if any

    The comments.csv will contain the following columns:

    comment_id: Unique comment ID

    createtime: UTC datetime of comment creation time in YYYY-MM-DD HH:MM:SS format

    author_name: Unique author name

    author_id: Unique author ID

    text: Text of the comment

    mentions: A list of users that are tagged in the comment

    video_id: The ID of the video the comment is on

    comment_language: The language of the comment, as predicted by the TikTok API

    reply_comment_id: If the comment is replying to another comment, this is the ID of that comment
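A hedged sketch of loading a comments.csv with these columns using pandas (the two sample rows below are made up, not real data). Reading the ID columns as strings matters: TikTok's 64-bit IDs exceed float precision and would be silently corrupted if inferred numerically.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the comments.csv produced by load_json_to_csv.py.
sample = io.StringIO(
    "comment_id,createtime,author_name,author_id,text,video_id,reply_comment_id\n"
    "7101,2022-03-01 12:00:00,alice,11,Stay safe,9001,\n"
    "7102,2022-03-01 12:05:00,bob,12,Agreed,9001,7101\n"
)

comments = pd.read_csv(
    sample,
    # Keep IDs as strings: 64-bit TikTok IDs overflow float precision if parsed as numbers.
    dtype={
        "comment_id": "string",
        "author_id": "string",
        "video_id": "string",
        "reply_comment_id": "string",
    },
    parse_dates=["createtime"],  # UTC datetimes in YYYY-MM-DD HH:MM:SS format
)
```

The same dtype approach applies to videos.csv and its video_id/author_id columns.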

    The data can be compiled into a user interaction network to facilitate the study of interaction dynamics. There is code to help with that here: https://github.com/networkdynamics/polar-seeds. Additional scripts for further preprocessing of this data can be found there too.
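    A minimal illustration of that compilation step, assuming the videos.csv and comments.csv columns listed above (this is an illustrative sketch, not the polar-seeds code): comment edges point from a commenter to the video's author, and reply edges from a replier to the parent comment's author.

```python
import pandas as pd
import networkx as nx

# Toy frames mirroring the videos.csv / comments.csv columns (values are made up).
videos = pd.DataFrame({"video_id": ["9001"], "author_id": ["10"]})
comments = pd.DataFrame({
    "comment_id": ["7101", "7102"],
    "author_id": ["11", "12"],
    "video_id": ["9001", "9001"],
    "reply_comment_id": [None, "7101"],  # comment 7102 replies to 7101
})

G = nx.DiGraph()

# Comment edges: commenter -> author of the video they commented on.
merged = comments.merge(videos, on="video_id", suffixes=("_commenter", "_poster"))
for row in merged.itertuples():
    G.add_edge(row.author_id_commenter, row.author_id_poster, kind="comment")

# Reply edges: replier -> author of the parent comment.
parent_author = comments.set_index("comment_id")["author_id"]
for row in comments.dropna(subset=["reply_comment_id"]).itertuples():
    G.add_edge(row.author_id, parent_author[row.reply_comment_id], kind="reply")
```

Edge attributes like "kind" let downstream analyses treat comment and reply interactions separately or together.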

  15. e

    My account formation – Users

    • data.europa.eu
    csv, json, zip
    Updated Jul 3, 2025
    Caisse des Dépôts (2025). My account formation – Users [Dataset]. https://data.europa.eu/data/datasets/https-opendata-caissedesdepots-fr-explore-dataset-moncompteformation-les-usagers-/embed
    Explore at:
    csv, zip, json — Available download formats
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Caisse des Dépôts
    License

    https://www.etalab.gouv.fr/licence-ouverte-open-licence

    Description

    Since November 21, 2019, My Training Account (‘https://www.moncompteformation.gouv.fr/’) has allowed both private-sector employees and jobseekers to choose their training as part of a direct purchase journey financed with their Personal Training Account (CPF). In order to better define their professional project, they can, if they wish, be accompanied by a professional development advisor (CEP), their region, a skills operator, or Pôle emploi.

    The training courses offered in My Training Account are certification courses submitted by training organisations via the EDOF portal. Certification training means any training relating to a certification registered in one of the two national registers administered by France Compétences, namely the Specific Register (RS) or the National Register of Professional Qualifications (RNCP). In addition to these trainings, four legislative exceptions apply: the driving licence, the skills assessment (bilan de compétences), the validation of acquired experience (VAE), and training activities for business creators and buyers. Since January 2022, training organisations must also hold the Qualiopi certification in order to operate via the platform.

    The dataset makes it possible to reconstruct the main characteristics of the users of My Training Account. It covers all user training entries since the beginning of 2020, including training courses that have since been completed: the status of the file makes it possible to distinguish between closed files and ongoing training. The files are grouped by quarter of the training date, and the dataset is updated quarterly, at the beginning of the quarter. The characteristics available are the region of residence, gender, age class and status of the holder. The latter makes it possible to distinguish between jobseekers and other users: for the others, it specifies the socio-professional category (CSP) declared in the training file when it is filled in.

    The available indicators make it possible to count the number of files, the number of trainees (or holders; a holder may have several training files in the same year) as well as the amounts committed in euros at the time of purchase, with their distribution by major family of funders (France Compétences, the holder's personal contribution, Pôle emploi, regions, OPCOs...). When a combination of characteristics describing the user involves fewer than 5 persons, the number of files and the amounts are left empty. The average price and duration are provided for each line, i.e. for each combination of characteristics describing the user, for a given training quarter and for records which, at the date of extraction of the dataset, have the same status. As the duration is not systematically provided by the training organisations, the average is calculated over the files for which the duration is filled in.

    Users dashboard accessible HERE


  16. Hotel Listings 2019

    • kaggle.com
    Updated Mar 27, 2020
    PromptCloud (2020). Hotel Listings 2019 [Dataset]. https://www.kaggle.com/datasets/promptcloud/hotel-listings-2019/discussion
    Explore at:
    Croissant — a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 27, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    PromptCloud
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    PromptCloud and DataStock extracted this data from Booking.com to capture the rates, prices, and availability of hotels across states over the one-year period from December 2018 to December 2019. This is a sample dataset of 30K records.

    You can download the full dataset here

    Content

    This dataset was procured to give knowledge about the various hotels present on Booking.com. It will be helpful for researchers and students who want this type of specific dataset for case studies and projects based on hotels across the globe that are available on Booking.com.

    The data fields that this file contains are: 456 root folders, each containing Uniq_ID, Hotel_ID, Hotel_Name, Review_Count, Default_Rank, Price_Rank, and OTA.

    Acknowledgements

    This dataset was created by PromptCloud's In-House Data Crawling Team

    Inspiration

    We want users to use clean and raw data which will help them gain access to knowledge about different sites and help them in their various projects or research that they might conduct. We want our customers to feel that they can depend on datasets like this from us and that is what drives us. Customer satisfaction is our main priority and we only wish the best for them and they keep us going.

  17. d

    SPD DG Connections Network Info - Dataset - Datopian CKAN instance

    • demo.dev.datopian.com
    Updated May 27, 2025
    (2025). SPD DG Connections Network Info - Dataset - Datopian CKAN instance [Dataset]. https://demo.dev.datopian.com/dataset/sp-energy-networks--spd-dg-connections-network-info
    Explore at:
    Dataset updated
    May 27, 2025
    Description

    The "SPD DG Connections Network Info" dataset provides network capacity information, network reinforcement requirements and estimated connection dates, based on generation and storage resources that are connected, or accepted to connect, to SP Energy Networks' (SPEN) SP Distribution (SPD) network, and will be updated on at least a monthly basis. For additional information on column definitions, please click on the Dataset schema link below.

    Disclaimer: This register has been developed to provide Distributed Generation (DG) customers a usable guide to connection. Whilst we use reasonable endeavours to ensure that the data contained within the register is accurate, we do not accept any responsibility or liability for the accuracy or the completeness of the content, or any loss which may arise from reliance on the register and related information.

    Note: A formatted copy of this dataset can be downloaded from the Export tab under Alternative exports. If you wish to provide feedback at a dataset or row level, please click on the “Feedback” tab above.

    Data Triage: As part of our commitment to enhancing the transparency and accessibility of the data we share, we publish the results of our Data Triage process. Our Data Triage documentation includes our Risk Assessments, detailing any controls we have implemented to prevent exposure of sensitive information. Click here to access the Data Triage documentation for the SPD DG Connections Network Info dataset. To access our full suite of Data Triage documentation, visit the SP Energy Networks Data & Information. Download dataset metadata (JSON)

  18. GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V07

    • rda.ucar.edu
    • data.ucar.edu
    • +1more
    Updated May 2, 2024
    + more versions
    G. Huffman; E. Stocker; D. Bolvin; E. Nelkin; Jackson Tan (2024). GPM IMERG Final Precipitation L3 1 day 0.1 degree x 0.1 degree V07 [Dataset]. http://doi.org/10.5065/7DE2-M746
    Explore at:
    Dataset updated
    May 2, 2024
    Dataset provided by
    University Corporation for Atmospheric Research
    Authors
    G. Huffman; E. Stocker; D. Bolvin; E. Nelkin; Jackson Tan
    Time period covered
    Jun 1, 2000 - Mar 31, 2025
    Area covered
    Earth
    Description

    This dataset contains Version 07 of the Integrated Multi-satellitE Retrievals for GPM (IMERG) IMERG Level 3 "Final Run" precipitation analysis at 0.1 degree, daily resolution.

    From the official GPM IMERG site at NASA GES DISC [https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDF_07/summary]: The Integrated Multi-satellitE Retrievals for GPM (IMERG) IMERG is a NASA product estimating global surface precipitation rates at a high resolution of 0.1 degree every half-hour beginning June 2000. It is part of the joint NASA-JAXA Global Precipitation Measurement (GPM) mission, using the GPM Core Observatory satellite (for June 2014 to present) and the Tropical Rainfall Measuring Mission (TRMM) satellite (for June 2000 to May 2014) as the standard to combine precipitation observations from an international constellation of satellites using advanced techniques. IMERG can be used for global-scale applications, including over regions with sparse or no reliable surface observations. The fine spatial and temporal resolution of IMERG data allows them to be accumulated to the scale of a user's application for increased skill. IMERG has three Runs with varying latencies in response to a range of application needs: rapid-response applications (Early Run, 4-hour latency), same/next-day applications (Late Run, 14-hour latency), and post-real-time research (Final Run, 4-month latency). While IMERG strives for consistency and accuracy, satellite estimates of precipitation are expected to have lower skill over frozen surfaces, complex terrain, and coastal zones. As well, the changing GPM satellite constellation over time may introduce artifacts that affect studies focusing on multi-year changes. This dataset is the GPM Level 3 IMERG Final Daily 0.1 degree x 0.1 degree (GPM_3IMERGDF) computed from the half-hourly GPM_3IMERGHH. The dataset represents the Final Run estimate of the daily mean precipitation rate in mm/day. The dataset is produced by first computing the mean precipitation rate in (mm/hour) in every non-missing grid cell, and then multiplying the result by 24. 
This minimizes the possible dry bias in versions before V07, in which the simple daily totals were computed even if the cell had fewer than 48 non-missing half-hourly observations for the day. This under-sampling is very rare in V07 except directly at the poles. Thus, in most cases users of the global "precipitation" data field will not notice any difference. This change, however, is noticeable in the microwave-only data field, variable "MWprecipitation", where fewer than 48 valid half-hourly samples per day is very common. The counts of the valid half-hourly samples per day have always been provided as a separate variable, and users of daily data were advised to pay close attention to that variable and use it to calculate correct daily precipitation rates. Starting with V07, this is done in production to minimize possible misinterpretations of the data. The counts are still provided in the data, but only so that users may gauge the significance of the daily rates, and reconstruct the simple totals if they wish to do so.

    See the official GPM IMERG site at NASA GES DISC [https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGDF_07/summary] for the complete dataset abstract and more information.
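    The V07 daily computation described above (mean of the valid half-hourly rates, scaled to mm/day) can be sketched for a single grid cell; the values below are made up, not IMERG data, and the pre-V07 comparison illustrates the dry bias the new rule avoids.

```python
import numpy as np

# One grid cell's 48 half-hourly precipitation rates in mm/hr; NaN marks missing samples.
half_hourly = np.array([1.0, 2.0, np.nan, 3.0] * 12)  # 48 values, 12 of them missing

# Count of valid half-hourly samples, analogous to the counts variable in the daily product.
n_valid = np.count_nonzero(~np.isnan(half_hourly))

# V07 rule: mean of the valid samples, scaled to a full day -> mm/day.
daily_v07 = np.nanmean(half_hourly) * 24.0

# Pre-V07 style: simple total of the available samples (each spanning 0.5 h),
# which is dry-biased whenever samples are missing.
daily_pre_v07 = np.nansum(half_hourly) * 0.5
```

With a quarter of the samples missing, the simple total comes out 25% low, while the V07 mean-and-scale estimate is unaffected.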

  19. d

    California County Boundaries and Identifiers with Coastal Buffers

    • catalog.data.gov
    • data.ca.gov
    • +2more
    Updated Jul 24, 2025
    + more versions
    California Department of Technology (2025). California County Boundaries and Identifiers with Coastal Buffers [Dataset]. https://catalog.data.gov/dataset/california-county-boundaries-and-identifiers-with-coastal-buffers
    Explore at:
    Dataset updated
    Jul 24, 2025
    Dataset provided by
    California Department of Technology
    Area covered
    California
    Description

    WARNING: This is a pre-release dataset and its field names and data structures are subject to change. It should be considered pre-release until the end of March 2025. The schema changed in February 2025 - please see below. We will post a roadmap of upcoming changes, but service URLs and schema are now stable. For deployment status of new services in February 2025, see https://gis.data.ca.gov/pages/city-and-county-boundary-data-status. Additional roadmap and status links at the bottom of this metadata.This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.PurposeCounty boundaries along with third party identifiers used to join in external data. Boundaries are from the California Department of Tax and Fee Administration (CDTFA). These boundaries are the best available statewide data source in that CDTFA receives changes in incorporation and boundary lines from the Board of Equalization, who receives them from local jurisdictions for tax purposes. Boundary accuracy is not guaranteed, and though CDTFA works to align boundaries based on historical records and local changes, errors will exist. If you require a legal assessment of boundary location, contact a licensed surveyor.This dataset joins in multiple attributes and identifiers from the US Census Bureau and Board on Geographic Names to facilitate adding additional third party data sources. In addition, we attach attributes of our own to ease and reduce common processing needs and questions. Finally, coastal buffers are separated into separate polygons, leaving the land-based portions of jurisdictions and coastal buffers in adjacent polygons. 
    This feature layer is for public use.Related LayersThis dataset is part of a grouping of many datasets:Cities: Only the city boundaries and attributes, without any unincorporated areasWith Coastal BuffersWithout Coastal BuffersCounties: Full county boundaries and attributes, including all cities within as a single polygonWith Coastal Buffers (this dataset)Without Coastal BuffersCities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.With Coastal BuffersWithout Coastal BuffersCity and County AbbreviationsUnincorporated Areas (Coming Soon)Census Designated PlacesCartographic CoastlinePolygonLine source (Coming Soon)Working with Coastal BuffersThe dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating if it's an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except OFFSHORE and AREA_SQMI to get a version with the correct identifiers.Point of ContactCalifornia Department of Technology, Office of Digital Services, odsdataservices@state.ca.govField and Abbreviation DefinitionsCDTFA_COUNTY: CDTFA county name. For counties, this will be the name of the polygon itself. For cities, it is the name of the county the city polygon is within.CDTFA_COPRI: county number followed by the 3-digit city primary number used in the Board of Equalization's 6-digit tax rate area numbering system. 
The boundary data originate with CDTFA's teams managing tax rate information, so this field is preserved and flows into this dataset.CENSUS_GEOID: numeric geographic identifiers from the US Census BureauCENSUS_PLACE_TYPE: City, County, or Town, stripped off the census name for identification purpose.GNIS_PLACE_NAME: Board on Geographic Names authorized nomenclature for area names published in the Geographic Name Information SystemGNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets utilizing this identifier.CDT_COUNTY_ABBR: Abbreviations of county names - originally derived from CalTrans Division of Local Assistance and now managed by CDT. Abbreviations are 3 characters.CDT_NAME_SHORT: The name of the jurisdiction (city or county) with the word "City" or "County" stripped off the end. Some changes may come to how we process this value to make it more consistent.AREA_SQMI: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 California Teale Albers.OFFSHORE: Indicates if the polygon is a coastal buffer. Null for land polygons. Additional values include "ocean" and "bay".PRIMARY_DOMAIN: Currently empty/null for all records. Placeholder field for official URL of the city or countyCENSUS_POPULATION: Currently null for all records. In the future, it will include the most recent US Census population estimate for the jurisdiction.GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but is not persistent, so data keyed to it will be orphaned at our next update. 
Use one of the other persistent identifiers, such as GNIS_ID or GEOID instead.Boundary AccuracyCounty boundaries were originally derived from a 1:24,000 accuracy dataset, with improvements made in some places to boundary alignments based on research into historical records and boundary changes as CDTFA learns of them. City boundary data are derived from pre-GIS tax maps, digitized at BOE and CDTFA, with adjustments made directly in GIS for new annexations, detachments, and corrections. Boundary accuracy within the dataset varies. While CDTFA strives to correctly include or exclude parcels from jurisdictions for accurate tax assessment, this dataset does not guarantee that a parcel is placed in the correct jurisdiction. When a parcel is in the correct jurisdiction, this dataset cannot guarantee accurate placement of boundary lines within or between parcels or rights of way. This dataset also provides no information on parcel boundaries. For exact jurisdictional or parcel boundary locations, please consult the county assessor's office and a licensed surveyor.CDTFA's data is used as the best available source because BOE and CDTFA receive information about changes in jurisdictions which otherwise need to be collected independently by an agency or company to compile into usable map boundaries. CDTFA maintains the best available statewide boundary information.CDTFA's source data notes the following about accuracy:City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates. 
    The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations. Boundary ProcessingThese data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries to end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge, while for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties.In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon. These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas) remain the same between the polygons for this purpose.SliversIn cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. 
We post-process the data to remove these slivers using a city/county boundary priority algorithm. That is, when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, in addition to others. More information on this algorithm will be provided soon.Coastline CaveatsSome cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into
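    The Dissolve described above is, geometrically, a union of a jurisdiction's land polygon with its coastal-buffer polygon after grouping on the shared persistent identifiers. A minimal sketch of that union with hypothetical rectangles (using Shapely; the coordinates are made up and stand in for the actual boundary data):

```python
from shapely.geometry import box
from shapely.ops import unary_union

# Hypothetical jurisdiction split into a land polygon and an adjacent offshore
# buffer polygon, standing in for the two polygons that share identifiers.
land = box(0.0, 0.0, 1.0, 1.0)
coastal_buffer = box(1.0, 0.0, 2.0, 1.0)

# Dissolving on the shared identifier fields reduces to a geometric union of the parts.
merged = unary_union([land, coastal_buffer])
```

In a GIS workflow the same operation would group on the persistent identifier fields (e.g. GNIS_ID) while excluding the per-polygon fields such as OFFSHORE and AREA_SQMI, as the description notes.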

  20. e

    Discourse of the School Dinners Debate, 2004-2008 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated May 1, 2023
    (2023). Discourse of the School Dinners Debate, 2004-2008 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/f045a342-303d-57c9-a09b-d926e8784c2d
    Explore at:
    Dataset updated
    May 1, 2023
    Description

    Abstract copyright UK Data Service and data collection copyright owner. This is a qualitative data collection, including audio recordings of the individual and focus group interviews. The Discourse of the School Dinners Debate project studied the role of language and communication strategies in the ongoing intense national discussion of school meals. It looked at views of parents, pupils and key players through semi-structured interviews and focus groups. The aim was to understand issues relating to school meals, contribute to policy, and facilitate communication between stakeholders. Further information can be found on the Discourse of the School Dinners Debate web site or ESRC award web page.

    Downloading Audio Files: Users should note that the audio recordings of the interviews for this study are in MP3 format, and have been divided between four zip files to ease download. Registered users may log in and download all four MP3 zip files from the download page accordingly to obtain the full set of files, including transcripts (note that the format box will state 'Other' next to the link). For those users who do not wish to use audio files, a smaller zip file is available that contains the Rich Text Format (RTF) interview transcripts only (format box will state 'RTF').

    Main Topics: Communication and understanding of messages surrounding the UK school dinners debate, school dinners, food politics, language of public debate, media language.

    Volunteer sample. Convenience sample. Face-to-face interview. Telephone interview. Transcription of existing materials. Compilation or synthesis of existing material.

California Department of Technology (2025). California Overlapping Cities and Counties and Identifiers with Coastal Buffers [Dataset]. https://catalog.data.gov/dataset/california-overlapping-cities-and-counties-and-identifiers-with-coastal-buffers

California Overlapping Cities and Counties and Identifiers with Coastal Buffers

Explore at:
Dataset updated
Jul 24, 2025
Dataset provided by
California Department of Technology
Description

WARNING: This is a pre-release dataset and its fields names and data structures are subject to change. It should be considered pre-release until the end of 2024. Expected changes:Metadata is missing or incomplete for some layers at this time and will be continuously improved.We expect to update this layer roughly in line with CDTFA at some point, but will increase the update cadence over time as we are able to automate the final pieces of the process.This dataset is continuously updated as the source data from CDTFA is updated, as often as many times a month. If you require unchanging point-in-time data, export a copy for your own use rather than using the service directly in your applications.PurposeCounty and incorporated place (city) boundaries along with third party identifiers used to join in external data. Boundaries are from the authoritative source the California Department of Tax and Fee Administration (CDTFA), altered to show the counties as one polygon. This layer displays the city polygons on top of the County polygons so the area isn"t interrupted. The GEOID attribute information is added from the US Census. GEOID is based on merged State and County FIPS codes for the Counties. Abbreviations for Counties and Cities were added from Caltrans Division of Local Assistance (DLA) data. Place Type was populated with information extracted from the Census. Names and IDs from the US Board on Geographic Names (BGN), the authoritative source of place names as published in the Geographic Name Information System (GNIS), are attached as well. Finally, the coastline is used to separate coastal buffers from the land-based portions of jurisdictions. 
This feature layer is for public use.Related LayersThis dataset is part of a grouping of many datasets:Cities: Only the city boundaries and attributes, without any unincorporated areasWith Coastal BuffersWithout Coastal BuffersCounties: Full county boundaries and attributes, including all cities within as a single polygonWith Coastal BuffersWithout Coastal BuffersCities and Full Counties: A merge of the other two layers, so polygons overlap within city boundaries. Some customers require this behavior, so we provide it as a separate service.With Coastal Buffers (this dataset)Without Coastal BuffersPlace AbbreviationsUnincorporated Areas (Coming Soon)Census Designated Places (Coming Soon)Cartographic CoastlinePolygonLine source (Coming Soon)Working with Coastal BuffersThe dataset you are currently viewing includes the coastal buffers for cities and counties that have them in the authoritative source data from CDTFA. In the versions where they are included, they remain as a second polygon on cities or counties that have them, with all the same identifiers, and a value in the COASTAL field indicating if it"s an ocean or a bay buffer. If you wish to have a single polygon per jurisdiction that includes the coastal buffers, you can run a Dissolve on the version that has the coastal buffers on all the fields except COASTAL, Area_SqMi, Shape_Area, and Shape_Length to get a version with the correct identifiers.Point of ContactCalifornia Department of Technology, Office of Digital Services, odsdataservices@state.ca.govField and Abbreviation DefinitionsCOPRI: county number followed by the 3-digit city primary number used in the Board of Equalization"s 6-digit tax rate area numbering systemPlace Name: CDTFA incorporated (city) or county nameCounty: CDTFA county name. For counties, this will be the name of the polygon itself. 
  For cities, it is the name of the county the city polygon is within.
- Legal Place Name: Board on Geographic Names authorized nomenclature for area names published in the Geographic Names Information System
- GNIS_ID: The numeric identifier from the Board on Geographic Names that can be used to join these boundaries to other datasets using this identifier
- GEOID: Numeric geographic identifier from the US Census Bureau
- Place Type: Board on Geographic Names authorized nomenclature for boundary type published in the Geographic Names Information System
- Place Abbr: Caltrans Division of Local Assistance abbreviations of incorporated area names
- CNTY Abbr: Caltrans Division of Local Assistance abbreviations of county names
- Area_SqMi: The area of the administrative unit (city or county) in square miles, calculated in EPSG 3310 (California Teale Albers)
- COASTAL: Indicates whether the polygon is a coastal buffer. Null for land polygons; other values are "ocean" and "bay"
- GlobalID: While all of the layers we provide in this dataset include a GlobalID field with unique values, we do not recommend you make any use of it. The GlobalID field exists to support offline sync, but it is not persistent, so data keyed to it will be orphaned at our next update. Use one of the other persistent identifiers, such as GNIS_ID or GEOID, instead.

Accuracy
CDTFA's source data notes the following about accuracy: City boundary changes and county boundary line adjustments filed with the Board of Equalization per Government Code 54900. This GIS layer contains the boundaries of the unincorporated county and incorporated cities within the state of California. The initial dataset was created in March of 2015 and was based on the State Board of Equalization tax rate area boundaries. As of April 1, 2024, the maintenance of this dataset is provided by the California Department of Tax and Fee Administration for the purpose of determining sales and use tax rates.
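As the field definitions above note, county GEOIDs merge the state FIPS code ("06" for California) with each three-digit county FIPS code, and GEOID or GNIS_ID should be used as the join key for external data. A minimal sketch of that construction and join, assuming a hypothetical external table (the county FIPS codes are standard Census values, but the population figures here are illustrative only):

```python
# Build county GEOIDs by concatenating the state FIPS code with each
# zero-padded 3-digit county FIPS code, then use them as join keys.
STATE_FIPS = "06"  # California

county_fips = {"Alameda": "001", "Los Angeles": "037", "Yolo": "113"}

# GEOID = state FIPS + county FIPS
geoids = {name: STATE_FIPS + fips for name, fips in county_fips.items()}

# Hypothetical external table keyed by GEOID (values illustrative)
population = {"06001": 1_622_188, "06037": 9_721_138, "06113": 216_403}

# Join the external data onto each county via its GEOID
joined = {name: population.get(geoid) for name, geoid in geoids.items()}
print(joined["Los Angeles"])
```

The same pattern applies when joining on GNIS_ID; avoid GlobalID for this, since it is not persistent across updates.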
The boundaries are continuously being revised to align with aerial imagery when areas of conflict are discovered between the original boundary provided by the California State Board of Equalization and the boundary made publicly available by local, state, and federal government. Some differences may occur between actual recorded boundaries and the boundaries used for sales and use tax purposes. The boundaries in this map are representations of taxing jurisdictions for the purpose of determining sales and use tax rates and should not be used to determine precise city or county boundary line locations. COUNTY = county name; CITY = city name or unincorporated territory; COPRI = county number followed by the 3-digit city primary number used in the California State Board of Equalization's 6-digit tax rate area numbering system (for the purpose of this map, unincorporated areas are assigned 000 to indicate that the area is not within a city).

Boundary Processing
These data make a structural change from the source data. While the full boundaries provided by CDTFA include coastal buffers of varying sizes, many users need boundaries that end at the shoreline of the ocean or a bay. As a result, after examining existing city and county boundary layers, these datasets provide a coastline cut generally along the ocean-facing coastline. For county boundaries in northern California, the cut runs near the Golden Gate Bridge; for cities, we cut along the bay shoreline and into the edge of the Delta at the boundaries of Solano, Contra Costa, and Sacramento counties.

In the services linked above, the versions that include the coastal buffers contain them as a second (or third) polygon for the city or county, with the value in the COASTAL field set to whether it's a bay or ocean polygon.
These can be processed back into a single polygon by dissolving on all the fields you wish to keep, since the attributes, other than the COASTAL field and geometry attributes (like areas), remain the same between the polygons for this purpose.

Slivers
In cases where a city or county's boundary ends near a coastline, our coastline data may cross back and forth many times while roughly paralleling the jurisdiction's boundary, resulting in many polygon slivers. We post-process the data to remove these slivers using a city/county boundary priority algorithm: when the data run parallel to each other, we discard the coastline cut and keep the CDTFA-provided boundary, even if it extends into the ocean a small amount. This processing supports consistent boundaries for Fort Bragg, Point Arena, San Francisco, Pacifica, Half Moon Bay, and Capitola, among others. More information on this algorithm will be provided soon.

Coastline Caveats
Some cities have buffers extending into water bodies that we do not cut at the shoreline. These include South Lake Tahoe and Folsom, which extend into neighboring lakes, and San Diego and surrounding cities that extend into San Diego Bay, which our shoreline encloses. If you have feedback on the exclusion of these items, or others, from the shoreline cuts, please reach out using the contact information above.

Offline Use
This service is fully enabled for sync and export using Esri Field Maps or other similar tools. Importantly, the GlobalID field exists only to support that use case and should not be used for any other purpose (see the note in the field definitions).

Updates and Date of Processing
Updated concurrently with CDTFA updates, approximately every two weeks. Last processed: 12/17/2024 by Nick Santos, using the code at https://github.com/CDT-ODS-DevSecOps/cdt-ods-gis-city-county/ at commit 0bf269d24464c14c9cf4f7dea876aa562984db63. It incorporates updates from CDTFA as of 12/12/2024.
Future updates will include improvements to metadata and update frequency.
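The dissolve described above (merging a jurisdiction's land polygon with its coastal buffer by grouping on every field except COASTAL and the geometry-derived measurements) can be sketched on the attribute side with pandas. This is a minimal sketch, not the GIS workflow itself: a real dissolve would also union the geometries, and the identifier and area values below are illustrative, not taken from the service.

```python
import pandas as pd

# Hypothetical rows: one city appears twice, once as the land polygon
# and once as its ocean coastal buffer (values illustrative).
rows = pd.DataFrame([
    {"PlaceName": "Exampleville", "GNIS_ID": 1234567, "GEOID": "06000",
     "COASTAL": None, "Area_SqMi": 46.9},
    {"PlaceName": "Exampleville", "GNIS_ID": 1234567, "GEOID": "06000",
     "COASTAL": "ocean", "Area_SqMi": 185.0},
])

# Dissolve on every field except COASTAL and the geometry-derived
# measurements (Area_SqMi here; Shape_Area and Shape_Length in the
# service), summing the areas of the merged polygons.
id_fields = [c for c in rows.columns if c not in ("COASTAL", "Area_SqMi")]
dissolved = rows.groupby(id_fields, as_index=False)["Area_SqMi"].sum()

print(dissolved)
```

In a GIS environment the same grouping would be done with a Dissolve tool (or geopandas' `dissolve`), which merges the geometries as well as the attributes.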
