100+ datasets found
  1. Geonames - All Cities with a population > 1000

    • public.opendatasoft.com
    • data.smartidf.services
    • +1more
    csv, excel, geojson +1
    Updated Mar 10, 2024
    + more versions
    Cite
    (2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
    Explore at:
    Available download formats: csv, json, geojson, excel
    Dataset updated
    Mar 10, 2024
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All cities with a population > 1000 or seats of adm div (ca 80.000).

    Sources and Contributions

    • Sources: GeoNames aggregates over a hundred different data sources.
    • Ambassadors: GeoNames Ambassadors help in many countries.
    • Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
    • Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

    Enrichment: add country name

  2. Public Schools

    • data.boston.gov
    • cloudcity.ogopendata.com
    • +2more
    Updated Dec 18, 2023
    Cite
    Boston Maps (2023). Public Schools [Dataset]. https://data.boston.gov/dataset/public-schools
    Explore at:
    Available download formats: html, arcgis geoservices rest api, kml, geojson, shp, csv
    Dataset updated
    Dec 18, 2023
    Dataset authored and provided by
    Boston Maps
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    Boston Public Schools (BPS) schools for the school year 2018-2019. Updated September 2018.

  3. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    World Bank Group http://www.worldbank.org/
    World Bank http://topics.nytimes.com/top/reference/timestopics/organizations/w/world_bank/index.html
    Authors
    World Bank
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unsplash.

    Inspiration

    Of total government spending, what percentage is spent on education?

  4. US Restaurant POI dataset with metadata

    • datarade.ai
    .csv
    Updated Jul 30, 2022
    Cite
    Geolytica (2022). US Restaurant POI dataset with metadata [Dataset]. https://datarade.ai/data-products/us-restaurant-poi-dataset-with-metadata-geolytica
    Explore at:
    Available download formats: .csv
    Dataset updated
    Jul 30, 2022
    Dataset authored and provided by
    Geolytica
    Area covered
    United States of America
    Description

    A Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be of interest. We provide high-quality POI data that is fresh, consistent, customizable, easy to use, and with high-density coverage for all countries of the world.

    This is our process flow:

    • Our machine learning systems continuously crawl for new POI data
    • Our geoparsing and geocoding calculates their geo locations
    • Our categorization systems clean up and standardize the datasets
    • Our data pipeline API publishes the datasets on our data store

    A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.

    POI Data is in constant flux. Every minute worldwide over 200 businesses will move, over 600 new businesses will open their doors and over 400 businesses will cease to exist. And over 94% of all businesses have a public online presence of some kind tracking such changes. When a business changes, their website and social media presence will change too. We'll then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.

    We offer our customers perpetual data licenses for any dataset representing this ever changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data as a Service - DaaS Industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one time snapshot, or via our data update pipeline.

    Customers requiring regularly updated datasets may subscribe to our Annual subscription plans. Our data is continuously being refreshed, therefore subscription plans are recommended for those who need the most up to date data. The main differentiators between us vs the competition are our flexible licensing terms and our data freshness.

    Data samples may be downloaded at https://store.poidata.xyz/us

  5. Jurisdictional Unit (Public) - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    Cite
    (2024). Jurisdictional Unit (Public) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/jurisdictional-unit-public
    Explore at:
    Dataset updated
    Feb 28, 2024
    Description

    Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM.

    This is a feature service which provides Identify and Copy Feature capabilities. If fast-drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.

    Overview

    The Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:

    • There may be multiple owner names.
    • Jurisdiction may be held jointly by agencies at different levels of government (i.e. State and Local), especially on private lands.
    • Some owner names may be blocked for security reasons.
    • Some jurisdictions may not allow the distribution of owner names.

    Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null, JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon.

    Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases.

    For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (i.e. Northern California District, Boise National Forest, etc.).

    These data are used to automatically populate fields on the WFDSS Incident Information page.

    This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.

    Relevant NWCG Definitions and Standards

    • Unit: 2. A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional. Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc.) can be derived from a unit based on organization hierarchy.
    • Unit, Jurisdictional: The governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law. Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander. See also: Unit, Protecting; Landowner
    • Unit Identifier: This data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.
    • Landowner Kind & Category: This data standard provides a two-tier classification (kind and category) of landownership.

    Attribute Fields

    • JurisdictionalAgencyKind: Describes the type of unit Jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal, and Other. A value may not be populated for all polygons.
    • JurisdictionalAgencyCategory: Describes the type of unit Jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.
    • JurisdictionalUnitName: The name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.
    • JurisdictionalUnitID: Where it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.
    • LandownerKind: The landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.
    • LandownerCategory: The landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.
    • DataSource: The database from which the polygon originated. Be as specific as possible, identify the geodatabase name and feature class in which the polygon originated.
    • SecondaryDataSource: If the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Source would be "Surface Management Agency (SMA)."
    • SourceUniqueID: Identifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.
    • MapMethod: Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Method by default for this layer as the data are from mixed sources. Valid Values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; DigitizedTopo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; Other
    • DateCurrent: The last edit, update, of this GIS record. Date should follow the assigned NWCG Date Time data standard, using 24 hour clock, YYYY-MM-DDhh.mm.ssZ, ISO8601 Standard.
    • Comments: Additional information describing the feature.
    • GeometryID: Primary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.
    • JurisdictionalUnitID_sansUS: NWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.
    • JoinMethod: Additional information on how the polygon was matched to information in the NWCG Unit ID database.
    • LocalName: LocalName for the polygon provided from PADUS or other source.
    • LegendJurisdictionalAgency: Jurisdictional Agency but smaller landholding agencies, or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.
    • LegendLandownerAgency: Landowner Agency but smaller landholding agencies, or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.
    • DataSourceYear: Year that the source data for the polygon were acquired.

    Data Input

    This dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group Geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.

    PAD-US 2.1: This dataset is based in large part on the USGS Protected Areas Database of the United States - PAD-US 2.1. PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.

    How these data were aggregated: Boundaries, and their descriptors, available in spatial databases (i.e. shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).

    BIA and Tribal Data: BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022.

    Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: These data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The
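
    A few of the attribute rules in this description can be sketched in Python. This is only an illustration under stated assumptions: records are treated as plain dictionaries keyed by the field names listed in the description, the sample values are made up, and the DateCurrent pattern is read literally as written (YYYY-MM-DDhh.mm.ssZ, taken to be UTC). None of the helper names come from the dataset or from NWCG.

```python
from datetime import datetime, timezone


def strip_us_prefix(unit_id):
    """Mimic JurisdictionalUnitID_sansUS: drop a leading 'US' if present."""
    if unit_id is None:
        return None
    return unit_id[2:] if unit_id.startswith("US") else unit_id


def is_private(record):
    """Private ownership as described: no Unit ID, and both landowner
    fields set to 'Private'."""
    return (
        record.get("JurisdictionalUnitID") is None
        and record.get("LandownerKind") == "Private"
        and record.get("LandownerCategory") == "Private"
    )


def parse_date_current(text):
    """Parse the literal 'YYYY-MM-DDhh.mm.ssZ' pattern (assumed to be UTC)."""
    parsed = datetime.strptime(text, "%Y-%m-%d%H.%M.%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Hypothetical records, for illustration only.
federal = {
    "JurisdictionalUnitID": "USXXYYY",
    "LandownerKind": "Federal",
    "LandownerCategory": "USFS",
    "DateCurrent": "2022-05-2113.45.00Z",
}
private = {
    "JurisdictionalUnitID": None,
    "LandownerKind": "Private",
    "LandownerCategory": "Private",
}

short_id = strip_us_prefix(federal["JurisdictionalUnitID"])
edited_at = parse_date_current(federal["DateCurrent"])
```

    Treating a missing Unit ID as None mirrors the null values the description mentions; a real export may encode nulls differently (for example as the string 'Null'), so the check would need adjusting.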

  6. Gratis, OH Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Gratis, OH Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b235d8fd-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Gratis
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Gratis by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Gratis across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a slight majority of female population, with 50.0% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of each gender in Gratis is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of the total population of Gratis. Please note that the sum of all percentages may not equal one due to rounding of values.
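
    The relationship between the Population and % of Total Population columns can be sketched as follows. The counts are hypothetical, not taken from the dataset, and rounding to one decimal place is an assumption made for illustration.

```python
def gender_shares(counts, digits=1):
    """Return each group's share of the total as a rounded percentage."""
    total = sum(counts.values())
    if total == 0:
        return {group: 0.0 for group in counts}
    return {
        group: round(100 * value / total, digits)
        for group, value in counts.items()
    }


# Hypothetical counts, for illustration only.
shares = gender_shares({"Male": 333, "Female": 334})
rounded_total = sum(shares.values())
```

    Because each share is rounded independently, the rounded values need not add up to exactly 100, which is the caveat the column description raises.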

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports, or presentations, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Gratis Population by Race & Ethnicity.

  7. Composed Encrypted Malicious Traffic Dataset for machine learning based...

    • data.mendeley.com
    Updated Oct 12, 2021
    + more versions
    Cite
    Zihao Wang (2021). Composed Encrypted Malicious Traffic Dataset for machine learning based encrypted malicious traffic analysis. [Dataset]. http://doi.org/10.17632/ztyk4h3v6s.2
    Explore at:
    Dataset updated
    Oct 12, 2021
    Authors
    Zihao Wang
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a traffic dataset which contains balanced amounts of encrypted malicious and legitimate traffic for encrypted malicious traffic detection. The dataset is a secondary CSV feature dataset composed from five public traffic datasets. Our dataset is composed based on three criteria: The first criterion is to combine widely considered public datasets which contain both encrypted malicious and legitimate traffic in existing works, such as the Malwares Capture Facility Project dataset and the CICIDS-2017 dataset. The second criterion is to ensure the data balance, i.e., balance of malicious and legitimate network traffic and similar size of network traffic contributed by each individual dataset. Thus, approximate proportions of malicious and legitimate traffic from each selected public dataset are extracted by using random sampling. We also ensured that no selected public dataset contributes a traffic size much larger than the other selected public datasets. The third criterion is that our dataset includes both conventional devices' and IoT devices' encrypted malicious and legitimate traffic, as these devices are increasingly being deployed and are working in the same environments such as offices, homes, and other smart city settings.

    Based on the criteria, 5 public datasets are selected. After data pre-processing, details of each selected public dataset and the final composed dataset are shown in “Dataset Statistic Analysis Document”. The document summarizes the malicious and legitimate traffic size we selected from each selected public dataset, proportions of selected traffic size from each selected public dataset with respect to the total traffic size of the composed dataset (% w.r.t the composed dataset), proportions of selected encrypted traffic size from each selected public dataset (% of selected public dataset), and total traffic size of the composed dataset. From the table, we are able to observe that each public dataset contributes approximately 20% of the composed dataset, except for CICDS-2012 (due to its limited amount of encrypted malicious traffic). This achieves a balance across individual datasets and reduces bias towards traffic belonging to any dataset during learning. We can also observe that the sizes of malicious and legitimate traffic are almost the same, thus achieving class balance. The datasets now made available were prepared for encrypted malicious traffic detection. Since the dataset is used for machine learning model training, sample train and test sets are also provided. The train and test datasets are separated based on 1:4 and stratification is applied during data split. Such datasets can be used directly for machine or deep learning model training based on selected features.
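
    A stratified split of the kind mentioned above can be sketched with the Python standard library alone. This is not the authors' code. The description does not say which side of the 1:4 ratio is the test set, so the sketch assumes test:train = 1:4 (a 20% test fraction); the row layout and the "label" key are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, test_fraction=0.2, seed=0):
    """Split rows so each label keeps the same share in train and test."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]          # leave the caller's data untouched
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Hypothetical balanced sample: 50 malicious and 50 legitimate flows.
rows = [
    {"flow_id": i, "label": "malicious" if i % 2 else "legitimate"}
    for i in range(100)
]
train, test = stratified_split(rows, "label")
test_malicious = sum(1 for row in test if row["label"] == "malicious")
```

    Splitting within each label group, rather than over the whole table, is what keeps the malicious and legitimate proportions equal in both parts.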

  8. BLM Natl Public Lands Access Data Line

    • gbp-blm-egis.hub.arcgis.com
    • gimi9.com
    • +2more
    Updated Aug 1, 2025
    Cite
    Bureau of Land Management (2025). BLM Natl Public Lands Access Data Line [Dataset]. https://gbp-blm-egis.hub.arcgis.com/datasets/BLM-EGIS::blm-natl-public-lands-access-data-line
    Explore at:
    Dataset updated
    Aug 1, 2025
    Dataset authored and provided by
    Bureau of Land Management
    Area covered
    Description

    This line feature represents Federal interests in private land, including easements and reservations (1) in which the Federal Government does not have a fee title interest; and (2) that provide both legal public recreational access and legal administrative access to the Federal land in conformance with the MAPLand Act and the PLAD project. This dataset also provides BLM managers with information to identify access limitations, vulnerabilities, and areas where access to public lands may be improved. The dataset can also support additional management of resources such as timber harvest, travel and transportation planning, wildfire fuels reduction projects, and many other land management related decisions.

  9. Job Postings Dataset for Labour Market Research and Insights

    • datarade.ai
    Updated Sep 20, 2023
    Cite
    Oxylabs (2023). Job Postings Dataset for Labour Market Research and Insights [Dataset]. https://datarade.ai/data-products/job-postings-dataset-for-labour-market-research-and-insights-oxylabs
    Explore at:
    Available download formats: .json, .xml, .csv, .xls
    Dataset updated
    Sep 20, 2023
    Dataset authored and provided by
    Oxylabs
    Area covered
    Anguilla, Zambia, Switzerland, Jamaica, Luxembourg, British Indian Ocean Territory, Togo, Tajikistan, Sierra Leone, Kyrgyzstan
    Description

    Introducing Job Posting Datasets: Uncover labor market insights!

    Elevate your recruitment strategies, forecast future labor industry trends, and unearth investment opportunities with Job Posting Datasets.

    Job Posting Datasets Source:

    1. Indeed: Access datasets from Indeed, a leading employment website known for its comprehensive job listings.

    2. Glassdoor: Receive ready-to-use employee reviews, salary ranges, and job openings from Glassdoor.

    3. StackShare: Access StackShare datasets to make data-driven technology decisions.

    Job Posting Datasets provide meticulously acquired and parsed data, freeing you to focus on analysis. You'll receive clean, structured, ready-to-use job posting data, including job titles, company names, seniority levels, industries, locations, salaries, and employment types.

    Choose your preferred dataset delivery options for convenience:

    Receive datasets in various formats, including CSV, JSON, and more. Opt for storage solutions such as AWS S3, Google Cloud Storage, and more. Customize data delivery frequencies, whether one-time or per your agreed schedule.

    Why Choose Oxylabs Job Posting Datasets:

    1. Fresh and accurate data: Access clean and structured job posting datasets collected by our seasoned web scraping professionals, enabling you to dive into analysis.

    2. Time and resource savings: Focus on data analysis and your core business objectives while we efficiently handle the data extraction process cost-effectively.

    3. Customized solutions: Tailor our approach to your business needs, ensuring your goals are met.

    4. Legal compliance: Partner with a trusted leader in ethical data collection. Oxylabs is a founding member of the Ethical Web Data Collection Initiative, aligning with GDPR and CCPA best practices.

    Pricing Options:

    Standard Datasets: choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.

    Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.

    Experience a seamless journey with Oxylabs:

    • Understanding your data needs: We work closely to understand your business nature and daily operations, defining your unique data requirements.
    • Developing a customized solution: Our experts create a custom framework to extract public data using our in-house web scraping infrastructure.
    • Delivering data sample: We provide a sample for your feedback on data quality and the entire delivery process.
    • Continuous data delivery: We continuously collect public data and deliver custom datasets per the agreed frequency.

    Effortlessly access fresh job posting data with Oxylabs Job Posting Datasets.

  10. NYC Parks Structures

    • catalog.data.gov
    • data.cityofnewyork.us
    • +1more
    Updated Aug 30, 2025
    + more versions
    Cite
    data.cityofnewyork.us (2025). NYC Parks Structures [Dataset]. https://catalog.data.gov/dataset/nyc-parks-structures
    Explore at:
    Dataset updated
    Aug 30, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    This dataset includes structures within NYC Parks properties. Structures are broadly defined as "an assembly of materials forming construction for occupancy or use." One line of data is a structure. The dataset contains fields that are maintained by multiple agencies including NYC Parks, NYC DoITT, and NYC Planning. Where possible, updated values are pulled from authoritative sources and updated weekly. For more details about specific fields and where they come from, please see https://github.com/NYCParks-data/Structures/wiki . The System ID and BIN (Building Identification Number) are both required fields. A known limitation to this dataset is that functions other than 'public restroom' and 'recreation center' can and should be attributed to many of the structures. This information will eventually live and be maintained in a related table where all the functions of individual structures can be seen. Data Dictionary here: https://docs.google.com/spreadsheets/d/17ptFZkuhrquuvSfEb2dum3Q6jNbVT98WohR-pl646o4/edit?usp=sharing

  11. Replication Data for: "Public Support for Gay Rights Across Countries and...

    • iro.uiowa.edu
    Updated May 30, 2025
    Cite
    Byung-Deuk Woo; Hyein Ko; Yuehong Cassandra Tai; Yue Hu; Frederick Solt (2025). Replication Data for: "Public Support for Gay Rights Across Countries and Over Time." Social Science Quarterly [Dataset]. https://iro.uiowa.edu/esploro/outputs/dataset/Replication-Data-for-Public-Support-for/9984824323302771
    Explore at:
    Dataset updated
    May 30, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Byung-Deuk Woo; Hyein Ko; Yuehong Cassandra Tai; Yue Hu; Frederick Solt
    Time period covered
    2024
    Description

    Objective. Support for gay rights has increased in the publics of many countries over recent decades, but the scholarship on the topic has been hindered by the limited available data on these trends in public opinion. The goal of the Support for Gay Rights (SGR) dataset is to overcome this problem. Method. The SGR dataset is constructed by combining a comprehensive collection of survey data with a latent-variable model to provide annual time-series estimates of public support for gay rights across 118 countries and over as many as 51 years that are comparable across space and time. Results. We show these data perform well in validation tests and demonstrate their potential by replicating the influential but recently questioned finding of Andersen and Fetner (2008) that more income inequality yields less tolerant and supportive attitudes toward gay people. Conclusion. We anticipate that the SGR data will become a crucial source for cross-national, cross-regional, and longitudinal research that improves our understanding of the sources and consequences of public support for gay rights.

  12. d

    Educational Attainment

    • catalog.data.gov
    • data.chhs.ca.gov
    • +4more
    Updated Jul 23, 2025
    Cite
    California Department of Public Health (2025). Educational Attainment [Dataset]. https://catalog.data.gov/dataset/educational-attainment-8c8b5
    Explore at:
    Dataset updated
    Jul 23, 2025
    Dataset provided by
    California Department of Public Health
    Description

    This table contains data on the percent of population age 25 and up with a four-year college degree or higher for California, its regions, counties, county subdivisions, cities, towns, and census tracts. Greater educational attainment has been associated with health-promoting behaviors including consumption of fruits and vegetables and other aspects of healthy eating, engaging in regular physical activity, and refraining from excessive consumption of alcohol and from smoking. Completion of formal education (e.g., high school) is a key pathway to employment and access to healthier and higher paying jobs that can provide food, housing, transportation, health insurance, and other basic necessities for a healthy life. Education is linked with social and psychological factors, including sense of control, social standing and social support. These factors can improve health through reducing stress, influencing health-related behaviors and providing practical and emotional support. More information on the data table and a data dictionary can be found in the Data and Resources section. The educational attainment table is part of a series of indicators in the Healthy Communities Data and Indicators Project (HCI) of the Office of Health Equity. The goal of HCI is to enhance public health by providing data, a standardized set of statistical measures, and tools that a broad array of sectors can use for planning healthy communities and evaluating the impact of plans, projects, policy, and environmental changes on community health. The creation of healthy social, economic, and physical environments that promote healthy behaviors and healthy outcomes requires coordination and collaboration across multiple sectors, including transportation, housing, education, agriculture and others. 
Statistical metrics, or indicators, are needed to help local, regional, and state public health and partner agencies assess community environments and plan for healthy communities that optimize public health. More information on HCI can be found here: https://www.cdph.ca.gov/Programs/OHE/CDPH%20Document%20Library/Accessible%202%20CDPH_Healthy_Community_Indicators1pager5-16-12.pdf The format of the educational attainment table is based on the standardized data format for all HCI indicators. As a result, this data table contains certain variables used in the HCI project (e.g., indicator ID, and indicator definition). Some of these variables may contain the same value for all observations.

  13. v

    Louisville Metro KY - Annual Open Data Report 2022

    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • data.lojic.org
    • +3more
    Updated Jul 30, 2025
    Cite
    Louisville/Jefferson County Information Consortium (2025). Louisville Metro KY - Annual Open Data Report 2022 [Dataset]. https://res1catalogd-o-tdatad-o-tgov.vcapture.xyz/dataset/louisville-metro-ky-annual-open-data-report-2022
    Explore at:
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Louisville, Kentucky
    Description

    On August 25th, 2022, Metro Council passed the Open Data Ordinance; previously, open data reports were published under Mayor Fischer's Executive Order. You can find here both the Open Data Ordinance, 2022 (PDF) and the Mayor's Open Data Executive Order, 2013.

    Open Data Annual Reports

    Page 6 of the Open Data Ordinance: Within one year of the effective date of this Ordinance, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor and Metro Council an annual Open Data Report.

    The Open Data Management Team (also known as the Data Governance Team) is currently led by the city's Data Officer, Andrew McKinney, in the Office of Civic Innovation and Technology. Previously, it was led by the former Data Officer, Michael Schnuerle, and prior to that by the Director of IT.

    Open Data Ordinance O-243-22 Text

    Louisville Metro Government
    Legislation Text
    File #: O-243-22, Version: 3
    ORDINANCE NO. _, SERIES 2022
    AN ORDINANCE CREATING A NEW CHAPTER OF THE LOUISVILLE/JEFFERSON COUNTY METRO CODE OF ORDINANCES CREATING AN OPEN DATA POLICY AND REVIEW. (AMENDMENT BY SUBSTITUTION) (AS AMENDED).
    SPONSORED BY: COUNCIL MEMBERS ARTHUR, WINKLER, CHAMBERS ARMSTRONG, PIAGENTINI, DORSEY, AND PRESIDENT JAMES

    WHEREAS, Metro Government is the catalyst for creating a world-class city that provides its citizens with safe and vibrant neighborhoods, great jobs, a strong system of education and innovation and a high quality of life;

    WHEREAS, it should be easy to do business with Metro Government. Online government interactions mean more convenient services for citizens and businesses and online government interactions improve the cost effectiveness and accuracy of government operations;

    WHEREAS, an open government also makes certain that every aspect of the built environment also has reliable digital descriptions available to citizens and entrepreneurs for deep engagement mediated by smart devices;

    WHEREAS, every citizen has the right to prompt, efficient service from Metro Government;

    WHEREAS, the adoption of open standards improves transparency, access to public information and improved coordination and efficiencies among Departments and partner organizations across the public, non-profit and private sectors;

    WHEREAS, by publishing structured standardized data in machine readable formats, Metro Government seeks to encourage the local technology community to develop software applications and tools to display, organize, analyze, and share public record data in new and innovative ways;

    WHEREAS, Metro Government’s ability to review data and datasets will facilitate a better understanding of the obstacles the city faces with regard to equity;

    WHEREAS, Metro Government’s understanding of inequities, through data and datasets, will assist in creating better policies to tackle inequities in the city;

    WHEREAS, through this Ordinance, Metro Government desires to maintain its continuous improvement in open data and transparency that it initiated via Mayoral Executive Order No. 1, Series 2013;

    WHEREAS, Metro Government’s open data work has repeatedly been recognized as evidenced by its achieving What Works Cities Silver (2018), Gold (2019), and Platinum (2020) certifications. What Works Cities recognizes and celebrates local governments for their exceptional use of data to inform policy and funding decisions, improve services, create operational efficiencies, and engage residents. The Certification program assesses cities on their data-driven decision-making practices, such as whether they are using data to set goals and track progress, allocate funding, evaluate the effectiveness of programs, and achieve desired outcomes. These data-informed strategies enable Certified Cities to be more resilient, respond in crisis situations, increase economic mobility, protect public health, and increase resident satisfaction; and

    WHEREAS, in commitment to the spirit of Open Government, Metro Government will consider public information to be open by default and will proactively publish data and data containing information, consistent with the Kentucky Open Meetings and Open Records Act.

    NOW, THEREFORE, BE IT ORDAINED BY THE COUNCIL OF THE LOUISVILLE/JEFFERSON COUNTY METRO GOVERNMENT AS FOLLOWS:

    SECTION I: A new chapter of the Louisville Metro Code of Ordinances (“LMCO”) mandating an Open Data Policy and review process is hereby created as follows:

    § XXX.01 DEFINITIONS. For the purpose of this Chapter, the following definitions shall apply unless the context clearly indicates or requires a different meaning.

    OPEN DATA. Any public record as defined by the Kentucky Open Records Act, which could be made available online using Open Format data, as well as best practice Open Data structures and formats when possible, that is not Protected Information or Sensitive Information, with no legal restrictions on use or reuse. Open Data is not information that is treated as exempt under KRS 61.878 by Metro Government.

    OPEN DATA REPORT. The annual report of the Open Data Management Team, which shall (i) summarize and comment on the state of Open Data availability in Metro Government Departments from the previous year, including, but not limited to, the progress toward achieving the goals of Metro Government’s Open Data portal, an assessment of the current scope of compliance, a list of datasets currently available on the Open Data portal and a description and publication timeline for datasets envisioned to be published on the portal in the following year; and (ii) provide a plan for the next year to improve online public access to Open Data and maintain data quality.

    OPEN DATA MANAGEMENT TEAM. A group consisting of representatives from each Department within Metro Government and chaired by the Data Officer who is responsible for coordinating implementation of an Open Data Policy and creating the Open Data Report.

    DATA COORDINATORS. The members of an Open Data Management Team facilitated by the Data Officer and the Office of Civic Innovation and Technology.

    DEPARTMENT. Any Metro Government department, office, administrative unit, commission, board, advisory committee, or other division of Metro Government.

    DATA OFFICER. The staff person designated by the city to coordinate and implement the city’s open data program and policy.

    DATA. The statistical, factual, quantitative or qualitative information that is maintained or created by or on behalf of Metro Government.

    DATASET. A named collection of related records, with the collection containing data organized or formatted in a specific or prescribed way.

    METADATA. Contextual information that makes the Open Data easier to understand and use.

    OPEN DATA PORTAL. The internet site established and maintained by or on behalf of Metro Government located at https://res1datad-o-tlouisvillekyd-o-tgov.vcapture.xyz/ or its successor website.

    OPEN FORMAT. Any widely accepted, nonproprietary, searchable, platform-independent, machine-readable method for formatting data which permits automated processes.

    PROTECTED INFORMATION. Any Dataset or portion thereof to which the Department may deny access pursuant to any law, rule or regulation.

    SENSITIVE INFORMATION. Any Data which, if published on the Open Data Portal, could raise privacy, confidentiality or security concerns or have the potential to jeopardize public health, safety or welfare to an extent that is greater than the potential public benefit of publishing that data.

    § XXX.02 OPEN DATA PORTAL

    (A) The Open Data Portal shall serve as the authoritative source for Open Data provided by Metro Government.

    (B) Any Open Data made accessible on Metro Government’s Open Data Portal shall use an Open Format.

    (C) In the event a successor website is used, the Data Officer shall notify the Metro Council and shall provide notice to the public on the main city website.

    § XXX.03 OPEN DATA MANAGEMENT TEAM

    (A) The Data Officer of Metro Government will work with the head of each Department to identify a Data Coordinator in each Department. The Open Data Management Team will work to establish a robust, nationally recognized, platform that addresses digital infrastructure and Open Data.

    (B) The Open Data Management Team will develop an Open Data Policy that will adopt prevailing Open Format standards for Open Data and develop agreements with regional partners to publish and maintain Open Data that is open and freely available while respecting exemptions allowed by the Kentucky Open Records Act or other federal or state law.

    § XXX.04 DEPARTMENT OPEN DATA CATALOGUE

    (A) Each Department shall retain ownership over the Datasets they submit to the Open Data Portal. The Departments shall also be responsible for all aspects of the quality, integrity and security of the Dataset contents, including updating its Data and associated Metadata.

    (B) Each Department shall be responsible for creating an Open Data catalogue which shall include comprehensive inventories of information possessed and/or managed by the Department.

    (C) Each Department’s Open Data catalogue will classify information holdings as currently “public” or “not yet public;” Departments will work with the Office of Civic Innovation and Technology to develop strategies and timelines for publishing Open Data containing information in a way that is complete, reliable and has a high level of detail.

    § XXX.05 OPEN DATA REPORT AND POLICY REVIEW

    (A) Within one year of the effective date of this Ordinance, and thereafter no later than September 1 of each year, the Open Data Management Team shall submit to the Mayor and Metro Council an annual Open Data Report.

    (B) Metro Council may request a specific Department to report on any data or dataset that may be beneficial or pertinent in implementing policy and legislation.

    (C) In acknowledgment that technology changes rapidly, in the future, the Open Data Policy shall be reviewed annually and considered for revisions or additions that will continue to position Metro Government

  14. D

    San Francisco Department of Public Health Substance Use Services

    • data.sfgov.org
    • healthdata.gov
    • +1more
    csv, xlsx, xml
    Updated Aug 20, 2025
    Cite
    (2025). San Francisco Department of Public Health Substance Use Services [Dataset]. https://data.sfgov.org/Health-and-Social-Services/San-Francisco-Department-of-Public-Health-Substanc/ubf6-e57x
    Explore at:
    xlsx, xml, csv (available download formats)
    Dataset updated
    Aug 20, 2025
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
    License information was derived automatically

    Area covered
    San Francisco
    Description

    A. SUMMARY This dataset includes data on a variety of substance use services funded by the San Francisco Department of Public Health (SFDPH). This dataset only includes Drug Medi-Cal-certified residential treatment, withdrawal management, and methadone treatment. Other private non-Drug Medi-Cal treatment providers may operate in the city. Withdrawal management discharges are inclusive of anyone who left withdrawal management after admission and may include someone who left before completing withdrawal management.

    This dataset also includes naloxone distribution from the SFDPH Behavioral Health Services Naloxone Clearinghouse and the SFDPH-funded Drug Overdose Prevention and Education program. Both programs distribute naloxone to various community-based organizations who then distribute naloxone to their program participants. Programs may also receive naloxone from other sources. Data from these other sources is not included in this dataset.

    Finally, this dataset includes the number of clients on medications for opioid use disorder (MOUD).

    The number of people who were treated with methadone at a Drug Medi-Cal-certified Opioid Treatment Program (OTP) by year is populated by the San Francisco Department of Public Health (SFDPH) Behavioral Health Services Quality Management (BHSQM) program. OTPs in San Francisco are required to submit patient billing data in an electronic medical record system called Avatar. BHSQM calculates the number of people who received methadone annually based on Avatar data. Only data from Drug Medi-Cal-certified OTPs were included in this dataset.

    The number of people who receive buprenorphine by year is populated from the Controlled Substance Utilization Review and Evaluation System (CURES), administered by the California Department of Justice. All licensed prescribers in California are required to document controlled substance prescriptions in CURES. The Center on Substance Use and Health calculates the total number of people who received a buprenorphine prescription annually based on CURES data. Formulations of buprenorphine that are prescribed only for pain management are excluded.

    People may receive buprenorphine and methadone in the same year, so you cannot add the Buprenorphine Clients by Year, and Methadone Clients by Year data together to get the total number of unique people receiving medications for opioid use disorder.
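
    The double-counting caveat above can be illustrated with a minimal sketch. The client identifiers below are invented for illustration only; the published dataset reports aggregate yearly counts, so the true overlap cannot be recovered from it.

```python
# Hypothetical client IDs, invented purely for illustration.
buprenorphine_clients = {"A", "B", "C", "D"}
methadone_clients = {"C", "D", "E"}

# Adding the two yearly counts treats clients C and D as four people.
naive_total = len(buprenorphine_clients) + len(methadone_clients)

# The number of unique people is the size of the set union.
unique_total = len(buprenorphine_clients | methadone_clients)

# The gap between the two figures is the number of people on both medications.
overlap = len(buprenorphine_clients & methadone_clients)

print(naive_total, unique_total, overlap)
```

    In this made-up case the summed counts overstate the unique total by exactly the number of people who received both medications in the same year.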

    For more information on where to find treatment in San Francisco, visit findtreatment-sf.org. 

    B. HOW THE DATASET IS CREATED This dataset is created by copying the data into this dataset from the SFDPH Behavioral Health Services Quality Management Program, the California Controlled Substance Utilization Review and Evaluation System (CURES), and the Office of Overdose Prevention.

    C. UPDATE PROCESS Residential Substance Use Treatment, Withdrawal Management, Methadone, and Naloxone data are updated quarterly with a 45-day delay. Buprenorphine data are updated quarterly and when the state makes this data available, usually at a 5-month delay.

    D. HOW TO USE THIS DATASET Throughout the year this dataset may include partial year data for methadone and buprenorphine treatment. As both methadone and buprenorphine are used as long-term treatments for opioid use disorder, many people on treatment at the end of one calendar year will continue into the next. For this reason, doubling (methadone), or quadrupling (buprenorphine) partial year data will not accurately project year-end totals.
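
    A rough sketch of why scaling partial-year counts overshoots, under assumed numbers that do not come from the dataset: suppose 100 people carry over from the previous year and stay on treatment, and 5 new people start each month. The function name and both parameters are hypothetical.

```python
def unique_clients_through(month, carried_over=100, new_per_month=5):
    """Cumulative unique clients seen from January through `month`.

    Assumed toy model: a fixed group carried over from the prior year
    plus a constant number of new clients each month.
    """
    return carried_over + new_per_month * month

mid_year = unique_clients_through(6)          # count reported at the end of June
naive_projection = 2 * mid_year               # doubling the half-year figure
actual_year_end = unique_clients_through(12)  # what the toy model yields in December

print(mid_year, naive_projection, actual_year_end)
```

    Because most people counted by mid-year are the same long-term clients counted at year end, only the new arrivals grow with time, and doubling the half-year figure inflates the estimate.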

    E. RELATED DATASETS Overdose-Related 911 Responses by Emergency Medical Services; Unintentional Overdose Death Rates by Race/Ethnicity; Preliminary Unintentional Drug Overdose Deaths

  15. Google Capstone Project - BellaBeats

    • kaggle.com
    Updated Jan 5, 2023
    Cite
    Jason Porzelius (2023). Google Capstone Project - BellaBeats [Dataset]. https://www.kaggle.com/datasets/jasonporzelius/google-capstone-project-bellabeats
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 5, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jason Porzelius
    Description

    Introduction: I have chosen to complete a data analysis project for the second course option, Bellabeats, Inc., using a locally hosted database program, Excel, for both my data analysis and visualizations. This choice was made primarily because I live in a remote area and have limited bandwidth and inconsistent internet access. Therefore, completing a capstone project using web-based programs such as R Studio, SQL Workbench, or Google Sheets was not a feasible choice. I was further limited in which option to choose as the datasets for the ride-share project option were larger than my version of Excel would accept. In the scenario provided, I will be acting as a Junior Data Analyst in support of the Bellabeats, Inc. executive team and data analytics team. This combined team has decided to use an existing public dataset in hopes that the findings from that dataset might reveal insights which will assist in Bellabeat's marketing strategies for future growth. My task is to provide data-driven insights to business tasks provided by the Bellabeats, Inc. executive and data analysis team. In order to accomplish this task, I will complete all parts of the Data Analysis Process (Ask, Prepare, Process, Analyze, Share, Act). In addition, I will break each part of the Data Analysis Process down into three sections to provide clarity and accountability. Those three sections are: Guiding Questions, Key Tasks, and Deliverables. For the sake of space and to avoid repetition, I will record the deliverables for each Key Task directly under the numbered Key Task using an asterisk (*) as an identifier.

    Section 1 - Ask: A. Guiding Questions: Who are the key stakeholders and what are their goals for the data analysis project? What is the business task that this data analysis project is attempting to solve?

    B. Key Tasks: Identify key stakeholders and their goals for the data analysis project *The key stakeholders for this project are as follows: -Urška Sršen and Sando Mur - co-founders of Bellabeats, Inc. -Bellabeats marketing analytics team. I am a member of this team. Identify the business task. *The business task is: -As provided by co-founder Urška Sršen, the business task for this project is to gain insight into how consumers are using their non-BellaBeats smart devices in order to guide upcoming marketing strategies for the company which will help drive future growth. Specifically, the researcher was tasked with applying insights driven by the data analysis process to 1 BellaBeats product and presenting those insights to BellaBeats stakeholders.

    Section 2 - Prepare: A. Guiding Questions: Where is the data stored and organized? Are there any problems with the data? How does the data help answer the business question?

    B. Key Tasks: Research and communicate the source of the data, and how it is stored/organized to stakeholders. *The data source used for our case study is FitBit Fitness Tracker Data. This dataset is stored in Kaggle and was made available through user Mobius in an open-source format. Therefore, the data is public and available to be copied, modified, and distributed, all without asking the user for permission. These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk reportedly (see credibility section directly below) between 03/12/2016 thru 05/12/2016. *Reportedly (see credibility section directly below), thirty eligible Fitbit users consented to the submission of personal tracker data, including output related to steps taken, calories burned, time spent sleeping, heart rate, and distance traveled. This data was broken down into minute, hour, and day level totals. This data is stored in 18 CSV documents. I downloaded all 18 documents onto my local laptop and decided to use 2 documents for the purposes of this project as they were files which had merged activity and sleep data from the other documents. All unused documents were permanently deleted from the laptop. The 2 files used were: -sleepDay_merged.csv -dailyActivity_merged.csv Identify and communicate to stakeholders any problems found with the data related to credibility and bias. *As will be more specifically presented in the Process section, the data seems to have credibility issues related to the reported time frame of the data collected. The metadata seems to indicate that the data collected covered roughly 2 months of FitBit tracking. However, upon my initial data processing, I found that only 1 month of data was reported. *As will be more specifically presented in the Process section, the data has credibility issues related to the number of individuals who reported FitBit data. Specifically, the metadata communicates that 30 individual users agreed to report their tracking data. My initial data processing uncovered 33 individual IDs in the dailyActivity_merged dataset. *Due to the small number of participants (...

  16. a

    Juvenile Facilities Public Dataset Q1 2020

    • hub.arcgis.com
    Updated Jun 1, 2020
    Cite
    AlamedaCounty.CA.US (2020). Juvenile Facilities Public Dataset Q1 2020 [Dataset]. https://hub.arcgis.com/datasets/dedce3c7d094458ab85f6dfc45228f5d
    Explore at:
    Dataset updated
    Jun 1, 2020
    Dataset provided by
    AlamedaCounty.CA.US
    Description

    Youth in custody and in alternatives to detention supervised by the Juvenile Facilities Division of the Alameda County Probation Department.

    Dataset Overview and Source

    The Alameda County Probation Department (ACPD) collects information on all juveniles referred to the department and records this in PRISM (Probation Record Information System Management). Several reports are built into the PRISM system that allow ACPD staff to extract data for specified time periods. This dataset is derived from the PRISM PO-170 report – the Juvenile Detentions Data Extract. This report contains individual-level information for all youth detained on the date specified. To create this dataset, the report was de-identified and edited to include the most pertinent information.

    Data Characteristics and Known Limitations

    • To ensure confidentiality and to protect the identities of individuals on probation, the ages of some individuals in the dataset have been withheld, and marked “N/A”. This is done to avoid possible re-identification through the available demographic information, or stigmatization of a group when they make a substantial percent of a designated population. Please see the “Notes” tab in the public dataset file, which states the number of records with withheld ages in that dataset.

    • This data contains all youth who were detained on the date specified. Many youth are booked into Juvenile Hall when arrested and subsequently released without ever spending a night in detention. These youth will be included in this dataset if they were detained at the time of the report, regardless of how long they stayed in Juvenile Hall.

    View Data Dictionary

  17. CURVAS-PDACVI dataset

    • zenodo.org
    • explore.openaire.eu
    zip
    Updated May 15, 2025
    Cite
    Meritxell Riera-Marín; SIKHA O K; MARIA MONTSERRAT DUH; Anton Aubanell; de Figueiredo Cardoso Ruben; Egger-Hackenschmidt Saskia; Júlia Rodríguez-Comas; Miguel Ángel González Ballester; Javier Garcia López (2025). CURVAS-PDACVI dataset [Dataset]. http://doi.org/10.5281/zenodo.15401568
    Explore at:
    zip (available download formats)
    Dataset updated
    May 15, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Meritxell Riera-Marín; SIKHA O K; MARIA MONTSERRAT DUH; Anton Aubanell; de Figueiredo Cardoso Ruben; Egger-Hackenschmidt Saskia; Júlia Rodríguez-Comas; Miguel Ángel González Ballester; Javier Garcia López
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This challenge will soon be hosted on Grand Challenge. It is currently under construction.

    Clinical Problem

    In medical imaging, DL models are often tasked with delineating structures or abnormalities within complex anatomical structures, such as tumors, blood vessels, or organs. Uncertainty arises from the inherent complexity and variability of these structures, leading to challenges in precisely defining their boundaries. This uncertainty is further compounded by interrater variability, as different medical experts may have varying opinions on where the true boundaries lie. DL models must grapple with these discrepancies, leading to inconsistencies in segmentation results across different annotators and potentially impacting diagnosis and treatment decisions. Addressing interrater variability in DL for medical segmentation involves the development of robust algorithms capable of capturing and quantifying uncertainty, as well as standardizing annotation practices and promoting collaboration among medical experts to reduce variability and improve the reliability of DL-based medical image analysis. Interrater variability poses significant challenges in the field of DL for medical image segmentation.

    This challenge is designed to promote awareness of the impact uncertainty has on clinical applications of medical image analysis. In last year's edition, we proposed a competition based on modeling the uncertainty of segmenting three abdominal organs, namely kidney, liver and pancreas, focusing on organ volume as a clinical quantity of interest. This year, we go one step further and propose to segment pancreatic pathological structures, namely Pancreatic Ductal Adenocarcinoma (PDAC), with the clinical goal of understanding vascular involvement, a key measure of tumor resectability. In this context, uncertainty quantification is a much more challenging task, given the widely varying contours that different PDAC instances show.

    This year, we will provide a richer dataset, in which we start from an already existing dataset of clinically verified contrast-enhanced abdominal CT scans with a single set of manual annotations (provided by the PANORAMA organization), and make an effort to construct four extra manual annotations per PDAC case. In this way, we will assemble a unique dataset that creates a notable opportunity to analyze the impact of multi-rater annotations in several dimensions, e.g. different annotation protocols or different annotator experiences, to name a few.

    CURVAS Challenge Goal

    This challenge aims to advance deep learning methods for medical image segmentation by focusing on the critical issue of interrater variability, particularly in the context of pancreatic cancer. Building on last year's focus on organ segmentation uncertainty, this edition shifts to the more complex task of segmenting Pancreatic Ductal Adenocarcinoma (PDAC) to assess vascular involvement—a key indicator of tumor resectability. By providing a unique, richly annotated dataset with multiple expert annotations per case, the challenge encourages participants to develop robust models that can quantify and manage uncertainty arising from differing expert opinions, ultimately improving the clinical reliability of AI-based image analysis.

    For more information about the challenge, visit our website to join CURVAS-PDACVI (Calibration and Uncertainty for multiRater Volume Assessment in multistructure Segmentation - Pancreatic Ductal AdenoCarcinoma Vascular Invasion). This challenge will be held at MICCAI 2025.

    Dataset Cohort

    The challenge cohort comprises 125 upper-abdominal, axial, portal-venous contrast-enhanced CT (CECT) scans selected from a subset of the PANORAMA challenge dataset. The selection process prioritizes CT scans with manually generated labels and excludes those with automatically derived annotations. Additionally, only cases with a conclusive diagnostic test (e.g., pathology, cytology, histopathology) are included, while patients with radiology-based diagnoses have been excluded.

    To ensure the subset is representative of common real-world scenarios, lesion sizes have been analyzed, and a diverse range of cases has been selected. Furthermore, patient demographics, including sex and age, have been considered to enhance the cohort's representativeness.

    Finally, a preliminary visual analysis has been conducted before sending the images to radiologists for segmentation. This step checks the tumor's location, size, and relevance, helping maintain the dataset's representativeness for the challenge.

    The cohort of 125 CT scans described above is split as follows:

    • Training Phase cohort:

    40 CT scans with their respective annotations are provided. Participants are encouraged to leverage publicly available external data annotated by multiple raters. Providing a small training set while allowing public datasets for training is intended to make the challenge more inclusive, since a method can be developed using data that is available to anyone. Furthermore, training on one source of data and evaluating on another makes methods more robust to shifts and other sources of variability between datasets.

    • Validation Phase cohort:

    5 CT scans will be used for this phase.

    • Test Phase cohort:

    85 CT scans will be used for evaluation.

    Neither the validation nor the test cohort will be published until the end of the challenge. Furthermore, the group to which each CT scan belongs will not be revealed until after the challenge.

    Each folder containing a study is named with a unique ID (CURVASPDAC_XXXX), so that it cannot be directly related to the PANORAMA ID, and has the following structure:

    • annotation_X.nii.gz: contains the Pancreatic Ductal Adenocarcinoma (PDAC) segmentations (X=1 is the PANORAMA segmentation; X=2,...,5 are the other experts' segmentations)
    • image.nii.gz: CT volume
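    The folder layout above can be checked with a short script before any training code runs. The following is a minimal sketch, not part of the challenge tooling: the helper names `expected_files` and `missing_files` are hypothetical, and the folder name used is the literal placeholder from the description rather than a real study ID.

```python
from pathlib import Path


def expected_files(study_dir, n_annotations=5):
    """Map logical names to the paths a study folder is expected to contain.

    Assumes the layout described above: one image.nii.gz plus
    annotation_1.nii.gz ... annotation_5.nii.gz (1 = PANORAMA, 2-5 = other experts).
    """
    study_dir = Path(study_dir)
    files = {"image": study_dir / "image.nii.gz"}
    for x in range(1, n_annotations + 1):
        files[f"annotation_{x}"] = study_dir / f"annotation_{x}.nii.gz"
    return files


def missing_files(study_dir, n_annotations=5):
    """Return the sorted logical names whose files do not exist on disk."""
    expected = expected_files(study_dir, n_annotations)
    return sorted(name for name, path in expected.items() if not path.exists())


if __name__ == "__main__":
    # "CURVASPDAC_XXXX" is the placeholder pattern from the text, not a real ID.
    for name, path in expected_files("CURVASPDAC_XXXX").items():
        print(name, "->", path)
```

    Running `missing_files` over every study folder gives a quick report of incomplete cases; reading the volumes themselves would need a NIfTI reader, which is left out of this sketch.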

    The four additional annotations were made by radiologists at Universitätsklinikum Erlangen, Hospital de Sant Pau, and Hospital de Mataró. Hence, four new annotations plus the PANORAMA annotation are provided. Another clinician modified the vascular-structure annotations of the PANORAMA dataset, separating veins and arteries into single-structure segmentations. These structures are the ones considered highly relevant for the study of Vascular Invasion (VI): Porta, Superior Mesenteric Vein (SMV), Superior Mesenteric Artery (SMA), Hepatic Artery and Celiac Trunk. The vascular annotations will be made public later in the challenge, so that participants can try out the evaluation code.

    The subsets have also been balanced to ensure representativeness. Factors such as devices, sex, and patient age have been considered to improve the cohort's representativeness. Efforts have been made to distribute bias as evenly as possible across these variables. For age distribution, the target percentages are as follows: below 50 years (5%), 50–59 years (15%), 60–69 years (20%), 70–79 years (30%), and 80–89 years (30%) [1,2,3,4]. While these percentages are approximate and have been rounded for simplicity, the balance aims to be as close to these proportions as feasible. For sex, the targets are 40-50% female and 50-60% male [5]. For the location of the PDAC, the targets are 60-70% head, 15-25% body and 10-15% tail [6]. The size of the lesions has been analyzed and a subset will be selected; these values will be published in the future with the entire dataset.
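    To make the age targets concrete, the percentages can be turned into per-bin scan counts for a subset of a given size. The sketch below is illustrative only: the function name `target_counts` and the largest-remainder rounding rule are assumptions, not the organizers' procedure, and the text itself notes that the percentages are approximate.

```python
# Target age distribution from the description, in whole percent.
AGE_TARGETS_PCT = {"<50": 5, "50-59": 15, "60-69": 20, "70-79": 30, "80-89": 30}


def target_counts(n_scans, targets_pct=AGE_TARGETS_PCT):
    """Convert percentage targets into integer counts that sum to n_scans.

    Uses largest-remainder rounding (an assumption made for this sketch):
    floor every bin, then hand leftover scans to the bins with the largest
    fractional remainders.
    """
    scaled = {k: n_scans * pct for k, pct in targets_pct.items()}  # in 1/100 scans
    counts = {k: v // 100 for k, v in scaled.items()}
    leftover = n_scans - sum(counts.values())
    by_remainder = sorted(scaled, key=lambda k: scaled[k] % 100, reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


if __name__ == "__main__":
    print(target_counts(40))
```

    Integer arithmetic is used throughout so that the rounding does not depend on floating-point behavior.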

    Data from PANORAMA Batch 1 (https://zenodo.org/records/13715870), Batch 2 (https://zenodo.org/records/13742336), and Batch 3 (https://zenodo.org/records/11034011) are not allowed for training the models. Batch 4 (https://zenodo.org/records/10999754) can be used.

    For more technical information about the dataset visit the platform: https://panorama.grand-challenge.org/datasets-imaging-labels/

    Ethical Approval and Data Usage Agreement

    No patient information beyond what is already public will be released, since the CT images and their corresponding information are already publicly available.

    References

    [1] Lee, K.S.; Sekhar, A.; Rofsky, N.M.; Pedrosa, I. Prevalence of Incidental Pancreatic Cysts in the Adult Population on MR Imaging. Am J Gastroenterol 2010, 105, 2079–2084, doi:10.1038/ajg.2010.122.

    [2] Canakis, A.; Lee, L.S. State-of-the-Art Update of Pancreatic Cysts. Dig Dis Sci 2021.

    [3] De Oliveira, P.B.; Puchnick, A.; Szejnfeld, J.; Goldman, S.M. Prevalence of Incidental Pancreatic Cysts on 3 Tesla Magnetic Resonance. PLoS One 2015, 10, doi:10.1371/JOURNAL.PONE.0121317.

    [4] Kimura, W.; Nagai, H.; Kuroda, A.; Muto, T.; Esaki, Y. Analysis of Small Cystic Lesions of the Pancreas. Int J Pancreatol 1995, 18, 197–206, doi:10.1007/BF02784942.

    [5] Natalie Moshayedi et al. Race, sex, age, and geographic disparities in pancreatic cancer incidence. JCO 40, 520-520(2022). DOI:10.1200/JCO.2022.40.4_suppl.520

    [6] Avo Artinyan, Perry A. Soriano, Christina Prendergast, Tracey Low, Joshua D.I. Ellenhorn, Joseph Kim, The anatomic location of pancreatic cancer is a prognostic

  18. h

    Public Health Research Database (PHRD)

    • healthdatagateway.org
    unknown
    Updated Apr 21, 2021
    Office for National Statistics (2021). Public Health Research Database (PHRD) [Dataset]. https://healthdatagateway.org/dataset/403
    Explore at:
    Available download formats: unknown
    Dataset updated
    Apr 21, 2021
    Dataset authored and provided by
    Office for National Statistics
    License

    https://www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/approvedresearcherscheme

    Description

    The Public Health Research Database (PHRD) is a linked asset which currently includes Census 2011 data; Mortality Data; Hospital Episode Statistics (HES); GP Extraction Service (GPES) Data for Pandemic Planning and Research data. Researchers may apply for these datasets individually or any combination of the current 4 datasets.

    The purpose of this dataset is to enable analysis of deaths involving COVID-19 by multiple factors such as ethnicity, religion, disability and known comorbidities as well as age, sex, socioeconomic and marital status at subnational levels. It comprises 2011 Census data for usual residents of England and Wales who were not known to have died by 1 January 2020, linked on NHS number to death registrations for deaths registered between 1 January 2020 and 8 March 2021. The data exclude individuals who entered the UK in the year before the Census took place (due to their high propensity to have left the UK prior to the study period), and those over 100 years of age at the time of the Census, even if their death was not linked. The dataset contains all individuals who died (any cause) during the study period, and a 5% simple random sample of those still alive at the end of the study period. For usual residents of England, the dataset also contains comorbidity flags derived from linked Hospital Episode Statistics data from April 2017 to December 2019 and GP Extraction Service Data from 2015-2019.
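    Because the dataset keeps every death but only a 5% simple random sample of survivors, raw proportions computed from it overstate mortality. One common way to correct for this is to weight each sampled survivor by the inverse of the sampling fraction (1 / 0.05 = 20). The sketch below illustrates that arithmetic; the function name and the example counts are hypothetical and are not taken from the PHRD documentation.

```python
def weighted_death_proportion(n_deaths, n_sampled_survivors, sampling_fraction=0.05):
    """Estimate the proportion who died in the underlying population.

    Deaths are fully enumerated (weight 1); survivors are a simple random
    sample, so each sampled survivor stands for 1 / sampling_fraction people.
    """
    estimated_survivors = n_sampled_survivors / sampling_fraction
    return n_deaths / (n_deaths + estimated_survivors)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    naive = 100 / (100 + 495)
    weighted = weighted_death_proportion(100, 495)
    print(f"naive: {naive:.3f}, weighted: {weighted:.3f}")
```

    With these made-up counts the unweighted proportion is roughly 0.17, while the weighted estimate is about 0.01, which shows how large the distortion can be if the sampling design is ignored.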

  19. Data from: An Open-set Recognition and Few-Shot Learning Dataset for Audio...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated May 21, 2024
    Javier Naranjo-Alcazar; Sergi Perez-Castanos; Pedro Zuccarello; Maximo Cobos (2024). An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments [Dataset]. http://doi.org/10.5281/zenodo.3689288
    Explore at:
    Available download formats: zip
    Dataset updated
    May 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Javier Naranjo-Alcazar; Sergi Perez-Castanos; Pedro Zuccarello; Maximo Cobos
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The problem of training a deep neural network with a small set of positive samples is known as few-shot learning (FSL). It is widely known that traditional deep learning (DL) algorithms usually show very good performance when trained with large datasets. However, in many applications, it is not possible to obtain such a high number of samples. In the image domain, typical FSL applications are those related to face recognition. In the audio domain, music fraud detection or speaker recognition can clearly benefit from FSL methods. This paper deals with the application of FSL to the detection of specific and intentional acoustic events given by different types of sound alarms, such as door bells or fire alarms, using a limited number of samples. These sounds typically occur in domestic environments where many events corresponding to a wide variety of sound classes take place. Therefore, the detection of such alarms in a practical scenario can be considered an open-set recognition (OSR) problem. To address the lack of a dedicated public dataset for audio FSL, researchers usually make modifications to other available datasets. This paper is aimed at providing the audio recognition community with a carefully annotated dataset for FSL and OSR comprising 1360 clips from 34 classes divided into pattern sounds and unwanted sounds. To facilitate and promote research in this area, results with two baseline systems (one trained from scratch and another based on transfer learning) are presented.

  20. a

    EPBC Referrals (Public dataset)

    • hub.arcgis.com
    • fed.dcceew.gov.au
    Updated Sep 8, 2021
    Dept of Climate Change, Energy, the Environment & Water (2021). EPBC Referrals (Public dataset) [Dataset]. https://hub.arcgis.com/datasets/ee02ed7773d44c6fa799bf558c70f81a
    Dataset updated
    Sep 8, 2021
    Dataset authored and provided by
    Dept of Climate Change, Energy, the Environment & Water
    Description

    The Referrals Spatial Database - Public records the locations of referrals submitted to the Department under the Environment Protection and Biodiversity Conservation Act 1999 (EPBC Act). A proponent (the party proposing a development) must supply the maximum extent (location) of any proposed activities that need to be assessed under the EPBC Act through an application process.

    Referral boundaries should not be misinterpreted as development footprints; they show where referrals have been received by the Department. Not all referrals captured within the Referrals Spatial Database are assessed and approved by the Minister for the Environment, as some are withdrawn before assessment can take place. For more detailed information on a referral, a URL is provided to the EPBC Act Public notices pages. Status and detailed planning documentation is available on the EPBC Act Public notice database.

    Since September 2019, this dataset has been updated using a spatial data capture tool embedded within the Referral form on the department's website. Users are able to supply spatial data in multiple formats and review it online, and the data is submitted automatically with the completed referral form. Nightly processes update this dataset, which is then available for internal staff to use (usually within 24 hours).

    Prior to September 2019, a manual process was employed to update this dataset. Where a proponent provided GIS data, this was loaded as the polygons for a referral. Where GIS data did not exist, other means of digitizing boundaries were employed to provide a relatively accurate reflection of the maximum extent that the referral may impact (it is not a development footprint). This sometimes took the form of heads-up digitizing of planning documents, sourcing features from other state databases (such as PSMA Australia), and using coordinates supplied through the application forms.

    Any variations to boundaries after the initial referral (i.e. during the assessment, approval or post-approval stages) are processed on an ad hoc basis through a manual update to the dataset. For more information about referrals please visit: Referrals under the EPBC Act - DAWE

Geonames - All Cities with a population > 1000: 16 scholarly articles cite this dataset.