51 datasets found
  1. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  2. Big Lake, TX Age Group Population Dataset: A Complete Breakdown of Big Lake...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Big Lake, TX Age Group Population Dataset: A Complete Breakdown of Big Lake Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/451118a9-f122-11ef-8c1b-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Big Lake, Texas
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each age group. Ages between 0 and 85 are divided into buckets of roughly 5 years each; all ages over 85 are aggregated into a single group. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Big Lake population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Big Lake. The dataset can be used to understand the population distribution of Big Lake by age. For example, using this dataset, we can identify the largest age group in Big Lake.

    Key observations

    The largest age group in Big Lake, TX was Under 5 years, with a population of 346 (11.19%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Big Lake, TX was 75 to 79 years, with a population of 17 (0.55%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over
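
    The bucketing into these 18 groups can be sketched as follows. This is a minimal illustration, not Neilsberg's actual procedure: the function name age_group is hypothetical, and it assumes ages are given in whole years.

```python
def age_group(age: int) -> str:
    """Map an age in whole years to one of the 18 age-group labels listed above."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 5:
        return "Under 5 years"
    if age >= 85:
        return "85 years and over"
    lower = (age // 5) * 5  # start of the 5-year bucket, e.g. 77 -> 75
    return f"{lower} to {lower + 4} years"


# Collect every distinct label produced for ages 0..119.
labels = {age_group(a) for a in range(120)}
print(len(labels))    # 18
print(age_group(77))  # 75 to 79 years
```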

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: This column shows the population of the specific age group in Big Lake.
    • % of Total Population: This column displays the population of each age group as a percentage of the Big Lake total population. Please note that the sum of all percentages may not equal 100% due to rounding of values.
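
    As a rough check of how the percentage column relates to the population column, the two figures quoted under Key observations can be tied together. The listing does not state the total population, so the total below is only implied by the rounded 11.19% figure and should be read as an approximation; the helper name percent_of_total is hypothetical.

```python
def percent_of_total(population: int, total: float) -> float:
    """Population of one group as a percentage of the total, rounded to 2 decimals."""
    return round(100 * population / total, 2)


# Largest group: 346 people reported as 11.19% of the total.
implied_total = 346 / 0.1119  # approximate, because 11.19% is itself rounded

# Smallest group: 17 people, reported as 0.55%.
print(round(implied_total))                 # 3092
print(percent_of_total(17, implied_total))  # 0.55
```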

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Big Lake Population by Age.

  3. Big Stone City, SD Age Group Population Dataset: A Complete Breakdown of Big...

    • neilsberg.com
    csv, json
    Updated Feb 22, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Big Stone City, SD Age Group Population Dataset: A Complete Breakdown of Big Stone City Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/45111e75-f122-11ef-8c1b-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Feb 22, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Big Stone City, South Dakota
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each age group. Ages between 0 and 85 are divided into buckets of roughly 5 years each; all ages over 85 are aggregated into a single group. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Big Stone City population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Big Stone City. The dataset can be used to understand the population distribution of Big Stone City by age. For example, using this dataset, we can identify the largest age group in Big Stone City.

    Key observations

    The largest age group in Big Stone City, SD was 75 to 79 years, with a population of 115 (18.88%), according to the ACS 2019-2023 5-Year Estimates. At the same time, the smallest age group in Big Stone City, SD was 5 to 9 years, with a population of 3 (0.49%). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: This column shows the population of the specific age group in Big Stone City.
    • % of Total Population: This column displays the population of each age group as a percentage of the Big Stone City total population. Please note that the sum of all percentages may not equal 100% due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Big Stone City Population by Age.

  4. Data from: CLIVAR LE project

    • rda.ucar.edu
    • data.ucar.edu
    • +1 more
    Cite
    CLIVAR LE project [Dataset]. https://rda.ucar.edu/lookfordata/datasets/?nb=y&b=topic&v=Atmosphere
    Description

    The CLIVAR Large Ensemble repository was built at NCAR and supported by the US CLIVAR WG on Large Ensembles. It features a set of CMORized variables from the following CMIP5 ... class Large Ensembles: CANESM2, CESM, CSIRO MK36, EC Earth, GFDL CM3, GFDL ESM2M, MPI, and OLENS McKinnon.

  5. Data from: Large Landing Trajectory Data Set for Go-Around Analysis

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1 more
    Updated Dec 16, 2022
    + more versions
    Cite
    Timothé Krauth (2022). Large Landing Trajectory Data Set for Go-Around Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7148116
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    Raphael Monstein
    Marcel Dettling
    Timothé Krauth
    Benoit Figuet
    Manuel Waltert
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Large go-around, also referred to as missed approach, data set. The data set is in support of the paper presented at the OpenSky Symposium on November the 10th.

    If you use this data for a scientific publication, please consider citing our paper.

    The data set contains landings from 176 (mostly) large airports from 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings with more than 33000 GAs. The data was collected from OpenSky Network's historical data base for the year 2019. The published data set contains multiple files:

    go_arounds_minimal.csv.gz

    Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:

        Column name            Type       Description
        time                   date time  UTC time of landing or first GA attempt
        icao24                 string     Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
        callsign               string     Aircraft identifier in air-ground communications
        airport                string     ICAO airport code where the aircraft is landing
        runway                 string     Runway designator on which the aircraft landed
        has_ga                 string     "True" if at least one GA was performed, otherwise "False"
        n_approaches           integer    Number of approaches identified for this flight
        n_rwy_approached       integer    Number of unique runways approached by this flight

    The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of approaches, so an easy way to exclude them is to drop the flights with n_approaches > 2.
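
    A minimal pandas sketch of this filter is shown below. The rows are made up for illustration; only the column names come from the table above, and the reading that flights with more than two approaches are the ones to drop is an interpretation of the sentence above.

```python
import pandas as pd

# Tiny illustrative frame; the values are invented, the column names
# follow the minimal data set described above.
df = pd.DataFrame(
    {
        "callsign": ["ABC123", "TRAIN01", "XYZ789"],
        "has_ga": [False, True, True],
        "n_approaches": [1, 7, 2],
        "n_rwy_approached": [1, 2, 1],
    }
)

# Keep flights with at most two approaches ...
regular = df[df["n_approaches"] <= 2]
# ... and set aside the rest as probable training or calibration flights.
excluded = df[df["n_approaches"] > 2]

print(regular["callsign"].tolist())   # ['ABC123', 'XYZ789']
print(excluded["callsign"].tolist())  # ['TRAIN01']
```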

    go_arounds_augmented.csv.gz

    Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:

        Column name            Type       Description
        time                   date time  UTC time of landing or first GA attempt
        icao24                 string     Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
        callsign               string     Aircraft identifier in air-ground communications
        airport                string     ICAO airport code where the aircraft is landing
        runway                 string     Runway designator on which the aircraft landed
        has_ga                 string     "True" if at least one GA was performed, otherwise "False"
        n_approaches           integer    Number of approaches identified for this flight
        n_rwy_approached       integer    Number of unique runways approached by this flight
        registration           string     Aircraft registration
        typecode               string     Aircraft ICAO typecode
        icaoaircrafttype       string     ICAO aircraft type
        wtc                    string     ICAO wake turbulence category
        glide_slope_angle      float      Angle of the ILS glide slope in degrees
        has_intersection       string     Boolean that is true if the runway has another runway intersecting it, otherwise false
        rwy_length             float      Length of the runway in kilometres
        airport_country        string     ISO Alpha-3 country code of the airport
        airport_region         string     Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
        operator_country       string     ISO Alpha-3 country code of the operator
        operator_region        string     Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
        wind_speed_knts        integer    METAR, surface wind speed in knots
        wind_dir_deg           integer    METAR, surface wind direction in degrees
        wind_gust_knts         integer    METAR, surface wind gust speed in knots
        visibility_m           float      METAR, visibility in m
        temperature_deg        integer    METAR, temperature in degrees Celsius
        press_sea_level_p      float      METAR, sea level pressure in hPa
        press_p                float      METAR, QNH in hPa
        weather_intensity      list       METAR, list of present weather codes: qualifier - intensity
        weather_precipitation  list       METAR, list of present weather codes: weather phenomena - precipitation
        weather_desc           list       METAR, list of present weather codes: qualifier - descriptor
        weather_obscuration    list       METAR, list of present weather codes: weather phenomena - obscuration
        weather_other          list       METAR, list of present weather codes: weather phenomena - other

    This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.

    go_arounds_agg.csv.gz

    Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

        Column name            Type       Description
        airport                string     ICAO airport code where the aircraft is landing
        runway                 string     Runway designator on which the aircraft landed
        n_landings             integer    Total number of landings observed on this runway in 2019
        ga_rate                float      Go-around rate, per 1000 landings
        glide_slope_angle      float      Angle of the ILS glide slope in degrees
        has_intersection       string     Boolean that is true if the runway has another runway intersecting it, otherwise false
        rwy_length             float      Length of the runway in kilometres
        airport_country        string     ISO Alpha-3 country code of the airport
        airport_region         string     Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)

    This aggregated data set is used in the paper for the generalized linear regression model.
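
    The description does not say how ga_rate was computed, so the following is only a sketch of one plausible way to derive the n_landings and ga_rate columns from per-landing rows shaped like the minimal data set. The sample rows and the intermediate column n_ga are invented for illustration. For scale, the overall figures quoted above (more than 33000 GAs in almost 9 million landings) correspond to roughly 3.7 go-arounds per 1000 landings.

```python
import pandas as pd

# Invented per-landing rows; column names follow the minimal data set.
landings = pd.DataFrame(
    {
        "airport": ["EGLC", "EGLC", "EGLC", "EGLC", "LSZH"],
        "runway": ["09", "09", "27", "27", "14"],
        "has_ga": [True, False, False, False, False],
    }
)

# One row per airport-runway pair: count landings and go-arounds.
agg = (
    landings.groupby(["airport", "runway"])
    .agg(n_landings=("has_ga", "size"), n_ga=("has_ga", "sum"))
    .reset_index()
)

# Go-around rate expressed per 1000 landings.
agg["ga_rate"] = 1000 * agg["n_ga"] / agg["n_landings"]
print(agg[["airport", "runway", "n_landings", "ga_rate"]])
```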

    Downloading the trajectories

    Users of this data set with access to OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. For example, suppose you want to get all the go-arounds of the 4th of January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:

        import datetime

        from tqdm.auto import tqdm
        import pandas as pd
        from traffic.data import opensky
        from traffic.core import Traffic

        # load minimum data set
        df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
        df["time"] = pd.to_datetime(df["time"])

        # select London City Airport, go-arounds, and 2019-01-04
        airport = "EGLC"
        start = datetime.datetime(year=2019, month=1, day=4).replace(
            tzinfo=datetime.timezone.utc
        )
        stop = datetime.datetime(year=2019, month=1, day=5).replace(
            tzinfo=datetime.timezone.utc
        )

        df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

        # iterate over flights and pull the data from OpenSky Network
        flights = []
        delta_time = pd.Timedelta(minutes=10)
        for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
            # take at most 10 minutes before and 10 minutes after the landing or go-around
            start_time = row["time"] - delta_time
            stop_time = row["time"] + delta_time

            # fetch the data from OpenSky Network
            flights.append(
                opensky.history(
                    start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
                    stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
                    callsign=row["callsign"],
                    return_flight=True,
                )
            )

        # The flights can be converted into a Traffic object
        Traffic.from_flights(flights)

    Additional files

    Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:

    validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.

    validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.

  6. Big Sandy, MT Population Breakdown by Race

    • neilsberg.com
    csv, json
    Updated Aug 18, 2023
    + more versions
    Cite
    Neilsberg Research (2023). Big Sandy, MT Population Breakdown by Race [Dataset]. https://www.neilsberg.com/research/datasets/688e191b-3d85-11ee-9abe-0aa64bf2eeb2/
    Available download formats: json, csv
    Dataset updated
    Aug 18, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Big Sandy, Montana
    Variables measured
    Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the racial categories identified by the U.S. Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Big Sandy by race. It includes the population of Big Sandy across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Big Sandy across relevant racial categories.

    Key observations

    The percent distribution of Big Sandy population by race (across all racial categories recognized by the U.S. Census Bureau): 77.82% are white, 1.13% are Black or African American, 5.65% are American Indian and Alaska Native, 0.71% are Asian and 14.69% are multiracial.

    Figure: Big Sandy population by race (https://i.neilsberg.com/ch/big-sandy-mt-population-by-race.jpeg)

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (excluding ethnicity) for Big Sandy.
    • Population: This column shows the population of the racial category (excluding ethnicity) in Big Sandy.
    • % of Total Population: This column displays the population of each race as a percentage of the Big Sandy total population. Please note that the sum of all percentages may not equal 100% due to rounding of values.
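
    The rounding note can be checked against the percentages quoted under Key observations. The sketch below uses only those five figures; the categories not mentioned there (Native Hawaiian and Other Pacific Islander, Some other race) are assumed, not confirmed, to contribute nothing, and the tolerance rule is an illustrative choice rather than anything Neilsberg documents.

```python
# Percentages quoted under Key observations for Big Sandy, MT.
shares = {
    "White": 77.82,
    "Black or African American": 1.13,
    "American Indian and Alaska Native": 5.65,
    "Asian": 0.71,
    "Two or more races (multiracial)": 14.69,
}

total = sum(shares.values())

# Each value is rounded to two decimals, so each can be off by up to 0.005.
tolerance = 0.005 * len(shares)
within_rounding = abs(total - 100.0) <= tolerance

print(round(total, 2))   # 100.0
print(within_rounding)   # True
```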

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Big Sandy Population by Race & Ethnicity.

  7. Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

    • neilsberg.com
    csv, json
    Updated Jul 24, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
    Available download formats: csv, json
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Excel
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each age group. Ages between 0 and 85 are divided into buckets of roughly 5 years each; all ages over 85 are aggregated into a single group. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Excel. The dataset can be used to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.

    Key observations

    The largest age group in Excel, AL was 45 to 49 years, with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was 85 years and over, with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: This column shows the population of the specific age group in Excel.
    • % of Total Population: This column displays the population of each age group as a percentage of the Excel total population. Please note that the sum of all percentages may not equal 100% due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Excel Population by Age.

  8. Aluminum alloy industrial materials defect

    • figshare.com
    zip
    Updated Dec 3, 2024
    Cite
    Ying Han; Yugang Wang (2024). Aluminum alloy industrial materials defect [Dataset]. http://doi.org/10.6084/m9.figshare.27922929.v3
    Available download formats: zip
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    figshare
    Authors
    Ying Han; Yugang Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset used in this study comes from the preliminary competition dataset of the 2018 Guangdong Industrial Intelligent Manufacturing Big Data Intelligent Algorithm Competition organized by Tianchi Feiyue Cloud (https://tianchi.aliyun.com/competition/entrance/231682/introduction). We filtered the dataset, removing images that do not meet the requirements of our experiment. All data have been divided into training and testing sets. All images are 2560×1920 pixels. Before training, all defects are labeled using labelimg and saved as json files, and the json files are then converted to txt files. Finally, the organized defect dataset is used for detection and classification.

    Description of the data and file structure

    This is a project based on an enhanced YOLOv8 algorithm for aluminum defect classification and detection tasks. All code has been tested on Windows computers with Anaconda and CUDA-enabled GPUs. The following instructions allow users to run the code in this repository on an existing Windows system with a CUDA GPU.

    Files and variables

    File: defeat_dataset.zip

    Setup

    Please follow the steps below to set up the project.

    Download the project repository

    Download the project repository defeat_dataset.zip. Unzip it and navigate to the project folder; it should contain a subfolder: quexian_dataset

    Download data

    1. Download the data: defeat_dataset.zip
    2. Unzip the downloaded data and move the 'defeat_dataset' folder into the project's main folder.
    3. Make sure that your defeat_dataset folder now contains a subfolder: quexian_dataset.
    4. Within the folder you should find various subfolders such as addquexian-13, quexian_dataset, new_dataset-13, etc.

    Software

    Set up the Python environment

    1. Download and install Anaconda.
    2. Once Anaconda is installed, open the Anaconda Prompt. On Windows, click Start, search for Anaconda Prompt, and open it.
    3. Create a new conda environment with Python 3.8. You can name it whatever you like, for example yolov8. Enter the following command: conda create -n yolov8 python=3.8
    4. Activate the created environment. If the name is yolov8, enter: conda activate yolov8

    Download and install Visual Studio Code.

    Install PyTorch based on your system. For Windows/Linux users with a CUDA GPU:

    conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge

    Install some necessary libraries:

    • Install scikit-learn with the command: conda install -c anaconda scikit-learn=0.24.1
    • Install astropy with: conda install astropy=4.2.1
    • Install pandas using: conda install -c anaconda pandas=1.2.4
    • Install Matplotlib with: conda install -c conda-forge matplotlib=3.5.3
    • Install scipy by entering: conda install scipy=1.10.1

    Repeatability

    For PyTorch, it is a well-known fact that there is no guarantee of fully reproducible results between PyTorch versions, individual commits, or different platforms. In addition, results may not be reproducible between CPU and GPU executions, even if the same seed is used. All results in the Analysis Notebook that involve only model evaluation are fully reproducible. However, when the model is updated on the GPU, the results of model training vary between machines.

    Access information

    Other publicly accessible locations of the data: https://tianchi.aliyun.com/dataset/public/

    Data was derived from the following sources: https://tianchi.aliyun.com/dataset/140666

    Data availability statement

    The ten defect types used in this study come from the rematch of the Guangdong Industrial Wisdom Big Data Innovation Competition - Intelligent Algorithm Competition, and the dataset download link is https://tianchi.aliyun.com/competition/entrance/231682/information?lang=en-us. The official website provides 4,356 images, including single defect images, multiple defect images and no defect images. We have selected only single defect images and multiple defect images, which are 3,233 images in total. The ten defects are non-conductive, effacement, miss bottom corner, orange peel, varicolored, jet, lacquer bubble, jump into a pit, divulge the bottom and blotch. Each image contains one or more defects, and the resolution of every defect image is 2560×1920.

    By investigating the literature, we found that most experiments were done with 10 types of defects, so we chose three more types of defects that differ more from these ten types and have more samples, which makes them suitable for the experiments. The three newly added defect types come from the preliminary dataset of the Guangdong Industrial Wisdom Big Data Intelligent Algorithm Competition. The dataset can be downloaded from https://tianchi.aliyun.com/dataset/140666. There are 3,000 images in total, among which 109, 73 and 43 images are for the defects of bruise, camouflage and coating cracking respectively. Finally, the 10 types of defects in the rematch and the 3 types of defects selected in the preliminary round are fused into a new dataset, which is the dataset examined here.

    In processing the dataset, we tried different division ratios, such as 8:2, 7:3, 7:2:1, etc. After testing, we found that the experimental results did not differ much between division ratios. Therefore, we divide the dataset according to the ratio of 7:2:1: the training set accounts for 70%, the validation set for 20%, and the testing set for 10%. The random number seed is set to 0 so that the results obtained are consistent every time the model is trained.

    Finally, the mean Average Precision (mAP) metric was measured on the dataset a total of three times. The results differed very little between runs, but for the accuracy of the experimental results we took the average of the highest and lowest results. The highest was 71.5% and the lowest was 71.1%, resulting in an average detection accuracy of 71.3% for the final experiment.

    All data and images utilized in this research are from publicly available sources, and the original creators have given their consent for these materials to be published in open-access formats.

    The settings for other parameters are as follows. epochs: 200, patience: 50, batch: 16, imgsz: 640, pretrained: true, optimizer: SGD, close_mosaic: 10, iou: 0.7, momentum: 0.937, weight_decay: 0.0005, box: 7.5, cls: 0.5, dfl: 1.5, pose: 12.0, kobj: 1.0, save_dir: runs/train

    The defeat_dataset.zip file is mentioned in the Supporting information section of our manuscript. The underlying data are held at Figshare. DOI: 10.6084/m9.figshare.27922929. The results_images.zip file in the system contains the experimental results graphs. The images_1.zip and images_2.zip files in the system contain all the images needed to generate the manuscript.tex manuscript.
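
    The 7:2:1 division with a random seed of 0 described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name split_dataset and the file names are hypothetical, and the standard-library random module stands in for whatever tooling the project actually used.

```python
import random


def split_dataset(items, parts=(7, 2, 1), seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists.

    `parts` gives the integer ratio (7:2:1 here); `seed` fixes the shuffle.
    """
    rng = random.Random(seed)  # local generator, global random state untouched
    shuffled = list(items)
    rng.shuffle(shuffled)

    total_parts = sum(parts)
    n = len(shuffled)
    n_train = n * parts[0] // total_parts
    n_val = n * parts[1] // total_parts

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


# Hypothetical file names standing in for the labeled defect images.
image_names = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # 700 200 100
```

    Fixing the seed makes the split itself repeatable; as the repeatability note above says, it does not by itself make GPU training results identical across machines.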

  9. Big Bear Lake, CA Population Breakdown by Race

    • neilsberg.com
    csv, json
    Updated Aug 18, 2023
    Cite
    Neilsberg Research (2023). Big Bear Lake, CA Population Breakdown by Race [Dataset]. https://www.neilsberg.com/research/datasets/688dbf57-3d85-11ee-9abe-0aa64bf2eeb2/
    Available download formats: csv, json
    Dataset updated
    Aug 18, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Big Bear Lake, California
    Variables measured
    Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we first analyzed and categorized the data for each of the racial categories identified by the U.S. Census Bureau. The population estimates used in this dataset pertain exclusively to the identified racial categories and do not rely on any ethnicity classification. For further information regarding these estimates, please reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Big Bear Lake by race. It includes the population of Big Bear Lake across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Big Bear Lake across relevant racial categories.

    Key observations

    The percent distribution of Big Bear Lake population by race (across all racial categories recognized by the U.S. Census Bureau): 80.05% are white, 0.24% are Black or African American, 1.58% are American Indian and Alaska Native, 2.15% are Asian, 3.83% are some other race and 12.15% are multiracial.

    [Image: Big Bear Lake population by race] https://i.neilsberg.com/ch/big-bear-lake-ca-population-by-race.jpeg

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Racial categories include:

    • White
    • Black or African American
    • American Indian and Alaska Native
    • Asian
    • Native Hawaiian and Other Pacific Islander
    • Some other race
    • Two or more races (multiracial)

    Variables / Data Columns

    • Race: This column displays the racial categories (excluding ethnicity) for Big Bear Lake.
    • Population: This column shows the population of each racial category (excluding ethnicity) in Big Bear Lake.
    • % of Total Population: This column displays each race's population as a percentage of the total population of Big Bear Lake. Please note that the percentages may not sum to exactly 100% due to rounding.
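
    The rounding note on the percentage column can be illustrated with a short sketch. The group names and counts below are hypothetical and are not taken from the dataset; the function name percent_of_total is likewise an assumption made for illustration.

```python
def percent_of_total(counts, digits=2):
    """Return each group's share of the total as a rounded percentage."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("total population is zero")
    return {name: round(100 * n / total, digits) for name, n in counts.items()}


# Hypothetical counts chosen so that rounding visibly loses a little.
counts = {"Group A": 1, "Group B": 1, "Group C": 1}
shares = percent_of_total(counts)

# Each share rounds to 33.33, so the rounded shares add up to slightly
# less than 100 even though the unrounded shares sum to exactly 100.
print(shares)
print(sum(shares.values()))
```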

    Good to know

    Margin of Error

    Data in the dataset are estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Big Bear Lake Population by Race & Ethnicity.

  10. INSPIRE Download Service (predefined ATOM) for dataset special area large...

    • gimi9.com
    Cite
    INSPIRE Download Service (predefined ATOM) for dataset special area large advertising system A3 | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_bc9fe5a2-9b29-0002-fd31-4a197570046e
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Description of the INSPIRE Download Service (predefined Atom): Development plan "Sondergebiet Grosswerbeanlage A3". The link(s) for downloading the data sets are dynamically generated from GetMap calls to a WMS interface.

  11. Large Scale International Boundaries

    • geodata.state.gov
    • catalog.data.gov
    Updated Feb 24, 2025
    Cite
    U.S. Department of State (2025). Large Scale International Boundaries [Dataset]. https://geodata.state.gov/geonetwork/srv/api/records/3bdb81a0-c1b9-439a-a0b1-85dac30c59b2
    Available download formats: www:link-1.0-http--link, www:link-1.0-http--related, ogc:wms-1.3.0-http-get-capabilities, www:download-1.0-http--download
    Dataset updated
    Feb 24, 2025
    Dataset provided by
    United States Department of State (http://state.gov/)
    Authors
    U.S. Department of State
    Area covered
    Pacific Ocean, North Pacific Ocean
    Description

    Overview

    The Office of the Geographer and Global Issues at the U.S. Department of State produces the Large Scale International Boundaries (LSIB) dataset. The current edition is version 11.4 (published 24 February 2025). The 11.4 release contains updated boundary lines and data refinements designed to extend the functionality of the dataset. These data and generalized derivatives are the only international boundary lines approved for U.S. Government use. The contents of this dataset reflect U.S. Government policy on international boundary alignment, political recognition, and dispute status. They do not necessarily reflect de facto limits of control.

    National Geospatial Data Asset

    This dataset is a National Geospatial Data Asset (NGDAID 194) managed by the Department of State. It is a part of the International Boundaries Theme created by the Federal Geographic Data Committee.

    Dataset Source Details

    Sources for these data include treaties, relevant maps, and data from boundary commissions, as well as national mapping agencies. Where available and applicable, the dataset incorporates information from courts, tribunals, and international arbitrations. The research and recovery process includes analysis of satellite imagery and elevation data. Due to the limitations of source materials and processing techniques, most lines are within 100 meters of their true position on the ground.

    Cartographic Visualization

    The LSIB is a geospatial dataset that, when used for cartographic purposes, requires additional styling. The LSIB download package contains example style files for commonly used software applications. The attribute table also contains embedded information to guide the cartographic representation. Additional discussion of these considerations can be found in the Use of Core Attributes in Cartographic Visualization section below.

    Additional cartographic information pertaining to the depiction and description of international boundaries or areas of special sovereignty can be found in Guidance Bulletins published by the Office of the Geographer and Global Issues: https://hiu.state.gov/data/cartographic_guidance_bulletins/

    Contact

    Direct inquiries to internationalboundaries@state.gov.

    Direct download: https://data.geodata.state.gov/LSIB.zip

    Attribute Structure

    The dataset uses the following attributes divided into two categories:

    ATTRIBUTE NAME | ATTRIBUTE STATUS
    CC1 | Core
    CC1_GENC3 | Extension
    CC1_WPID | Extension
    COUNTRY1 | Core
    CC2 | Core
    CC2_GENC3 | Extension
    CC2_WPID | Extension
    COUNTRY2 | Core
    RANK | Core
    LABEL | Core
    STATUS | Core
    NOTES | Core
    LSIB_ID | Extension
    ANTECIDS | Extension
    PREVIDS | Extension
    PARENTID | Extension
    PARENTSEG | Extension

    These attributes have external data sources that update separately from the LSIB:

    ATTRIBUTE NAME | ATTRIBUTE SOURCE
    CC1 | GENC
    CC1_GENC3 | GENC
    CC1_WPID | World Polygons
    COUNTRY1 | DoS Lists
    CC2 | GENC
    CC2_GENC3 | GENC
    CC2_WPID | World Polygons
    COUNTRY2 | DoS Lists
    LSIB_ID | BASE
    ANTECIDS | BASE
    PREVIDS | BASE
    PARENTID | BASE
    PARENTSEG | BASE

    The core attributes listed above describe the boundary lines contained within the LSIB dataset. Removal of core attributes from the dataset will change the meaning of the lines. An attribute status of “Extension” represents a field containing data interoperability information. Other attributes not listed above include “FID”, “Shape_length” and “Shape.” These are components of the shapefile format and do not form an intrinsic part of the LSIB.

    Core Attributes

    The eight core attributes listed above contain unique information which, when combined with the line geometry, comprise the LSIB dataset. These Core Attributes are further divided into Country Code and Name Fields and Descriptive Fields.

    Country Code and Country Name Fields

    “CC1” and “CC2” fields are machine readable fields that contain political entity codes. These are two-character codes derived from the Geopolitical Entities, Names, and Codes Standard (GENC), Edition 3 Update 18. “CC1_GENC3” and “CC2_GENC3” fields contain the corresponding three-character GENC codes and are extension attributes discussed below. The codes “Q2” or “QX2” denote a line in the LSIB representing a boundary associated with areas not contained within the GENC standard.

    The “COUNTRY1” and “COUNTRY2” fields contain the names of corresponding political entities. These fields contain names approved by the U.S. Board on Geographic Names (BGN) as incorporated in the "Independent States in the World" and "Dependencies and Areas of Special Sovereignty" lists maintained by the Department of State. To ensure maximum compatibility, names are presented without diacritics and certain names are rendered using common cartographic abbreviations. Names for lines associated with the code "Q2" are descriptive and not necessarily BGN-approved. Names rendered in all CAPITAL LETTERS denote independent states. Names rendered in normal text represent dependencies, areas of special sovereignty, or are otherwise presented for the convenience of the user.

    Descriptive Fields

    The following text fields are a part of the core attributes of the LSIB dataset and do not update from external sources. They provide additional information about each of the lines and are as follows:

    ATTRIBUTE NAME | CONTAINS NULLS
    RANK | No
    STATUS | No
    LABEL | Yes
    NOTES | Yes

    Neither the "RANK" nor "STATUS" fields contain null values; the "LABEL" and "NOTES" fields do. The "RANK" field is a numeric expression of the "STATUS" field. Combined with the line geometry, these fields encode the views of the United States Government on the political status of the boundary line.

    A value of “1” in the “RANK” field corresponds to an “International Boundary” value in the “STATUS” field. Values of “2” and “3” correspond to “Other Line of International Separation” and “Special Line,” respectively.
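
    The correspondence between “RANK” and “STATUS” can be written down as a small lookup table. The sketch below is illustrative only: the names RANK_TO_STATUS and status_matches_rank are hypothetical and are not part of the LSIB download package.

```python
# RANK -> STATUS values as described in the text above.
RANK_TO_STATUS = {
    1: "International Boundary",
    2: "Other Line of International Separation",
    3: "Special Line",
}


def status_matches_rank(rank, status):
    """Return True when a record's STATUS agrees with its numeric RANK."""
    return RANK_TO_STATUS.get(rank) == status


# A consistent record and an inconsistent one (hypothetical examples).
print(status_matches_rank(1, "International Boundary"))  # True
print(status_matches_rank(2, "Special Line"))            # False
```

    A check like this could be run over the attribute table to flag rows whose two fields disagree, since neither field contains null values.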

    The “LABEL” field contains required text to describe the line segment on all finished cartographic products, including but not limited to print and interactive maps.

    The “NOTES” field contains an explanation of special circumstances modifying the lines. This information can pertain to the origins of the boundary lines, limitations regarding the purpose of the lines, or the original source of the line.

    Use of Core Attributes in Cartographic Visualization

    Several of the Core Attributes provide information required for the proper cartographic representation of the LSIB dataset. The cartographic usage of the LSIB requires a visual differentiation between the three categories of boundary lines. Specifically, this differentiation must be between:

    • International Boundaries (Rank 1);
    • Other Lines of International Separation (Rank 2); and
    • Special Lines (Rank 3).

    Rank 1 lines must be the most visually prominent. Rank 2 lines must be less visually prominent than Rank 1 lines. Rank 3 lines must be shown in a manner visually subordinate to Ranks 1 and 2. Where scale permits, Rank 2 and 3 lines must be labeled in accordance with the “Label” field. Data marked with a Rank 2 or 3 designation does not necessarily correspond to a disputed boundary. Please consult the style files in the download package for examples of this depiction.
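
    The prominence rule above can be sketched as a style table keyed by rank. The line widths and dash patterns are hypothetical values chosen only so that their ordering follows the text; the style files in the download package remain the authoritative examples, and the names STYLE_BY_RANK and label_required are assumptions.

```python
# Hypothetical style values: only the relative ordering reflects the rule
# that Rank 1 is most prominent and Rank 3 is subordinate to Ranks 1 and 2.
STYLE_BY_RANK = {
    1: {"linewidth": 1.5, "linestyle": "solid"},
    2: {"linewidth": 1.0, "linestyle": "dashed"},
    3: {"linewidth": 0.5, "linestyle": "dotted"},
}


def label_required(rank, scale_permits):
    """Rank 2 and 3 lines must carry their LABEL text where scale permits."""
    return scale_permits and rank in (2, 3)


for rank in (1, 2, 3):
    print(rank, STYLE_BY_RANK[rank], label_required(rank, scale_permits=True))
```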

    The requirement to incorporate the contents of the "LABEL" field on cartographic products is scale dependent. If a label is legible at the scale of a given static product, a proper use of this dataset would encourage the application of that label. Using the contents of the "COUNTRY1" and "COUNTRY2" fields in the generation of a line segment label is not required. The "STATUS" field contains the preferred description for the three LSIB line types when they are incorporated into a map legend but is otherwise not to be used for labeling.

    Use of the “CC1,” “CC1_GENC3,” “CC2,” “CC2_GENC3,” “RANK,” or “NOTES” fields for cartographic

  12. USGS 10-m Digital Elevation Model (DEM): Hawaii: Big Island: Hillshade

    • catalog.data.gov
    • data.ioos.us
    • +1more
    Updated Jan 26, 2025
    Cite
    U.S. Geological Survey (USGS) (Point of Contact) (2025). USGS 10-m Digital Elevation Model (DEM): Hawaii: Big Island: Hillshade [Dataset]. https://catalog.data.gov/dataset/usgs-10-m-digital-elevation-model-dem-hawaii-big-island-hillshade
    Dataset updated
    Jan 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Island of Hawai'i, Hawaii
    Description

    A 10-meter resolution land surface digital elevation model (DEM) grayscale hillshade for Big Island in Hawaii derived from United States Geological Survey (USGS) 1/3 arc-second DEM quadrangles. For the related dataset containing numeric elevation values for this image layer, see http://pacioos.org/metadata/usgs_dem_10m_bigisland.html

  13. High Resolution Digital Elevation Model Mosaic (HRDEM Mosaic) - CanElevation...

    • open.canada.ca
    • ouvert.canada.ca
    fgdb/gdb, html, json +3
    Updated Mar 12, 2025
    Cite
    Natural Resources Canada (2025). High Resolution Digital Elevation Model Mosaic (HRDEM Mosaic) - CanElevation Series [Dataset]. https://open.canada.ca/data/en/dataset/0fe65119-e96e-4a57-8bfe-9d9245fba06b
    Available download formats: json, pdf, html, fgdb/gdb, wms, wcs
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    Natural Resources Canada
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    The High Resolution Digital Elevation Model Mosaic provides a unique and continuous representation of the high resolution elevation data available across the country. The High Resolution Digital Elevation Model (HRDEM) product used is derived from airborne LiDAR data (mainly in the south) and satellite images in the north. The mosaic is available for both the Digital Terrain Model (DTM) and the Digital Surface Model (DSM) from web mapping services. It is part of the CanElevation Series created to support the National Elevation Data Strategy implemented by NRCan. This strategy aims to increase Canada's coverage of high-resolution elevation data and increase the accessibility of the products.

    Unlike the HRDEM product in the same series, which is distributed by acquisition project without integration between projects, the mosaic is created to provide a single, continuous representation of strategy data. The most recent datasets for a given territory are used to generate the mosaic.

    This mosaic is disseminated through the Data Cube Platform, implemented by NRCan using geospatial big data management technologies. These technologies enable the rapid and efficient visualization of high-resolution geospatial data and allow for the rapid generation of dynamically derived products. The mosaic is available from Web Map Services (WMS), Web Coverage Services (WCS) and SpatioTemporal Asset Catalog (STAC) collections. Accessible data includes the Digital Terrain Model (DTM), the Digital Surface Model (DSM) and derived products such as shaded relief and slope.

    The mosaic is referenced to the Canadian Height Reference System 2013 (CGVD2013) which is the reference standard for orthometric heights across Canada. Source data for HRDEM datasets used to create the mosaic is acquired through multiple projects with different partners. Collaboration is a key factor to the success of the National Elevation Strategy. Refer to the “Supporting Document” section to access the list of the different partners including links to their respective data.

  14. High-Resolution Infrared Enhanced Satellite Cloud Imagery - Global

    • data.gov.tw
    json, xml
    Cite
    Central Weather Administration Ministry of Transportation and Communications, High-Resolution Infrared Enhanced Satellite Cloud Imagery - Global [Dataset]. https://data.gov.tw/en/datasets/33681
    Available download formats: json, xml
    Dataset authored and provided by
    Central Weather Administration Ministry of Transportation and Communications
    License

    https://data.gov.tw/license

    Description

    The resolution of the infrared satellite cloud map is 800x800. *The download link changes as of September 15, 2023 (ROC year 112); please update to the new link before December 31, 2023, after which the old links will expire. For large-scale data downloads, please apply for membership at the Meteorological Data Open Platform: https://opendata.cwa.gov.tw/index

  15. webui-7k

    • huggingface.co
    Updated Nov 1, 2024
    Cite
    Big Lab (2024). webui-7k [Dataset]. https://huggingface.co/datasets/biglab/webui-7k
    Dataset updated
    Nov 1, 2024
    Dataset authored and provided by
    Big Lab
    License

    https://choosealicense.com/licenses/other/

    Description

    This data accompanies the WebUI project (https://dl.acm.org/doi/abs/10.1145/3544548.3581158). For more information, check out the project website: https://uimodeling.github.io/

    To download this dataset, you need to install the huggingface-hub package:

    pip install huggingface-hub

    Use snapshot_download:

    from huggingface_hub import snapshot_download
    snapshot_download(repo_id="biglab/webui-7k", repo_type="dataset")

    IMPORTANT

    Before downloading and using, please review the copyright info here:… See the full description on the dataset page: https://huggingface.co/datasets/biglab/webui-7k.

  16. INSPIRE Download Service (predefined ATOM) for dataset Large Gardens

    • data.europa.eu
    atom feed
    Cite
    LVermGeo on behalf of Böhl-Iggelheim, INSPIRE Download Service (predefined ATOM) for dataset Large Gardens [Dataset]. https://data.europa.eu/data/datasets/723061d2-ea3f-0002-2b64-31e9ac312e5e?locale=en
    Available download formats: atom feed
    Dataset authored and provided by
    LVermGeo on behalf of Böhl-Iggelheim
    Description

    Description of the INSPIRE Download Service (predefined Atom): Development Plan Große Garten Böhl-Iggelheim. The link(s) for downloading the datasets are dynamically generated from GetMap calls to a WMS interface.

  17. criteo

    • tensorflow.org
    Updated Dec 22, 2022
    Cite
    (2022). criteo [Dataset]. https://www.tensorflow.org/datasets/catalog/criteo
    Dataset updated
    Dec 22, 2022
    Description

    Criteo Uplift Modeling Dataset

    This dataset is released along with the paper: “A Large Scale Benchmark for Uplift Modeling” Eustache Diemert, Artem Betlei, Christophe Renaudin; (Criteo AI Lab), Massih-Reza Amini (LIG, Grenoble INP)

    This work was published in: AdKDD 2018 Workshop, in conjunction with KDD 2018.

    Data description

    This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each one representing a user with 11 features, a treatment indicator and 2 labels (visits and conversions).

    Fields

    Here is a detailed description of the fields (they are comma-separated in the file):

    • f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
    • treatment: treatment group (1 = treated, 0 = control)
    • conversion: whether a conversion occurred for this user (binary, label)
    • visit: whether a visit occurred for this user (binary, label)
    • exposure: treatment effect, whether the user has been effectively exposed (binary)

    Key figures

    • Format: CSV
    • Size: 459MB (compressed)
    • Rows: 25,309,483
    • Average Visit Rate: .04132
    • Average Conversion Rate: .00229
    • Treatment Ratio: .846

    Tasks

    The dataset was collected and prepared with uplift prediction in mind as the main task. Additionally, we foresee related uses, including but not limited to:

    • benchmark for causal inference
    • uplift modeling
    • interactions between features and treatment
    • heterogeneity of treatment
    • benchmark for observational causality methods
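
    As a rough illustration of the uplift task, the sketch below computes a difference-in-means estimate of the average treatment effect from the treatment and visit fields. It is a minimal example under stated assumptions, not the method of the paper: the function name average_uplift and the four toy rows are hypothetical, and a real analysis would read the full CSV.

```python
def average_uplift(rows, label="visit"):
    """Mean of `label` among treated rows minus its mean among control rows.

    Because treatment was assigned at random, this difference in means is a
    simple estimate of the average effect of treatment on the label.
    """
    treated = [row[label] for row in rows if row["treatment"] == 1]
    control = [row[label] for row in rows if row["treatment"] == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control row")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical toy rows using the field names listed above.
rows = [
    {"treatment": 1, "visit": 1},
    {"treatment": 1, "visit": 0},
    {"treatment": 0, "visit": 0},
    {"treatment": 0, "visit": 0},
]
uplift = average_uplift(rows)
print(uplift)  # 0.5: visit rate 0.5 when treated, 0.0 in control
```

    Uplift models go further by estimating how this effect varies with the features f0 to f11, which is what the heterogeneity task in the list refers to.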

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('criteo', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  18. Big Springs, NE Hispanic or Latino Population Distribution by Their...

    • neilsberg.com
    csv, json
    Updated Aug 18, 2023
    Cite
    Neilsberg Research (2023). Big Springs, NE Hispanic or Latino Population Distribution by Their Ancestries [Dataset]. https://www.neilsberg.com/research/datasets/6c5dc95b-3d85-11ee-9abe-0aa64bf2eeb2/
    Available download formats: csv, json
    Dataset updated
    Aug 18, 2023
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Nebraska, Big Springs
    Variables measured
    Hispanic or Latino population with Cuban ancestry, Hispanic or Latino population with Mexican ancestry, Hispanic or Latino population with Puerto Rican ancestry, Hispanic or Latino population with Other Hispanic or Latino ancestry, Hispanic or Latino population with Cuban ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Mexican ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Puerto Rican ancestry as Percent of Total Hispanic Population, Hispanic or Latino population with Other Hispanic or Latino ancestry as Percent of Total Hispanic Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) Origin / Ancestry for Hispanic population and (b) respective population as a percentage of the total Hispanic population, we initially analyzed and categorized the data for each of the ancestries across the Hispanic or Latino population. It is ensured that the population estimates used in this dataset pertain exclusively to ancestries for the Hispanic or Latino population. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Big Springs Hispanic or Latino population. It includes the distribution of the Hispanic or Latino population, of Big Springs, by their ancestries, as identified by the Census Bureau. The dataset can be utilized to understand the origin of the Hispanic or Latino population of Big Springs.

    Key observations

    Among the Hispanic population in Big Springs, regardless of the race, the largest group is of Mexican origin, with a population of 49 (100% of the total Hispanic population).

    [Image: Big Springs Non-Hispanic population by race] https://i.neilsberg.com/ch/big-springs-ne-population-by-race-and-ethnicity.jpeg

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Origins for the Hispanic or Latino population include:

    • Mexican
    • Black or African American
    • Puerto Rican
    • Cuban
    • Other Hispanic or Latino

    Variables / Data Columns

    • Origin: This column displays the origin of the Hispanic or Latino population of Big Springs.
    • Population: This column shows the population of the specific Hispanic or Latino origin in Big Springs.
    • % of Total Hispanic Population: This column displays each Hispanic origin as a percentage of the total Hispanic or Latino population of Big Springs. Please note that the percentages may not sum to exactly 100% due to rounding.
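    As a rough illustration of how the "% of Total Hispanic Population" column relates to the "Population" column, the sketch below recomputes the shares from raw counts. The function name and the zero counts for the non-Mexican origins are assumptions made for the example (the source only states that the Mexican-origin population is 49, or 100% of the total); the second table is purely hypothetical and is there to show why rounded percentages may not sum to exactly 100.

```python
def percent_of_total(populations):
    """Return each origin's share of the total population, rounded to 0.1%."""
    total = sum(populations.values())
    if total == 0:
        # Avoid division by zero when no population is recorded.
        return {origin: 0.0 for origin in populations}
    return {
        origin: round(100 * count / total, 1)
        for origin, count in populations.items()
    }


# Counts for Big Springs: 49 comes from the dataset description;
# the zeros are assumed from the stated 100% Mexican share.
big_springs = {
    "Mexican": 49,
    "Puerto Rican": 0,
    "Cuban": 0,
    "Other Hispanic or Latino": 0,
}
shares = percent_of_total(big_springs)

# Hypothetical counts, chosen only to demonstrate rounding drift.
hypothetical = {"A": 1, "B": 1, "C": 1}
rounded = percent_of_total(hypothetical)
rounded_sum = sum(rounded.values())

print(shares)
print(rounded, rounded_sum)
```

    With three equal hypothetical groups, each share rounds to 33.3, so the rounded column adds up to slightly less than 100.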

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
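    One way to handle that margin of error in practice is sketched below. It assumes the published ACS convention that margins of error are reported at the 90 percent confidence level, so a standard error can be approximated as MOE / 1.645. The function name and the margin of error value of 30 are hypothetical; this dataset's description does not list margins of error, so a real value would have to come from the underlying ACS tables.

```python
Z_90 = 1.645  # z-score for the 90 percent confidence level used by the ACS


def moe_summary(estimate, moe):
    """Summarize an estimate and its margin of error (MOE)."""
    lower = max(estimate - moe, 0)  # a population count cannot be negative
    upper = estimate + moe
    standard_error = moe / Z_90
    # Coefficient of variation, in percent; undefined for a zero estimate.
    cv_percent = standard_error / estimate * 100 if estimate else None
    return {
        "lower": lower,
        "upper": upper,
        "standard_error": standard_error,
        "cv_percent": cv_percent,
    }


# 49 is the estimate from the description; the MOE of 30 is hypothetical.
summary = moe_summary(49, 30)
print(summary)
```

    A large coefficient of variation relative to the estimate is a signal that the figure should be presented with its interval rather than as a point value.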

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Big Springs Population by Race & Ethnicity.

  19. T

    Development Engineering - Large Lots

    • open.piercecountywa.gov
    • internal.open.piercecountywa.gov
    • +2 more
    Updated Jan 13, 2025
    Cite
    (2025). Development Engineering - Large Lots [Dataset]. https://open.piercecountywa.gov/Maps-and-Geospatial/Development-Engineering-Large-Lots/qdcv-6g9t
    Explore at:
    Available download formats: application/rssxml, application/rdfxml, csv, kmz, application/geo+json, xml, tsv, kml
    Dataset updated
    Jan 13, 2025
    Description

    Polygons of active and historic large lot development in unincorporated Pierce County. Please read metadata (https://matterhorn.piercecountywa.gov/GISmetadata/pdbplandev_large_lots.html) for additional information. Any use or data download constitutes acceptance of the Terms of Use (https://matterhorn.piercecountywa.gov/Disclaimer/PierceCountyGISDataTermsofUse.pdf).

  20. T

    imdb_reviews

    • tensorflow.org
    Updated Sep 20, 2024
    + more versions
    Cite
    (2024). imdb_reviews [Dataset]. https://www.tensorflow.org/datasets/catalog/imdb_reviews
    Explore at:
    Dataset updated
    Sep 20, 2024
    Description

    Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('imdb_reviews', split='train')
    for ex in ds.take(4):
        print(ex)

    See the guide for more information on tensorflow_datasets.

Cite
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177

Data Cleaning Sample

Explore at:
141 scholarly articles cite this dataset (View in Google Scholar)
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License

CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically

Description

Sample data for exercises in Further Adventures in Data Cleaning.
