7 datasets found
  1. ‘California Housing Prices Data (5 new features!)’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘California Housing Prices Data (5 new features!)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-california-housing-prices-data-5-new-features-230f/d4c4de7c/?iid=000-393&v=presentation
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    Analysis of ‘California Housing Prices Data (5 new features!)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/fedesoriano/california-housing-prices-data-extra-features on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Similar Datasets:

    Boston House Prices: LINK

    Context

    This dataset is a modified version of the California Housing Data used in the paper Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297. It serves as an excellent introduction to implementing machine learning algorithms because it requires rudimentary data cleaning, has an easily understandable list of variables, and sits at an optimal size between being too toyish and too cumbersome.

    The data contains information from the 1990 California census. So although it may not help you with predicting current housing prices like the Zillow Zestimate dataset, it does provide an accessible introductory dataset for teaching people about the basics of machine learning.

    Modifications with respect to the original data

    This dataset includes 5 extra features defined by me: "Distance to coast", "Distance to Los Angeles", "Distance to San Diego", "Distance to San Jose", and "Distance to San Francisco". These extra features try to account for the distance to the nearest coast and the distance to the centre of the largest cities in California.

    The distances were calculated using the Haversine formula with the Longitude and Latitude:

    $$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos(\phi_1)\cos(\phi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$

    where:

    • phi_1 and phi_2 are the Latitudes of point 1 and point 2, respectively
    • lambda_1 and lambda_2 are the Longitudes of point 1 and point 2, respectively
    • r is the radius of the Earth (6371km)
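
    The Haversine calculation described above can be sketched in Python as follows. This is a minimal illustration, not the dataset author's actual code: the function name `haversine_m` and the city-centre coordinates are assumptions chosen for the example, and the result is returned in metres to match the units listed for the distance columns.

```python
import math

EARTH_RADIUS_KM = 6371.0  # Earth radius quoted in the description above


def haversine_m(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Term under the square root of the Haversine formula
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * 1000 * math.asin(math.sqrt(a))


# Assumed, approximate city-centre coordinates (latitude, longitude)
LOS_ANGELES = (34.05, -118.24)
SAN_FRANCISCO = (37.77, -122.42)

distance = haversine_m(*LOS_ANGELES, *SAN_FRANCISCO)
print(f"Los Angeles to San Francisco: {distance / 1000:.0f} km")
```

    The same function could be applied row by row to the Latitude and Longitude columns to derive a distance feature for any reference point.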

    Content

    The data pertains to the houses found in a given California district and some summary stats about them based on the 1990 census data. The columns are as follows; their names are largely self-explanatory:

    1) Median House Value: Median house value for households within a block (measured in US Dollars) [$]
    2) Median Income: Median income for households within a block of houses (measured in tens of thousands of US Dollars) [10k$]
    3) Median Age: Median age of a house within a block; a lower number is a newer building [years]
    4) Total Rooms: Total number of rooms within a block
    5) Total Bedrooms: Total number of bedrooms within a block
    6) Population: Total number of people residing within a block
    7) Households: Total number of households, a group of people residing within a home unit, for a block
    8) Latitude: A measure of how far north a house is; a higher value is farther north [°]
    9) Longitude: A measure of how far west a house is; values are negative in California, so a lower (more negative) value is farther west [°]
    10) Distance to coast: Distance to the nearest coast point [m]
    11) Distance to Los Angeles: Distance to the centre of Los Angeles [m]
    12) Distance to San Diego: Distance to the centre of San Diego [m]
    13) Distance to San Jose: Distance to the centre of San Jose [m]
    14) Distance to San Francisco: Distance to the centre of San Francisco [m]

    Source

    This data was entirely modified and cleaned by me. The original data (without the distance features) was initially featured in the following paper: Pace, R. Kelley, and Ronald Barry. "Sparse spatial autoregressions." Statistics & Probability Letters 33.3 (1997): 291-297.

    The original dataset can be found under the following link: https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

    --- Original source retains full ownership of the source dataset ---

  2. Immigration system statistics data tables

    • gov.uk
    • totalwrapture.com
    Updated May 22, 2025
    Cite
    Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
    Dataset updated
    May 22, 2025
    Dataset provided by
    GOV.UK: http://gov.uk/
    Authors
    Home Office
    Description

    List of the data tables as part of the Immigration System Statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

    If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

    Accessible file formats

    The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
    If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
    Please tell us what format you need. It will help us if you say what assistive technology you use.

    Related content

    Immigration system statistics, year ending March 2025
    Immigration system statistics quarterly release
    Immigration system statistics user guide
    Publishing detailed data tables in migration statistics
    Policy and legislative changes affecting migration to the UK: timeline
    Immigration statistics data archives

    Passenger arrivals

    Passenger arrivals summary tables, year ending March 2025 (MS Excel Spreadsheet, 66.5 KB): https://assets.publishing.service.gov.uk/media/68258d71aa3556876875ec80/passenger-arrivals-summary-mar-2025-tables.xlsx

    ‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

    Electronic travel authorisation

    Electronic travel authorisation detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 56.7 KB): https://assets.publishing.service.gov.uk/media/681e406753add7d476d8187f/electronic-travel-authorisation-datasets-mar-2025.xlsx
    ETA_D01: Applications for electronic travel authorisations, by nationality
    ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

    Entry clearance visas granted outside the UK

    Entry clearance visas summary tables, year ending March 2025 (MS Excel Spreadsheet, 113 KB): https://assets.publishing.service.gov.uk/media/68247953b296b83ad5262ed7/visas-summary-mar-2025-tables.xlsx

    Entry clearance visa applications and outcomes detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 29.1 MB): https://assets.publishing.service.gov.uk/media/682c4241010c5c28d1c7e820/entry-clearance-visa-outcomes-datasets-mar-2025.xlsx
    Vis_D01: Entry clearance visa applications, by nationality and visa type
    Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

    Additional dat

  3. Louisville Metro KY - Officer Involved Shooting Database and Statistical...

    • catalog.data.gov
    • datasets.ai
    Updated Apr 13, 2023
    Cite
    Louisville/Jefferson County Information Consortium (2023). Louisville Metro KY - Officer Involved Shooting Database and Statistical Analysis 5-1-2018 [Dataset]. https://catalog.data.gov/dataset/louisville-metro-ky-officer-involved-shooting-database-and-statistical-analysis-5-1-2018
    Dataset updated
    Apr 13, 2023
    Dataset provided by
    Louisville/Jefferson County Information Consortium
    Area covered
    Kentucky, Louisville
    Description

    Officer Involved Shooting (OIS) Database and Statistical Analysis. Data is updated after there is an officer involved shooting.

    • PIU#
    • Incident # - the number associated with either the incident or used as reference to store the items in our evidence rooms
    • Date of Occurrence Month - month the incident occurred (Note the year is labeled on the tab of the spreadsheet)
    • Date of Occurrence Day - day of the month the incident occurred (Note the year is labeled on the tab of the spreadsheet)
    • Time of Occurrence - time the incident occurred
    • Address of incident - the location the incident occurred
    • Division - the LMPD division in which the incident actually occurred
    • Beat - the LMPD beat in which the incident actually occurred
    • Investigation Type - the type of investigation (shooting or death)
    • Case Status - status of the case (open or closed)
    • Suspect Name - the name of the suspect involved in the incident
    • Suspect Race - the race of the suspect involved in the incident (W-White, B-Black)
    • Suspect Sex - the gender of the suspect involved in the incident
    • Suspect Age - the age of the suspect involved in the incident
    • Suspect Ethnicity - the ethnicity of the suspect involved in the incident (H-Hispanic, N-Not Hispanic)
    • Suspect Weapon - the type of weapon the suspect used in the incident
    • Officer Name - the name of the officer involved in the incident
    • Officer Race - the race of the officer involved in the incident (W-White, B-Black, A-Asian)
    • Officer Sex - the gender of the officer involved in the incident
    • Officer Age - the age of the officer involved in the incident
    • Officer Ethnicity - the ethnicity of the officer involved in the incident (H-Hispanic, N-Not Hispanic)
    • Officer Years of Service - the number of years the officer has been serving at the time of the incident
    • Lethal Y/N - whether or not the incident involved a death (Y-Yes, N-No, continued-pending)
    • Narrative - a description of what was determined from the investigation

    Contact: Carol Boyle, carol.boyle@louisvilleky.gov

  4. CMS Program Statistics - Medicare Part A & Part B - All Types of Service

    • catalog.data.gov
    • healthdata.gov
    Updated May 15, 2025
    Cite
    Centers for Medicare & Medicaid Services (2025). CMS Program Statistics - Medicare Part A & Part B - All Types of Service [Dataset]. https://catalog.data.gov/dataset/medicare-part-a-part-b-all-types-of-service-1f0f5
    Dataset updated
    May 15, 2025
    Dataset provided by
    Centers for Medicare & Medicaid Services
    Description

    The CMS Program Statistics - Medicare Part A & Part B - All Types of Service tables provide use and payment data by type of coverage and type of service. For additional information on enrollment, providers, and Medicare use and payment, visit the CMS Program Statistics page. These data do not exist in a machine-readable format, so the view data and API options are not available. Please use the download function to access the data. Below is the list of tables:

    • MDCR SUMMARY AB 1. Medicare Part A and Part B Summary: Utilization, Program Payments, and Cost Sharing for All Original Medicare Beneficiaries, by Type of Coverage and Type of Service, Yearly Trend
    • MDCR SUMMARY AB 2. Medicare Part A and Part B Summary: Utilization, Program Payments, and Cost Sharing for Aged Original Medicare Beneficiaries, by Type of Coverage and Type of Service, Yearly Trend
    • MDCR SUMMARY AB 3. Medicare Part A and Part B Summary: Utilization, Program Payments, and Cost Sharing for Disabled Original Medicare Beneficiaries by Type of Coverage and Type of Service, Yearly Trend
    • MDCR SUMMARY AB 4. Medicare Part A and Part B Summary: Utilization, Program Payments, and Cost Sharing for Original Medicare Beneficiaries, by Type of Coverage, Demographic Characteristics, and Medicare-Medicaid Enrollment Status
    • MDCR SUMMARY AB 5. Medicare Part A and Part B Summary: Utilization, Program Payments, and Cost Sharing for Original Medicare Beneficiaries, by Type of Coverage and by Area of Residence
    • MDCR SUMMARY AB 6. Medicare Part A and Part B Summary: Utilization and Program Payments for Original Medicare Beneficiaries, by Type of Entitlement, Amount of Program Payments, Type of Coverage, and Type of Service

  5. Background data for: Latent-variable modeling of ordinal outcomes in...

    • dataverse.no
    • dataone.org
    pdf, text/tsv, txt
    Updated Feb 29, 2024
    Cite
    Manfred Krug; Fabian Vetter; Lukas Sönning (2024). Background data for: Latent-variable modeling of ordinal outcomes in language data analysis [Dataset]. http://doi.org/10.18710/WI9TEH
    Available download formats
    text/tsv(4475), text/tsv(1079156), txt(8660), pdf(160867), pdf(287207)
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    DataverseNO
    Authors
    Manfred Krug; Fabian Vetter; Lukas Sönning
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2008 - Dec 31, 2018
    Area covered
    Malta
    Dataset funded by
    German Humboldt Foundation
    Bavarian Ministry for Science, Research and the Arts
    Spanish Ministry of Education and Science with European Regional Development Fund
    Description

    This dataset contains tabular files with information about the usage preferences of speakers of Maltese English with regard to 63 pairs of lexical expressions. These pairs (e.g. truck-lorry or realization-realisation) are known to differ in usage between BrE and AmE (cf. Algeo 2006). The data were elicited with a questionnaire that asks informants to indicate whether they always use one of the two variants, prefer one over the other, have no preference, or do not use either expression (see Krug and Sell 2013 for methodological details). Usage preferences were therefore measured on a symmetric 5-point ordinal scale. Data were collected between 2008 and 2018, as part of a larger research project on lexical and grammatical variation in settings where English is spoken as a native, second, or foreign language. The current dataset, which we use for our methodological study on ordinal data modeling strategies, consists of a subset of 500 speakers that is roughly balanced on year of birth.

    Abstract of the related publication:

    In empirical work, ordinal variables are typically analyzed using means based on numeric scores assigned to categories. While this strategy has met with justified criticism in the methodological literature, it also generates simple and informative data summaries, a standard often not met by statistically more adequate procedures. Motivated by a survey of how ordered variables are dealt with in language research, we draw attention to an un(der)used latent-variable approach to ordinal data modeling, which constitutes an alternative perspective on the most widely used form of ordered regression, the cumulative model. Since the latent-variable approach does not feature in any of the studies in our survey, we believe it is worthwhile to promote its benefits. To this end, we draw on questionnaire-based preference ratings by speakers of Maltese English, who indicated on a 5-point scale which of two synonymous expressions (e.g. package-parcel) they (tend to) use. We demonstrate that a latent-variable formulation of the cumulative model affords nuanced and interpretable data summaries that can be visualized effectively, while at the same time avoiding limitations inherent in mean response models (e.g. distortions induced by floor and ceiling effects). The online supplementary materials include a tutorial for its implementation in R.
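
    To illustrate the latent-variable reading of the cumulative model mentioned in the description, the sketch below computes the probabilities of the five response categories from a latent location and four thresholds. It is a rough illustration only, not the authors' R tutorial: the logistic link, the threshold values, and the names `category_probabilities` and `THRESHOLDS` are all assumptions made for this example.

```python
import math


def logistic_cdf(x):
    """Cumulative distribution function of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta, thresholds):
    """P(Y = k) for an ordinal response under a cumulative logit model.

    eta: location of the latent preference (hypothetical value).
    thresholds: increasing cut-points that slice the latent scale into categories.
    """
    # P(Y <= k) = F(threshold_k - eta); the last category closes the scale at 1.
    cumulative = [logistic_cdf(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical cut-points: four thresholds give a 5-point scale
THRESHOLDS = [-2.0, -0.5, 0.5, 2.0]

probs = category_probabilities(eta=0.0, thresholds=THRESHOLDS)
shifted = category_probabilities(eta=1.0, thresholds=THRESHOLDS)
print([round(p, 3) for p in probs])
print([round(p, 3) for p in shifted])
```

    Moving `eta` upward shifts probability mass toward the higher categories without ever leaving the 1-to-5 range, which is the property that lets this formulation sidestep floor and ceiling distortions.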

  6. Census of selected service industries, 1972 summary statistic file SA

    • datasearch.gesis.org
    • dataverse-staging.rdmc.unc.edu
    Updated Jan 22, 2020
    Cite
    U.S. Bureau of the Census; United States (2020). Census of selected service industries, 1972 summary statistic file SA [Dataset]. https://datasearch.gesis.org/dataset/httpsdataverse.unc.eduoai--hdl1902.29C-7
    Dataset updated
    Jan 22, 2020
    Dataset provided by
    Odum Institute Dataverse Network
    Authors
    U.S. Bureau of the Census; United States
    Description

    The subject matter in the five individual files which comprise the total data package is similar. SA1 presents detailed kind-of-business statistics (two-, three-, and four-digit industry levels) on number of establishments and receipts (total and with payroll), number of proprietorships and partnerships, annual and first quarter payroll, and number of paid employees. SA2 contains the same data items as above for selected services total, in addition to the number of establishments and receipts for five major kind-of-business groups. SA3 contains number of establishments and receipts for selected services total and for 130 kind-of-business classifications. SA4 presents receipts and rank by volume of receipts. SA5 statistics are given by city size for number of incorporated cities, total population, number of establishments, receipts, yearly payroll, and the percent of total by population and sales.

    Each of the files has slightly different geography for which summaries are presented. SA1 has summaries for the United States, divisions, States, SCA's and SMSA's, and counties and cities with over 300 service establishments. SA2 presents summary counts for each city of 2,500 inhabitants or more and for remainder of county. SA3 has summaries for the United States, regions, divisions, and States. SA4 presents summaries for the 250 largest counties and cities. SA5 presents United States total.

    Data pertain to the date of the census, 1972. The first major enumeration of Selected Service establishments covered 1933. Censuses were also taken in 1939, 1948, and in 5 year intervals since

  7. Airlines Delay

    • kaggle.com
    Updated Nov 14, 2019
    Cite
    Giovanni Gonzalez (2019). Airlines Delay [Dataset]. https://www.kaggle.com/datasets/giovamata/airlinedelaycauses/discussion
    Available format
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 14, 2019
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Giovanni Gonzalez
    Description

    The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics (BTS) tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled and diverted flights appears in DOT's monthly Air Travel Consumer Report, published about 30 days after the month's end, as well as in summary tables posted on this website. BTS began collecting details on the causes of flight delays in June 2003. Summary statistics and raw data are made available to the public at the time the Air Travel Consumer Report is released.

    This version of the dataset was compiled from the Statistical Computing Statistical Graphics 2009 Data Expo and is also available here.
