8 datasets found
  1. aggregate-data-italian-cities-from-wikipedia

    • kaggle.com
    Updated May 20, 2020
    alepuzio (2020). aggregate-data-italian-cities-from-wikipedia [Dataset]. https://www.kaggle.com/alepuzio/aggregatedataitaliancitiesfromwikipedia/code
    Dataset updated
    May 20, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    alepuzio
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Context

    This dataset is the result of my study of web scraping English Wikipedia in R and my tests of regression and classification modeling in R.

    Content

    The content was created by reading the relevant English Wikipedia articles about Italian cities. I did not run any NLP analysis; I read only the data tables and ranked every city from 0 to N in every aspect. For these values, 0 means "*the city is not ranked in this aspect*" and N means "*the city is in first place, in descending order of importance, in this aspect*". If an aspect has no ranking (for example, only the existence of airports/harbours is known, with no additional data about traffic or size), then 0 means "*none exist*" and N means "*there are N airports/harbours*". The only non-numeric column is the one with the city names, given in their English form with a few exceptions kept for simplicity (for example, "*Bra (CN)*").
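
    The 0-to-N encoding described above can be sketched in a few lines of Python. This is only an illustration, not the author's R code: the function names are hypothetical, the sample values are made up, and it assumes that N equals the number of ranked cities, so the last ranked city scores 1.

```python
from typing import Dict, List


def rank_scores(ranked_cities: List[str], all_cities: List[str]) -> Dict[str, int]:
    """Score N for first place down to 1 for last place; 0 means unranked.

    Assumption: N is the number of ranked cities in this aspect.
    """
    n = len(ranked_cities)
    scores = {city: 0 for city in all_cities}  # 0 = not ranked in this aspect
    for position, city in enumerate(ranked_cities):
        scores[city] = n - position  # first place gets N, last ranked gets 1
    return scores


def count_scores(counts: Dict[str, int], all_cities: List[str]) -> Dict[str, int]:
    """For aspects without a ranking, the value is a plain count (0 = none exist)."""
    return {city: counts.get(city, 0) for city in all_cities}


# Made-up sample values, for illustration only.
cities = ["Milan", "Rome", "Bra (CN)"]
print(rank_scores(["Rome", "Milan"], cities))
print(count_scores({"Milan": 3, "Rome": 2}, cities))
```

    With this encoding a higher number always means "more" or "better placed", so ranked and counted aspects can sit side by side as numeric columns.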

    Acknowledgements

    I thank the Wikimedia Foundation for its work, its mission, and for making available the cover image of this dataset (please read the article "The Ideal city (painting)"). I also thank StackOverflow and Cross-Validated, the most important sources of technical knowledge in the world, and all the people on Kaggle for their suggestions.

    Inspiration

    As a beginner in data analysis and modeling (OK, I passed the statistics exam at Politecnico di Milano (Italy), but I have not worked on this topic for more than 10 years and my memory is getting old ^_^), I worked mostly on data cleaning, dataset building, and building the simplest models.

    You can use this dataset to work out which city is good to live in, or expand it with other data from Wikipedia (not only by reading the tables but also by reading the article text and extracting data from the unstructured text).

  2. Geonames - All Cities with a population > 1000

    • public.opendatasoft.com
    • data.smartidf.services
    Updated Mar 10, 2024
    (2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
    Available download formats: csv, json, geojson, excel
    Dataset updated
    Mar 10, 2024
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).

    Sources and Contributions

    • Sources: GeoNames aggregates over a hundred different data sources.
    • Ambassadors: GeoNames Ambassadors help in many countries.
    • Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
    • Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

    Enrichment: add country name

  3. Italian Airbnb Dataset

    • kaggle.com
    Updated Nov 26, 2024
    hype (2024). Italian Airbnb Dataset [Dataset]. https://www.kaggle.com/datasets/salvatoremarcello/italian-airbnb-dataset/versions/6
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    hype
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Italy
    Description

    This dataset provides a snapshot of Airbnb listings across major Italian cities and regions, offering valuable insights into the short-term rental market in Italy. Whether you're interested in pricing trends, regional variations, or the impact of seasonality, this dataset has something for you.

    The data cover the period between September 2023 and September 2024.

    Key Features:

    • City-level data: Explore listings in popular cities like Florence, Milan, Naples, Rome, and Venice.
    • Regional insights: Analyze trends across broader regions including Puglia, Sicily, and Trentino.
    • Comprehensive metrics: Data includes pricing, review scores, host details, and more.
    • Seasonal analysis: Data spans different periods, allowing for comparisons across seasons.

    Data Dictionary:

    • id: Unique identifier for each listing.
    • number_of_reviews_ltm: Number of reviews received in the last twelve months.
    • date_of_scraping: Date the data was scraped.
    • host_since: Date the host joined Airbnb.
    • host_is_superhost: Whether the host is a superhost (t/f).
    • host_total_listings_count: Total number of listings the host has.
    • neighbourhood: Neighborhood where the listing is located.
    • latitude: Latitude coordinate of the listing.
    • longitude: Longitude coordinate of the listing.
    • room_type: Type of room (e.g., entire home/apt, private room).
    • accommodates: Number of guests the listing can accommodate.
    • price: Price per night (in local currency).
    • number_of_reviews: Total number of reviews.
    • review_scores_rating: Overall rating of the listing.
    • review_scores_accuracy: Accuracy rating.
    • review_scores_cleanliness: Cleanliness rating.
    • review_scores_checkin: Check-in rating.
    • review_scores_communication: Communication rating.
    • review_scores_location: Location rating.
    • review_scores_value: Value rating.
    • reviews_per_month: Number of reviews per month.
    • place: City or region where the listing is located.
    • period: Time period when the data was scraped (e.g., Early Winter).

    For visualization purposes, a CSV with all city neighbourhoods and the corresponding GeoJSON is also provided.

    I also added datasets that group listings by period and neighbourhood/city. Quantitative features were aggregated using the median and MAD; qualitative features using the mode and Shannon's entropy.
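
    As a rough illustration of the four summary statistics named above, the sketch below uses only the Python standard library. It assumes that "MAD" means the median absolute deviation and that entropy is measured in bits (log base 2); the dataset description confirms neither, and the sample values are made up.

```python
import math
from collections import Counter
from statistics import median, mode


def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    center = median(values)
    return median([abs(v - center) for v in values])


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up values for one hypothetical (period, neighbourhood) group.
prices = [80, 100, 120, 100, 300]
room_types = ["Entire home/apt", "Entire home/apt", "Entire home/apt", "Private room"]

print("median price:", median(prices))
print("MAD of price:", mad(prices))
print("most common room type:", mode(room_types))
print("room type entropy (bits):", round(shannon_entropy(room_types), 4))
```

    The median and MAD are less sensitive to extreme prices than the mean and standard deviation, and entropy shows how mixed a group is: it is 0 when every listing shares the same category.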

    Disclaimer:

    This dataset is intended for informational and research purposes only. It is not affiliated with Airbnb or any other organization.

  4. Dataset for Italy, New York Census Bureau Income Distribution by Gender

    • neilsberg.com
    Updated Jan 9, 2024
    Neilsberg Research (2024). Dataset for Italy, New York Census Bureau Income Distribution by Gender [Dataset]. https://www.neilsberg.com/research/datasets/b3b9b2be-abcb-11ee-8b96-3860777c1fe6/
    Dataset updated
    Jan 9, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Italy
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    This dataset tabulates household income by gender for the town of Italy, New York. It can be used to understand the gender-based income distribution in the town.

    Content

    The dataset includes the following datasets, when applicable:

    • Italy, New York annual median income by work experience and sex dataset: Aged 15+, 2010-2022 (in 2022 inflation-adjusted dollars)
    • Italy, New York annual income distribution by work experience and gender dataset (Number of individuals ages 15+ with income, 2021)

    Please note: The 2020 1-Year ACS estimates were not reported by the Census Bureau because of the impact of COVID-19 on survey collection and analysis. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    Good to know

    Margin of Error

    The data are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
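
    To make the margin of error concrete, here is a minimal sketch of how an estimate and its margin can be turned into an interval. It assumes the margins follow the usual American Community Survey convention of a 90 percent confidence level (z = 1.645); the function names are hypothetical and the figures are made up, not taken from this dataset.

```python
from typing import Tuple


def confidence_interval(estimate: float, margin_of_error: float) -> Tuple[float, float]:
    """Return the (lower, upper) bounds implied by estimate +/- margin of error."""
    return (estimate - margin_of_error, estimate + margin_of_error)


def standard_error(margin_of_error: float, z: float = 1.645) -> float:
    """Recover the standard error from a margin of error.

    Assumption: the margin was published at a 90 percent confidence level.
    """
    return margin_of_error / z


# Made-up figures, for illustration only.
low, high = confidence_interval(50_000, 4_000)
print(f"interval: {low:,.0f} to {high:,.0f}")
print(f"standard error: {standard_error(4_000):,.1f}")
```

    For small places the interval can be wide relative to the estimate, which is why differences between groups should be read with care.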

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Italy town income distribution by gender.

  5. Dataset for Italy, TX Census Bureau Income Distribution by Race

    • neilsberg.com
    Updated Jan 3, 2024
    Neilsberg Research (2024). Dataset for Italy, TX Census Bureau Income Distribution by Race [Dataset]. https://www.neilsberg.com/research/datasets/80d59b98-9fc2-11ee-b48f-3860777c1fe6/
    Dataset updated
    Jan 3, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Italy, Texas
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    This dataset tabulates median household income by race for Italy, Texas. It can be used to understand how income is distributed across racial groups in Italy, Texas.

    Content

    The dataset includes the following datasets, when applicable:

    • Italy, TX median household income breakdown by race between 2011 and 2021
    • Median Household Income by Racial Categories in Italy, TX (2021, in 2022 inflation-adjusted dollars)

    Please note: The 2020 1-Year ACS estimates were not reported by the Census Bureau because of the impact of COVID-19 on survey collection and analysis. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    Good to know

    Margin of Error

    The data are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Italy median household income by race.

  6. Comprehensive Income by Age Group Dataset: Longitudinal Analysis of Italy,...

    • neilsberg.com
    Updated Aug 7, 2024
    Neilsberg Research (2024). Comprehensive Income by Age Group Dataset: Longitudinal Analysis of Italy, New York Household Incomes Across 4 Age Groups and 16 Income Brackets. Annual Editions Collection // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/2ed61d30-aeee-11ee-aaca-3860777c1fe6/
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New York, Italy
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    This dataset tabulates household income by age for the town of Italy, New York. It can be used to understand the age-based income distribution in the town.

    Content

    The dataset includes the following datasets, when applicable:

    • Italy, New York annual median income by age groups dataset (in 2022 inflation-adjusted dollars)
    • Age-wise distribution of Italy, New York household incomes: Comparative analysis across 16 income brackets

    Please note: The 2020 1-Year ACS estimates were not reported by the Census Bureau because of the impact of COVID-19 on survey collection and analysis. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    Good to know

    Margin of Error

    The data are based on estimates and are subject to sampling variability, and thus to a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Italy town income distribution by age.

  7. Italian Negation Constructions - Tweets

    • kaggle.com
    Updated Feb 11, 2023
    The Devastator (2023). Italian Negation Constructions - Tweets [Dataset]. https://www.kaggle.com/datasets/thedevastator/italian-negation-constructions-tweets
    Dataset updated
    Feb 11, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Italian Negation Constructions - Tweets

    Exploring Language Variation Across 10 Cities

    About this dataset

    This dataset, the Twitter Italian Negation (TIN) Corpus, provides an interesting glimpse into language change in Romance languages through the emergence of non-standard uses of negation. The collection contains 10,000 tweets from ten different cities (Milan, Rome, Naples, Palermo, Bologna, Turin, Florence, Cagliari, Genoa and New York City), each collected in August 2019. The data includes tokenized text and frequency measures for each tweet, as well as a city column so users can explore regional differences. With this resource, users can explore how the language of these cities is changing over time, or how language usage differs between neighboring countries or states.

    How to use the dataset

    This dataset contains 10,000 tweets in Italian gathered from ten different cities between August and December 2019. This collection of tweets provides an interesting insight into the language change phenomena in Romance languages, specifically with regard to non-standard uses of negations.

    The dataset is described as having nine columns; those named here are token, absolute frequency, relative frequency, variation, and the city from which the tweet originated. Each row represents a single token in a particular tweet, and each tweet can contain more than one token.

    By using this dataset you can analyze and compare patterns of usage across different cities or even within a specific city. You can also compare variations within tokens between different cities to understand how certain constructions are used differently across regions or dialects. Additionally you could use this data to examine trends in literary works such as poetry by looking at the most commonly used words and phrases over time.

    To use the data effectively, it is important first to understand what each column represents:

    • Tok (Tokenized text): The text of a tweet broken down into individual words or tokens, including punctuation marks such as commas or exclamation points.
    • Abs (Absolute Frequency): The total number of times a particular token appears across all tweets.
    • Rel (Relative Frequency): How often a particular token appears relative to other tokens.
    • Var (Variation): Whether the token has been altered compared to standard usage, such as “has” being replaced with “haz”.
    • City: The city each tweet originated from, for example “Milan” or “Genoa”. This supports analysis of usage differences among locales, and also comparisons across larger geographic areas, such as “Italy” versus other countries like the “United States”.

    Combining these numeric values with thematic exploration helps reveal not only usage but also trends across different geographic populations of Twitter users, especially for non-standard dialectal constructs throughout Italy.
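
    The two frequency columns can be illustrated with a short Python sketch. It assumes that relative frequency is a token's count divided by the total number of tokens, which the description above does not state precisely; the function name is hypothetical and the sample tweets are made up rather than taken from the corpus.

```python
from collections import Counter
from typing import Dict, List, Tuple


def token_frequencies(tweets: List[List[str]]) -> Dict[str, Tuple[int, float]]:
    """Map each token to (absolute frequency, relative frequency).

    Assumption: relative frequency = token count / total token count.
    """
    counts = Counter(token for tweet in tweets for token in tweet)
    total = sum(counts.values())
    return {token: (n, n / total) for token, n in counts.items()}


# Made-up, already tokenized tweets, for illustration only.
tweets = [
    ["non", "mi", "interessa"],
    ["non", "interessa", "mica"],
]

for token, (absolute, relative) in token_frequencies(tweets).items():
    print(f"{token}: abs={absolute}, rel={relative:.3f}")
```

    Computing the same table separately for each city would give the per-city comparison that the City column is meant to support.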

    Research Ideas

    • Studying the regional variation of Italian negation constructions by comparing the frequency and variation between cities.
    • Investigating language change over time by tracking changes in relative and absolute frequencies of negation constructions across tweets.
    • Exploring how different socio-economic contexts or trends such as news, fashion, sports impacted the evolution of language use in tweets in each city

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: interessa+word1.csv

    | Column name | Description |
    |:------------|:------------|
    | tok | Tokenized text of the tweet. (String) |
    | abs | Absolute frequency of a token in the... |

  8. Italy Traffic Congestion Index: Average: Italy: Turin

    • ceicdata.com
    CEICdata.com, Italy Traffic Congestion Index: Average: Italy: Turin [Dataset]. https://www.ceicdata.com/en/italy/traffic-congestion-index-average-by-cities/traffic-congestion-index-average-italy-turin
    Dataset provided by
    CEIC Data
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Nov 13, 2023 - Nov 24, 2023
    Area covered
    Italy
    Variables measured
    Vehicle Traffic
    Description

    Traffic Congestion Index: Average: Italy: Turin was reported at an index value of 9.400 on 24 Nov 2023, a decrease from 21.250 on 23 Nov 2023. The series is updated daily and has a median of 4.860 from Jan 2019 to 24 Nov 2023, across 1682 observations. It reached an all-time high of 50.500 on 15 Dec 2022 and a record low of 0.280 on 17 May 2020. The series remains active in CEIC and is reported by CEIC Data. It is categorized under Global Database’s Italy – Table TI.TCI: Traffic Congestion Index: Average: by Cities (Discontinued). [COVID-19-IMPACT]
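
    The day-over-day decrease quoted above can be worked out with a short calculation. This is only an arithmetic sketch using the two reported values (21.250 for 23 Nov 2023 and 9.400 for 24 Nov 2023); the function name is hypothetical.

```python
from typing import Tuple


def change(previous: float, current: float) -> Tuple[float, float]:
    """Return (absolute change, percent change) from previous to current."""
    absolute = current - previous
    percent = absolute / previous * 100
    return absolute, percent


# Values reported for 23 Nov 2023 and 24 Nov 2023.
absolute, percent = change(21.250, 9.400)
print(f"change: {absolute:.3f} index points ({percent:.1f}%)")
```

    A single-day swing of this size is best read against the series median of 4.860 rather than on its own.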


aggregate-data-italian-cities-from-wikipedia

Elementary data about Italian cities from the specialized articles on Wikipedia
