8 datasets found

aggregate-data-italian-cities-from-wikipedia
Elementary data about Italian cities in the specialized articles in Wikipedia
kaggle.com
Updated May 20, 2020
Cite
alepuzio (2020). aggregate-data-italian-cities-from-wikipedia [Dataset]. https://www.kaggle.com/alepuzio/aggregatedataitaliancitiesfromwikipedia/code
Dataset updated
May 20, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
alepuzio
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Context

This dataset is the result of my study of web scraping English Wikipedia in R and of my tests on regression and classification modeling in R.

Content

The content was created by reading the relevant English Wikipedia articles about Italian cities. I did not run any NLP analysis; I read only the tables containing the data, and I ranked every city from 0 to N in every aspect. For these values, 0 means "*the city is not ranked in this aspect*" and N means "*the city is in first place, in descending order of importance, in this aspect*". If an aspect has no ranking (for example, airports/harbours that are merely listed, with no additional data about traffic or size), then 0 means "*none exist*" and N means "*there are N airports/harbours*". The only non-numeric column is the one holding the city names in their English form, with a few exceptions kept for simplicity (for example, "*Bra (CN)*").
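As a rough illustration of the 0-to-N scoring described above, here is a minimal Python sketch. It assumes that N equals the number of ranked cities, that a larger raw value means "more important", and that a city with no table entry gets 0. The function name `encode_ranks`, the city names, and the figures are hypothetical; none of them come from the dataset, which the author built in R.

```python
from typing import Optional


def encode_ranks(raw: dict[str, Optional[float]]) -> dict[str, int]:
    """Map raw values to 0..N scores: 0 = not ranked, N = first place."""
    # Sort the ranked cities in ascending order, so the largest raw value
    # ends up last and receives the highest score.
    ranked = sorted(
        (city for city, value in raw.items() if value is not None),
        key=lambda city: raw[city],
    )
    scores = {city: 0 for city in raw}  # unranked cities stay at 0
    for position, city in enumerate(ranked, start=1):
        scores[city] = position
    return scores


# Hypothetical passenger counts; None means the table had no entry.
airport_traffic = {"Alpha": 3_000_000, "Beta": 500_000, "Gamma": None}
scores = encode_ranks(airport_traffic)
```

Ties are broken by input order in this sketch; the original dataset may handle them differently.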

Acknowledgements

I thank the Wikimedia Foundation for its work and its mission, and for making the cover image of this dataset available (please read the article "The Ideal city (painting)"). I also thank StackOverflow and Cross-Validated for being the most important hubs of technical knowledge in the world, and everyone on Kaggle for their suggestions.

Inspiration

As a beginner in data analysis and modeling (OK, I passed the statistics exam at Politecnico di Milano (Italy), but I have not worked on this topic for more than 10 years and my memory is getting old ^_^), I worked mostly on data cleaning, dataset building, and building the simplest models.

You can use this dataset to work out which city is good to live in, or expand it with other data from Wikipedia (not only by reading the tables but also by reading the text and extracting data from the unstructured text).
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+2 more
csv, excel, geojson +1
Updated Mar 10, 2024
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of administrative divisions (ca. 80,000).

Sources and Contributions

Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
Italian Airbnb Dataset
kaggle.com
Updated Nov 26, 2024
Cite
hype (2024). Italian Airbnb Dataset [Dataset]. https://www.kaggle.com/datasets/salvatoremarcello/italian-airbnb-dataset/versions/6
Dataset updated
Nov 26, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
hype
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
Italy
Description
This dataset provides a snapshot of Airbnb listings across major Italian cities and regions, offering valuable insights into the short-term rental market in Italy. Whether you're interested in pricing trends, regional variations, or the impact of seasonality, this dataset has something for you.

The data cover a period between September 2023 and September 2024.

Key Features:

City-level data: Explore listings in popular cities like Florence, Milan, Naples, Rome, and Venice.

Regional insights: Analyze trends across broader regions including Puglia, Sicily, and Trentino.

Comprehensive metrics: Data includes pricing, review scores, host details, and more.

Seasonal analysis: Data spans different periods, allowing for comparisons across seasons.

Data Dictionary:

id: Unique identifier for each listing.

number_of_reviews_ltm: Number of reviews received in the last twelve months.

date_of_scraping: Date the data was scraped.

host_since: Date the host joined Airbnb.

host_is_superhost: Whether the host is a superhost (t/f).

host_total_listings_count: Total number of listings the host has.

neighbourhood: Neighborhood where the listing is located.

latitude: Latitude coordinate of the listing.

longitude: Longitude coordinate of the listing.

room_type: Type of room (e.g., entire home/apt, private room).

accommodates: Number of guests the listing can accommodate.

price: Price per night (in local currency).

number_of_reviews: Total number of reviews.

review_scores_rating: Overall rating of the listing.

review_scores_accuracy: Accuracy rating.

review_scores_cleanliness: Cleanliness rating.

review_scores_checkin: Check-in rating.

review_scores_communication: Communication rating.

review_scores_location: Location rating.

review_scores_value: Value rating.

reviews_per_month: Number of reviews per month.

place: City or region where the listing is located.

period: Time period when the data was scraped (e.g., Early Winter).

For visualization purposes, a CSV with all city neighbourhoods and the corresponding GeoJSON is also provided.

I also added datasets that group listings by period and neighbourhood/city: quantitative features were aggregated using the median and MAD, qualitative features using the mode and Shannon's entropy.
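The aggregation described above might look like the following sketch. It assumes that MAD means median absolute deviation (it could also stand for mean absolute deviation) and that entropy is measured in bits. The helper names and sample values are made up for illustration and are not taken from the dataset.

```python
import math
from collections import Counter
from statistics import median


def median_and_mad(values):
    """Return (median, median absolute deviation) of numeric values."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    return center, median(deviations)


def mode_and_entropy(labels):
    """Return (most frequent label, Shannon entropy in bits)."""
    counts = Counter(labels)
    total = len(labels)
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )
    return counts.most_common(1)[0][0], entropy


# Hypothetical group of listings for one period and neighbourhood.
prices = [80.0, 100.0, 120.0, 400.0]
room_types = ["Entire home/apt", "Entire home/apt", "Private room", "Hotel room"]

price_median, price_mad = median_and_mad(prices)
top_room_type, room_entropy = mode_and_entropy(room_types)
```

The median and MAD are less sensitive to outliers such as the 400.0 listing than the mean and standard deviation would be, which is one plausible reason for choosing them.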

Disclaimer:

This dataset is intended for informational and research purposes only. It is not affiliated with Airbnb or any other organization.
Dataset for Italy, New York Census Bureau Income Distribution by Gender
neilsberg.com
Updated Jan 9, 2024
Cite
Neilsberg Research (2024). Dataset for Italy, New York Census Bureau Income Distribution by Gender [Dataset]. https://www.neilsberg.com/research/datasets/b3b9b2be-abcb-11ee-8b96-3860777c1fe6/
Dataset updated
Jan 9, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Italy
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

This dataset tabulates household income by gender for the town of Italy. It can be used to understand the gender-based income distribution in the town.

Content

The dataset includes the following datasets, when applicable:

Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

Italy, New York annual median income by work experience and sex dataset: Aged 15+, 2010-2022 (in 2022 inflation-adjusted dollars)

Italy, New York annual income distribution by work experience and gender dataset (Number of individuals ages 15+ with income, 2021)

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Interested in deeper insights and visual analysis?

Explore our comprehensive data analysis and visual representations for a deeper understanding of Italy town income distribution by gender.
Dataset for Italy, TX Census Bureau Income Distribution by Race
neilsberg.com
Updated Jan 3, 2024
Cite
Neilsberg Research (2024). Dataset for Italy, TX Census Bureau Income Distribution by Race [Dataset]. https://www.neilsberg.com/research/datasets/80d59b98-9fc2-11ee-b48f-3860777c1fe6/
Dataset updated
Jan 3, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Italy, Texas
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

This dataset tabulates median household income by race for Italy. It can be used to understand how income is distributed across racial categories in Italy.

Content

The dataset includes the following datasets, when applicable:

Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

Italy, TX median household income breakdown by race between 2011 and 2021

Median Household Income by Racial Categories in Italy, TX (2021, in 2022 inflation-adjusted dollars)

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Interested in deeper insights and visual analysis?

Explore our comprehensive data analysis and visual representations for a deeper understanding of Italy median household income by race.
Comprehensive Income by Age Group Dataset: Longitudinal Analysis of Italy,...
neilsberg.com
Updated Aug 7, 2024
Cite
Neilsberg Research (2024). Comprehensive Income by Age Group Dataset: Longitudinal Analysis of Italy, New York Household Incomes Across 4 Age Groups and 16 Income Brackets. Annual Editions Collection // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/2ed61d30-aeee-11ee-aaca-3860777c1fe6/
Dataset updated
Aug 7, 2024
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
New York, Italy
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

This dataset tabulates household income by age for the town of Italy. It can be used to understand the age-based income distribution in the town.

Content

The dataset includes the following datasets, when applicable:

Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

Italy, New York annual median income by age groups dataset (in 2022 inflation-adjusted dollars)

Age-wise distribution of Italy, New York household incomes: Comparative analysis across 16 income brackets

Good to know

Margin of Error

Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

Custom data

If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

Interested in deeper insights and visual analysis?

Explore our comprehensive data analysis and visual representations for a deeper understanding of Italy town income distribution by age.
Italian Negation Constructions - Tweets
kaggle.com
Updated Feb 11, 2023
Cite
The Devastator (2023). Italian Negation Constructions - Tweets [Dataset]. https://www.kaggle.com/datasets/thedevastator/italian-negation-constructions-tweets
Dataset updated
Feb 11, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
Description
Italian Negation Constructions - Tweets

Exploring Language Variation Across 10 Cities

By [source]

About this dataset

This dataset, the Twitter Italian Negation (TIN) Corpus, provides an interesting glimpse into language change in Romance languages with the emergence of non-standard uses of negations. This collection contains 10,000 tweets from ten different cities (Milan, Rome, Naples, Palermo, Bologna, Turin, Florence, Cagliari, Genoa, and New York City), each collected in August 2019. The data includes tokenized text and frequency measures for each tweet as well as a city column so users can explore regional differences. With this resource, users can uncover how the language of these cities is changing over time or even how language usage between neighboring countries or states may differ. Get ready to dive deep into the fascinating shifts that occur between spoken and written languages!

How to use the dataset

This dataset contains 10,000 tweets in Italian gathered from ten different cities between August and December 2019. This collection of tweets provides an interesting insight into the language change phenomena in Romance languages, specifically with regard to non-standard uses of negations.

The dataset is composed of nine columns, including token, absolute frequency, relative frequency, variation, and the city from which the tweet originated. Each row represents a single token in a particular tweet; each tweet can contain more than one token.

By using this dataset you can analyze and compare patterns of usage across different cities or even within a specific city. You can also compare variations within tokens between different cities to understand how certain constructions are used differently across regions or dialects. Additionally you could use this data to examine trends in literary works such as poetry by looking at the most commonly used words and phrases over time.

To use the data effectively, it is important first to understand what each column represents:

Tok (Tokenized text): This is text that has been broken down into individual words or tokens representing all of the words found in a particular tweet including punctuation marks like commas or exclamation points;

Abs (Absolute Frequency): This is the total number of times that a particular token appears within all tweets;

Rel (Relative Frequency): This is how many times a particular token appears relative to other tokens;

Var (Variation): This indicates whether there have been any alterations made compared to standard usage such as “has” being replaced with “haz”;

City: The city from which each tweet originated, which supports analysis of usage differences among locales (for example, “Milan” or “Genua”) as well as across larger geographic areas, such as “Italy” versus other countries like the “United States”.

Using these numeric values alongside thematic exploration makes it possible to understand not only usage but also trends across different geographic populations, both locally and globally, in how Twitter users handle language, especially non-standard dialectal constructs throughout Italy.
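To make the Abs and Rel columns described above concrete, here is a minimal sketch of how such frequencies could be computed. It assumes whitespace tokenization and that relative frequency is a token's count divided by the total token count; the corpus authors may have tokenized and normalized differently. The function name and the sample tweets are invented for illustration.

```python
from collections import Counter


def token_frequencies(tweets):
    """Return {token: (absolute, relative)} over whitespace-split tokens."""
    counts = Counter(token for tweet in tweets for token in tweet.split())
    total = sum(counts.values())
    return {token: (n, n / total) for token, n in counts.items()}


# Made-up example tweets, not taken from the corpus.
sample = ["non mi interessa", "non interessa mica"]
freqs = token_frequencies(sample)

for token, (absolute, relative) in sorted(freqs.items()):
    print(f"{token}\tabs={absolute}\trel={relative:.3f}")
```

Grouping the tweets by the City column before calling the function would give per-city frequencies that can then be compared across regions.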

Research Ideas

Studying the regional variation of Italian negation constructions by comparing the frequency and variation between cities.

Investigating language change over time by tracking changes in relative and absolute frequencies of negation constructions across tweets.

Exploring how different socio-economic contexts or trends such as news, fashion, sports impacted the evolution of language use in tweets in each city

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication. No Copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: interessa+word1.csv

| Column name | Description |
|:--------------|:------------------------------------------------------|
| tok | Tokenized text of the tweet. (String) |
| abs | Absolute frequency of a token in the... |
Italy Traffic Congestion Index: Average: Italy: Turin
ceicdata.com
Cite
CEICdata.com, Italy Traffic Congestion Index: Average: Italy: Turin [Dataset]. https://www.ceicdata.com/en/italy/traffic-congestion-index-average-by-cities/traffic-congestion-index-average-italy-turin
Dataset provided by
CEIC Data
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Nov 13, 2023 - Nov 24, 2023
Area covered
Italy
Variables measured
Vehicle Traffic
Description
Traffic Congestion Index: Average: Italy: Turin data was reported at 9.400 Index on 24 Nov 2023. This records a decrease from the previous number of 21.250 Index for 23 Nov 2023. The data is updated daily, averaging 4.860 Index (median) from Jan 2019 to 24 Nov 2023, with 1682 observations. The data reached an all-time high of 50.500 Index on 15 Dec 2022 and a record low of 0.280 Index on 17 May 2020. The data remains in active status in CEIC and is reported by CEIC Data. It is categorized under Global Database’s Italy – Table TI.TCI: Traffic Congestion Index: Average: by Cities (Discontinued). [COVID-19-IMPACT]