Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Note: This is a work in progress, and not all Kaggle forums are included in this dataset yet. The remaining forums will be added once I finish resolving some issues with the data generators for those forums.
Welcome to the Kaggle Forum Discussions dataset! This dataset contains curated data about recent discussions opened in the different forums on Kaggle. The data is obtained through web scraping with the Selenium library, and the text is converted to Markdown with the markdownify package.
This dataset contains information about each discussion's main topic, topic title, comments, votes, medals, and more, and is designed to complement the data available in the Meta Kaggle dataset, specifically for recent discussions. Keep reading for the details.
Because Kaggle is a dynamic website that relies heavily on JavaScript (JS), the data in this dataset was extracted through web scraping with the Selenium library.
The functions and classes used to scrape Kaggle are stored in a utility script that is publicly available here. Because JS-generated pages like Kaggle's are unstable when scraped, the script implements retry logic for connections and waits for elements to appear, as in the sketch below.
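The snippet below is a minimal, hypothetical illustration of that retry-and-wait pattern, not the utility script's actual API; the function name and CSS selector are assumptions, and it assumes Selenium 4 with a Chrome driver available on the PATH.

```python
# Minimal sketch of the retry-and-wait pattern used when scraping JS-generated pages.
# `scrape_discussion_list` and the CSS selector are illustrative, not the utility script's API.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, WebDriverException

def scrape_discussion_list(url: str, retries: int = 3, timeout: int = 20) -> list[str]:
    """Open a forum page, wait for the discussion list to render, and return the item texts."""
    for attempt in range(1, retries + 1):
        driver = webdriver.Chrome()
        try:
            driver.get(url)
            # Wait until at least one discussion item is present in the DOM.
            items = WebDriverWait(driver, timeout).until(
                EC.presence_of_all_elements_located(
                    (By.CSS_SELECTOR, "li[class*='discussion']")
                )
            )
            return [item.text for item in items]
        except (TimeoutException, WebDriverException):
            if attempt == retries:
                raise  # give up after the last attempt
        finally:
            driver.quit()
    return []
```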
Each forum was scraped in its own notebook, and those notebooks feed into a central notebook that generates this dataset. Discussions are also scraped in parallel to improve speed. The dataset represents all the data that can be gathered in a single notebook session, from the most recent discussions to the oldest.
If you need more control over the data you want to research, feel free to import whatever you need from the utility script mentioned above.
This dataset contains several folders, each named after the discussion forum it holds data about. For example, the 'competition-hosting' folder contains data about the Competition Hosting forum. Inside each folder you'll find two files: a CSV file and a JSON file.
The JSON file (in Python, represented as a dictionary) is indexed by the ID that Kaggle assigns to each discussion. Each ID is paired with its corresponding discussion, represented as a nested dictionary (the discussion dict) with the following fields:
- title: The title of the main topic.
- content: Content of the main topic.
- tags: List containing the discussion's tags.
- datetime: Date and time at which the discussion was published (in ISO 8601 format).
- votes: Number of votes received by the discussion.
- medal: Medal awarded to the main topic (if any).
- user: User who published the main topic.
- expertise: Publisher's expertise, as measured by the Kaggle progression system.
- n_comments: Total number of comments in the discussion.
- n_appreciation_comments: Total number of appreciation comments in the discussion.
- comments: Dictionary containing data about the comments in the discussion. Each comment is indexed by an ID assigned by Kaggle and contains the following fields:
  - content: Comment's content.
  - is_appreciation: Whether the comment is an appreciation comment.
  - is_deleted: Whether the comment was deleted.
  - n_replies: Number of replies to the comment.
  - datetime: Date and time at which the comment was published (in ISO 8601 format).
  - votes: Number of votes received by the comment.
  - medal: Medal awarded to the comment (if any).
  - user: User who published the comment.
  - expertise: Publisher's expertise, as measured by the Kaggle progression system.
  - n_deleted: Total number of deleted replies (including the comment itself).
  - replies: A dict following this same format.
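As a rough illustration of how the JSON file can be traversed, here is a minimal sketch; the file path is an assumption, so adjust it to the actual file name inside each folder.

```python
# Sketch: load one forum's JSON file and walk its discussions and comments.
# The path below is illustrative, not the dataset's guaranteed file name.
import json

with open("competition-hosting/discussions.json", encoding="utf-8") as f:
    discussions = json.load(f)  # dict keyed by the Kaggle discussion ID

for disc_id, disc in discussions.items():
    print(disc_id, disc.get("title"), disc.get("votes"), disc.get("n_comments"))
    for comment_id, comment in disc.get("comments", {}).items():
        # Replies follow the same schema as comments, so they can be walked recursively.
        print("  ", comment_id, comment.get("votes"), comment.get("is_appreciation"))
```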
On the other hand, the CSV file serves as a summary of the JSON file, with the comment information limited to the hottest and most voted comments.
Note: Only the 'content' field is mandatory for each discussion. The availability of the other fields is subject to the stability of the scraping tasks, which may also affect the update frequency.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 20 verified Forum locations in the United States with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All of the data files (.csv) required for the Forum Navigation analysis. Includes an actor-actor edgelist, an actor-forum edgelist, an actor-issue edgelist, a forum-issue edgelist, an issue-issue matrix, an isolates dataset, an actor attributes data frame (actor_orgtype.csv) and a forum attributes data frame (ForumSponsorship.csv).
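As a hedged example of how these edgelists might be loaded into a network for analysis: the actor-forum edgelist file name and the column layout below are assumptions (only actor_orgtype.csv and ForumSponsorship.csv are named above), so treat this as a sketch rather than the study's actual workflow.

```python
# Sketch: build a bipartite actor-forum graph from the edgelist and attach actor attributes.
# File and column names are assumptions for illustration.
import pandas as pd
import networkx as nx

edges = pd.read_csv("actor_forum_edgelist.csv")   # assumed file name; first two columns = actor, forum
actors = pd.read_csv("actor_orgtype.csv")         # actor attributes (named in the description)

G = nx.Graph()
G.add_nodes_from(edges.iloc[:, 0].unique(), bipartite="actor")
G.add_nodes_from(edges.iloc[:, 1].unique(), bipartite="forum")
G.add_edges_from(edges.iloc[:, :2].itertuples(index=False, name=None))

# Attach organisation type to actor nodes (column order assumed: actor, orgtype).
org = dict(zip(actors.iloc[:, 0], actors.iloc[:, 1]))
nx.set_node_attributes(G, org, "orgtype")

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```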
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data Description: Internet news, forum text data. Time Range: 2017-10-16 to 2017-11-17. Data Volume: 482400. Data Format: json. Author: State Information Center.
This dataset was created by Jabbar
Financial overview and grant-giving statistics of Cross Border Data Forum Inc
Data pulled from ACS, used to power certain visualizations for the "Women Veterans Forum" Story
This dataset provides comprehensive real-time forum and discussion board data from Google Search, aggregated from across the web. The data is continuously updated to provide the most current discussions and conversations. Users can leverage this dataset for community research, social listening, market research, and trend-analysis tools. Whether you're building a forum aggregator, conducting community research, or developing social listening tools, this dataset provides current and reliable forum data. The dataset is delivered in JSON format via a REST API.
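A minimal sketch of consuming such a JSON REST API is shown below; the endpoint URL, query parameters, authentication header, and response shape are placeholders, not the provider's documented API.

```python
# Sketch: fetch forum discussion data from a JSON REST API.
# The URL, parameters, API-key header, and response fields are placeholders for illustration only.
import requests

API_URL = "https://api.example.com/v1/forum-discussions"   # placeholder endpoint
params = {"query": "data science", "limit": 50}            # placeholder parameters

response = requests.get(
    API_URL,
    params=params,
    headers={"Authorization": "Bearer YOUR_API_KEY"},      # placeholder credential
    timeout=30,
)
response.raise_for_status()
for post in response.json().get("results", []):            # assumed response shape
    print(post.get("title"), post.get("url"))
```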
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 36 verified Forum locations in France with complete contact information, ratings, reviews, and location data.
The polar science community has unprecedented opportunities for science based on open, networked, digital, and ubiquitous communication technologies. This presents an urgent need for the polar science community, Arctic residents, and other stakeholders to establish a clear global vision, strategy, and action plan to ensure effective stewardship of and access to valuable Arctic and Antarctic data resources. The Second Polar Data Forum (PDF II) built on the successes of the first Polar Data Forum (PDF I) in Tokyo, Japan, October 2013. PDF II further refined relevant themes and priorities and is accelerating progress by establishing clear actions to address the target issues. This includes meeting the needs of society and science through promotion of open data and effective data stewardship, establishing sharing and interoperability of data at a variety of levels, developing trusted data management systems, and ensuring long-term data preservation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: This dataset will be used to post updated versions of the capital budget deliberative forum data in the future.
Collected responses from surveys distributed at the City of Pittsburgh's Capital Budget Deliberative Forum Meetings. These surveys collected resident sentiment on what capital projects the City should prioritize in the upcoming year.
The collected responses of residents who attended the deliberative meetings concerning the City's 2019 & 2020 Capital Budgets. Residents were asked to identify a specific capital project (or projects) that they felt needed to be completed in their neighborhood. They were asked to be specific as to work needed and the location. They were then asked to use a series of options to note how important they found a list of capital project priorities. Finally, they were asked to share their opinion of the Deliberative Budget Forum by choosing from a series of options.
This dataset provides information about the number of properties, residents, and average property values for Forum Drive cross streets in Bend, OR.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
List of the forum data used in this study.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Catchment Data & Evidence Forum was organised by the CaBA Catchment Data User Group (CDUG), which is a multi-sectoral CaBA working group, consisting of data users, data providers and modellers. The focus of this year’s FORUM was on the collection and use of CaBA data. This is locally collected data, which is needed to compliment the national evidence base from government agencies and research institutes. The enormous potential for CaBA data to contribute to the 25 Year Environment Plan was a key opportunity identified at the 2018 FORUM. A series of discussions, followed by interactive voting, were used to set a workplan for the CaBA National Support Group.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents the findings of the Belmont Forum’s survey on Open Data which targeted the global environmental research and data infrastructure community. It highlights users’ perceptions of the term “open data”, expectations of infrastructure functionalities, and barriers and enablers for the sharing of data. A wide range of good practice examples was pointed out by the respondents which demonstrates a substantial uptake of data sharing through e-infrastructures and a further need for enhancement and consolidation. Among all policy responses, funder policies seem to be the most important motivator. This supports the conclusion that stronger mandates will strengthen the case for data sharing.
This dataset provides information about the number of properties, residents, and average property values for Forum Lane cross streets in Iron Station, NC.
This file is for my postgraduate study. The data concerns the Coursera forum for the R Programming course. All data has been anonymized for the purpose of data privacy.
The scraped data covers September 2018 to September 2019.
Dress Forum Inc Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This datasets was created as a result of web scrapping using python tool called scrapy.
This contains the user interacting with light novel sharing forum. Subject column represent subject of the forum post and almost every subject is a name of light novel they are sharing. Second column represent who created it. Third column shows that how many views got that light novel. Next column shows how many users have replied for that post. Next 2 columns show who post last on that post and when that last post made.
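A small, hedged sketch of exploring this table with pandas follows; the file name and column labels are assumptions based on the description above, not the dataset's actual headers.

```python
# Sketch: load the scraped forum table and list the most-viewed threads.
# The file name and column names are assumptions for illustration.
import pandas as pd

df = pd.read_csv("light_novel_forum.csv")  # assumed file name
# Assumed column order, matching the description above (six columns expected).
df.columns = ["subject", "creator", "views", "replies", "last_poster", "last_post_time"]

# Most-viewed threads, which are likely the most popular light novels on the forum.
print(df.sort_values("views", ascending=False).head(10)[["subject", "creator", "views"]])
```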
I hope this will unlock hidden secrets of light novel reading communities.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 13 verified Forum locations in Mexico with complete contact information, ratings, reviews, and location data.