Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
There are a lot of unknowns when running an e-commerce store, even when you have analytics to guide your decisions.
Users are an important factor in an e-commerce business. This is especially true in a C2C-oriented store, since they are both the suppliers (by uploading their products) AND the customers (by purchasing other users' articles).
This dataset aims to serve as a benchmark for an e-commerce fashion store. Using it, you can explore what to expect of your users and estimate in advance how your store may grow.
If you find this dataset useful or interesting, don't forget to show your support or appreciation with an upvote/comment. You may even include how you think this dataset might be of use to you. That way, I will be more aware of specific needs and able to adapt my datasets to better suit them.
This dataset is part of a preview of a much larger dataset. Please contact me for more.
The data was scraped from a successful online C2C fashion store with over 9M registered users. The store was first launched in Europe around 2009 and later expanded worldwide.
Visitors vs. users: visitors do not appear in this dataset; only registered users are included. Visitors can view the catalog but cannot purchase an article.
Questions you might want to answer using this dataset:
For other licensing options, contact me.
https://brightdata.com/license
Unlock the full potential of LinkedIn data with our extensive dataset that combines profiles, company information, and job listings into one powerful resource for business decision-making, strategic hiring, competitive analysis, and market trend insights. This all-encompassing dataset is ideal for professionals, recruiters, analysts, and marketers aiming to enhance their strategies and operations across various business functions.
Dataset Features
Profiles: Dive into detailed public profiles featuring names, titles, positions, experience, education, skills, and more. Utilize this data for talent sourcing, lead generation, and investment signaling, with a refresh rate ensuring up to 30 million records per month.
Companies: Access comprehensive company data including ID, country, industry, size, number of followers, website details, subsidiaries, and posts. Tailored subsets by industry or region provide invaluable insights for CRM enrichment, competitive intelligence, and understanding the startup ecosystem, updated monthly with up to 40 million records.
Job Listings: Explore current job opportunities detailed with job titles, company names, locations, and employment specifics such as seniority levels and employment functions. This dataset includes direct application links and real-time application numbers, serving as a crucial tool for job seekers and analysts looking to understand industry trends and job market dynamics.
Customizable Subsets for Specific Needs
Our LinkedIn dataset offers the flexibility to tailor the dataset according to your specific business requirements. Whether you need comprehensive insights across all data points or are focused on specific segments like job listings, company profiles, or individual professional details, we can customize the dataset to match your needs. This modular approach ensures that you get only the data that is most relevant to your objectives, maximizing efficiency and relevance in your strategic applications.
Popular Use Cases
Strategic Hiring and Recruiting: Track talent movement, identify growth opportunities, and enhance your recruiting efforts with targeted data.
Market Analysis and Competitive Intelligence: Gain a competitive edge by analyzing company growth, industry trends, and strategic opportunities.
Lead Generation and CRM Enrichment: Enrich your database with up-to-date company and professional data for targeted marketing and sales strategies.
Job Market Insights and Trends: Leverage detailed job listings for a nuanced understanding of employment trends and opportunities, facilitating effective job matching and market analysis.
AI-Driven Predictive Analytics: Utilize AI algorithms to analyze large datasets for predicting industry shifts, optimizing business operations, and enhancing decision-making processes based on actionable data insights.
Whether you are mapping out competitive landscapes, sourcing new talent, or analyzing job market trends, our LinkedIn dataset provides the tools you need to succeed. Customize your access to fit specific needs, ensuring that you have the most relevant and timely data at your fingertips.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Evolution of the Manosphere Across the Web
We make available data related to subreddit and standalone forums from the manosphere.
We also make available Perspective API annotations for all posts.
You can find the code in GitHub.
Please cite this paper if you use this data:
@inproceedings{ribeiroevolution2021,
  title={The Evolution of the Manosphere Across the Web},
  author={Ribeiro, Manoel Horta and Blackburn, Jeremy and Bradlyn, Barry and De Cristofaro, Emiliano and Stringhini, Gianluca and Long, Summer and Greenberg, Stephanie and Zannettou, Savvas},
  booktitle={Proceedings of the 15th International AAAI Conference on Weblogs and Social Media (ICWSM'21)},
  year={2021}
}
We make available data for forums and for relevant subreddits (56 of them, as described in subreddit_descriptions.csv). The Reddit data is available, one line per post in each subreddit, in /ndjson/reddit.ndjson. For example:
{ "author": "Handheld_Gaming", "date_post": 1546300852, "id_post": "abcusl", "number_post": 9.0, "subreddit": "Braincels", "text_post": "Its been 2019 for almost 1 hour And I am at a party with 120 people, half of them being foids. The last year had been the best in my life. I actually was happy living hope because I was redpilled to the death.
Now that I am blackpilled I see that I am the shortest of all men and that I am the only one with a recessed jaw.
Its over. Its only thanks to my age old friendship with chads and my social skills I had developed in the past year that a lot of men like me a lot as a friend.
No leg lengthening syrgery is gonna save me. Ignorance was a bliss. Its just horror now seeing that everyone can make out wirth some slin hoe at the party.
I actually feel so unbelivably bad for turbomanlets. Life as an unattractive manlet is a pain, I cant imagine the hell being an ugly turbomanlet is like. I would have roped instsntly if I were one. Its so unfair.
Tallcels are fakecels and they all can (and should) suck my cock.
If I were 17cm taller my life would be a heaven and I would be the happiest man alive.
Just cope and wait for affordable body tranpslants.", "thread": "t3_abcusl" }
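A minimal sketch for iterating over these Reddit posts, assuming the /ndjson/reddit.ndjson path above (one JSON object per line):

import json

# One JSON object per line, as in the sample above.
with open("./data/ndjson/reddit.ndjson") as f:
    for line in f:
        post = json.loads(line)
        # fields as in the sample, e.g. "subreddit", "id_post", "date_post"
        if post["subreddit"] == "Braincels":
            print(post["id_post"], post["date_post"])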
Here we describe the .sqlite and .ndjson files that contain the data from the following forums:
(avfm) --- https://d2ec906f9aea-003845.vbulletin.net
(incels) --- https://incels.co/
(love_shy) --- http://love-shy.com/lsbb/
(redpilltalk) --- https://redpilltalk.com/
(mgtow) --- https://www.mgtow.com/forums/
(rooshv) --- https://www.rooshvforum.com/
(pua_forum) --- https://www.pick-up-artist-forum.com/
(the_attraction) --- http://www.theattractionforums.com/
The files are in folders /sqlite/ and /ndjson.
2.1 .sqlite
All the tables in the .sqlite datasets follow a very simple {key: value} format. Each key is a thread name (for example /threads/housewife-is-like-a-job.123835/) and each value is a Python dictionary or a list. Each file contains three tables (a short reading sketch follows the table descriptions below):
idx: each key is the relative address to a thread and maps to a post. Each post is represented by a dict with the following fields:
"type": (list) in some forums you can add a descriptor such as [RageFuel] to each topic, and you may also have special types of posts, like stickied/pool/locked posts;
"title": (str) title of the thread;
"link": (str) link to the thread;
"author_topic": (str) username that created the thread;
"replies": (int) number of replies; may differ from the number of posts due to differences in crawling date;
"views": (int) number of views;
"subforum": (str) name of the subforum;
"collected": (bool) indicates if raw posts have been collected;
"crawled_idx_at": (str) datetime of the collection.
processed_posts: each key is the relative address to a thread and maps to a list of posts (in order). Each post is represented by a dict with the following fields:
"author": (str) author's username;
"resume_author": (str) author's short self-description;
"joined_author": (str) date the author joined;
"messages_author": (int) number of messages the author has;
"text_post": (str) text of the post;
"number_post": (int) number of the post in the thread;
"id_post": (str) post identifier; guaranteed unique only within the thread;
"id_post_interaction": (list) ids of the other posts this post quoted;
"date_post": (str) datetime of the post;
"links": (tuple) the URL parsed into components, e.g. ('https', 'www.youtube.com', '/S5t6K9iwcdw');
"thread": (str) same as the key;
"crawled_at": (str) datetime of the collection.
raw_posts: each key is the relative address to a thread and maps to a list of unprocessed posts (in order). Each post is represented by a dict with the following fields:
"post_raw": (binary) raw HTML of the post;
"crawled_at": (str) datetime of the collection.
2.2 .ndjson
Each line consists of a json object representing a different comment with the following fields:
"author": (str) author's username; "resume_author": (str) author's little description; "joined_author": (str) date author joined; "messages_author": (int) number of messages the author has; "text_post": (str) text of the main post; "number_post": (int) number of the post in the thread; "id_post": (str) unique post identifier (depends), for sure unique within thread; "id_post_interaction": (list) list with other posts ids this post quoted; "date_post": (str) datetime of the post, "links": (tuple) nice tuple with the url parsed, e.g. ('https', 'www.youtube.com', '/S5t6K9iwcdw'); "thread": (str) same as key; "crawled_at": (str) datetime of the collection.
We also ran each forum post and Reddit post through the Perspective API; the output files are located in the /perspective/ folder and are compressed with gzip. One example output:
{ "id_post": 5200, "hate_output": { "text": "I still can\u2019t wrap my mind around both of those articles about these c~~~s sleeping with poor Haitian Men. Where\u2019s the uproar?, where the hell is the outcry?, the \u201cpig\u201d comments or the \u201ccreeper comments\u201d. F~~~ing hell, if roles were reversed and it was an article about Men going to Europe where under 18 sex in legal, you better believe they would crucify the writer of that article and DEMAND an apology by the paper that wrote it.. This is exactly what I try and explain to people about the double standards within our modern society. A bunch of older women, wanna get their kicks off by sleeping with poor Men, just before they either hit or are at menopause age. F~~~ing unreal, I\u2019ll never forget going to Sweden and Norway a few years ago with one of my buddies and his girlfriend who was from there, the legal age of consent in Norway is 16 and in Sweden it\u2019s 15. I couldn\u2019t believe it, but my friend told me \u201c hey, it\u2019s normal here\u201d . Not only that but the age wasn\u2019t a big different in other European countries as well. One thing i learned very quickly was how very Misandric Sweden as well as Denmark were.", "TOXICITY": 0.6079781, "SEVERE_TOXICITY": 0.53744453, "INFLAMMATORY": 0.7279288, "PROFANITY": 0.58842486, "INSULT": 0.5511079, "OBSCENE": 0.9830818, "SPAM": 0.17009115 } }
A nice way to read some of the files of the dataset is using SqliteDict, for example:
from sqlitedict import SqliteDict

processed_posts = SqliteDict("./data/forums/incels.sqlite", tablename="processed_posts")

for key, posts in processed_posts.items():
    for post in posts:
        # here you could do something with each post in the dataset
        pass
Additionally, we provide two .sqlite files that are helpers used in the analyses. These are related to Reddit, not to the forums! They are:
channel_dict.sqlite: a sqlite file where each key corresponds to a subreddit and each value is a list of dictionaries of the users who posted in it, along with timestamps.
author_dict.sqlite: a sqlite file where each key corresponds to an author and each value is a list of dictionaries of the subreddits they posted in, along with timestamps.
These are used in the paper for the migration analyses.
Although we did our best to clean the data and be consistent across forums, this was not always possible. In the following subsections we describe the particularities of each forum, point out directions for improving the parsing that were not pursued, and give some examples of how things work in each forum.
6.1 incels
Check out an archived version of the front page, the thread page and a post page, as well as a dump of the data stored for a thread page and a post page.
types: for the incel forums, the special types associated with each thread in the idx table are “Sticky”, “Pool”, “Closed”, and the custom types added by users, such as [LifeFuel]. These last ones are all in brackets. You can see some examples of these on the example thread page.
quotes: the quote markup in this forum was quite clean, and thus all quote attributions are deterministic.
6.2 LoveShy
Check out an archived version of the front page, the thread page and a post page, as well as a dump of the data stored for a thread page and a post page.
types: no types were parsed. There are some in the forum, but they are not significant.
quotes: quotes were obtained from an exact text+author match, or from an author match plus a Jaccard similarity match on the text.
Are you in a tsunami evacuation zone? Search for your property in the address search bar, or pan around the map to see Canterbury’s tsunami evacuation zones. For more information on when and how to evacuate, see our tsunami pages on the Environment Canterbury website or contact your local emergency management officer at your city or district council.
What do these zones mean?
Red zone: The red zone includes beaches, estuaries, harbours and river mouths. These are the areas most likely to be affected by a tsunami and that would experience the highest water depths and strongest currents. This area can be affected by small tsunamis that are unlikely to flood land but that cause strong surges or currents in the water. You should leave this zone immediately if you feel a long or strong earthquake, OR are told to evacuate by Civil Defence, OR hear tsunami sirens (where they are installed). Stay out of this zone until you are told it is safe to go back. You can expect to evacuate the red zone several times in your lifetime.
Orange zone: The orange zone is less likely to be affected by a tsunami and includes low-lying coastal areas that are likely to be flooded in a large tsunami that inundates land. You should leave this zone immediately if you feel a long or strong earthquake, OR are told to evacuate by Civil Defence, OR hear tsunami sirens (where they are installed). Stay out of this zone until you are told it is safe to go back. You can expect to evacuate the orange zone maybe a few times in your lifetime.
Yellow zone: Christchurch and Banks Peninsula also have a yellow zone, which covers the areas least likely to be affected by a tsunami. They could potentially be flooded in a very large tsunami coming from across the Pacific Ocean. You do not need to leave this zone if you feel a long or strong earthquake; there are no known local tsunami sources that would flood this area. If you hear tsunami sirens, turn on the radio or visit your local council's Civil Defence page. If you hear or see an announcement by Civil Defence to evacuate the yellow zone, you must leave immediately and stay out of this zone until you’re told it’s safe to go back. While it is possible you will have to evacuate this zone sometime in your lifetime, it is unlikely. Note: in most other parts of New Zealand, yellow zones need to be evacuated in a long or strong earthquake. You should check local tsunami evacuation zones when spending time on the coast.
Not in a zone? If you are not in a tsunami evacuation zone you don’t need to evacuate in a long or strong earthquake, don’t need to evacuate during a warning from Civil Defence, and may wish to open your home to family or friends who need to evacuate.
How are the tsunami zones drawn? What are they based on? Tsunami evacuation zones are areas that we recommend people evacuate from as a precaution after they feel a long or strong earthquake, or in an official tsunami warning. They encompass many different possible tsunami scenarios. The area that would be flooded in any particular tsunami depends on many factors, including the size of the earthquake, precisely how the earthquake fault moved, the direction the tsunami is coming from, and the tide level when the largest waves arrive. Every tsunami will be different and we can never say for sure exactly which areas within a zone will be flooded. There is no one tsunami that would flood an entire zone. We consider many different tsunami scenario models when drawing the tsunami evacuation zones.
The inland boundary of the zones is based on several ‘worst-case’ scenarios – tsunamis that we might expect once every 2,500 years. The alternative to this approach is to have hundreds of different evacuation zones for hundreds of possible tsunami scenarios, which would be confusing and hard to communicate. The zone boundaries, particularly in urban areas, usually follow some sort of feature that is easy to see on the ground, like roads, so that you easily know whether you are in or out of the zone. We also consider the locations of schools, rest homes and parks. For example, if part of a park could be flooded in a tsunami, we will generally include the whole park in the evacuation zone, as it is much easier to close an entire park than just part of it. You can download our reports on reviews of tsunami evacuation zones for Christchurch City, Selwyn District, Waimakariri District and Hurunui District (https://www.ecan.govt.nz/your-region/your-environment/natural-hazards/tsunamis/tsunami-evacuation-zones-and-warnings/), which describe how the zones for these districts were drawn and the models that they are based on. Tsunami evacuation zones in other parts of Canterbury will be reviewed using the results of modelling being done in 2021 and 2022. Please contact hazards@ecan.govt.nz if you need further information.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
NOTICE: As of September 6, 2024, the wastewater surveillance dataset will now be hosted on: https://data.chhs.ca.gov/dataset/wastewater-surveillance-data-california. The dataset will no longer be updated on this webpage and will contain a historic dataset. Users who wish to access new and updated data will need to visit the new webpage.
The California Department of Public Health (CDPH) and the California State Water Resources Control Board (SWRCB) together are coordinating with several wastewater utilities, local health departments, universities, and laboratories in California on wastewater surveillance for SARS-CoV-2, the virus causing COVID-19. Data collected from this network of participants, called the California Surveillance of Wastewater Systems (Cal-SuWers) Network, are submitted to the U.S. Centers for Disease Control and Prevention (CDC) National Wastewater Surveillance System (NWSS).
During the COVID-19 pandemic, wastewater surveillance has been used for the detection and quantification of SARS-CoV-2 virus shed into wastewater via the feces of infected persons. Wastewater surveillance tracks "pooled samples" that reflect the overall disease activity for a community serviced by the wastewater treatment plant (an area known as a "sewershed"), rather than tracking samples from individual people. Notably, while SARS-CoV-2 virus is shed fecally by infected persons, COVID-19 is spread primarily through the respiratory route, and there is no evidence to date that exposure to treated or untreated wastewater has led to infection with COVID-19.
Collecting and analyzing wastewater samples for the overall amount of SARS-CoV-2 viral particles present can help inform public health officials about the level of viral transmission within a community. Data from wastewater testing are not intended to replace existing COVID-19 surveillance systems, but are meant to complement them. While wastewater surveillance cannot determine the exact number of infected persons in the area being monitored, it can provide the overall trend of virus concentration within that community. With our local partners, the SWRCB and CDPH are currently monitoring and quantifying levels of SARS-CoV-2 at the headworks or "influent" of 21 wastewater treatment plants representing approximately 48% of California's population.
Gain exclusive access to verified Shopify store owners with our premium Shopify Users Email List. This database includes essential data fields such as Store Name, Website, Contact Name, Email Address, Phone Number, Physical Address, Revenue Size, Employee Size, and more on demand. Leverage real-time, accurate data to enhance your marketing efforts and connect with high-value Shopify merchants. Whether you're targeting small businesses or enterprise-level Shopify stores, our database ensures precision and reliability for optimized lead generation and outreach strategies.
Key Highlights:
✅ 3.9M+ Shopify Stores
✅ Direct Contact Info of Shopify Store Owners
✅ 40+ Data Points
✅ Lifetime Access
✅ 10+ Data Segmentations
✅ FREE Sample Data
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for GSM8K
Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+, −, ×, ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
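As a minimal sketch, the dataset can be loaded with the Hugging Face datasets library (the "main" config name and the "question"/"answer" fields are assumed from the dataset page):

from datasets import load_dataset

# Config and field names assumed from the dataset page.
gsm8k = load_dataset("openai/gsm8k", "main", split="train")
print(gsm8k[0]["question"])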
https://www.usa.gov/government-works
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.
This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.
The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.
Using these data, the COVID-19 community level was classified as low, medium, or high.
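As an illustration of that decision rule, here is a hedged sketch; the numeric cut-points are assumptions taken from CDC's published community-levels framework, not from this dataset's documentation:

def community_level(cases_per_100k, admissions_per_100k, pct_beds_covid):
    # Thresholds are assumptions from CDC's published framework;
    # verify against the dataset documentation before relying on them.
    if cases_per_100k < 200:
        if admissions_per_100k >= 20 or pct_beds_covid >= 15:
            return "high"
        if admissions_per_100k >= 10 or pct_beds_covid >= 10:
            return "medium"
        return "low"
    # With 200+ weekly cases per 100k there is no "low" level.
    if admissions_per_100k >= 10 or pct_beds_covid >= 10:
        return "high"
    return "medium"

print(community_level(150, 12.0, 8.0))  # -> medium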
COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.
For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.
Archived Data Notes:
This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.
March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previously released.
March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on the 2019 Census estimate.
March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.
March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.
March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).
March 31, 2022: Data values for columns “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were only added as of the week of 3/3/2022, so the values were missing for records released the week prior.
April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.
April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.
May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflected in this update.
May 26, 2022: COVID-19 Community Level (CCL) data released for several Florida counties for the week of May 19th, 2022, have been corrected for a data processing error. Of note, Broward, Miami-Dade, Palm Beach Counties should have appeared in the high CCL category, and Osceola County should have appeared in the medium CCL category. These corrections are reflected in this update.
May 26, 2022: COVID-19 Community Level (CCL) data released for Orange County, New York for the week of May 26, 2022 displayed an erroneous case rate of zero and a CCL category of low due to a data source error. This county should have appeared in the medium CCL category.
June 2, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a data processing error. Tolland County, CT should have appeared in the medium community level category during the week of May 26, 2022. This correction is reflected in this update.
June 9, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a misspelling. The medium community level category for Tolland County, CT on the week of May 26, 2022 was misspelled as “meduim” in the data set. This correction is reflected in this update.
June 9, 2022: COVID-19 Community Level (CCL) data released for Mississippi counties for the week of June 9, 2022 should be interpreted with caution due to a reporting cadence change over the Memorial Day holiday that resulted in artificially inflated case rates in the state.
July 7, 2022: COVID-19 Community Level (CCL) data released for Rock County, Minnesota for the week of July 7, 2022 displayed an artificially low case rate and CCL category due to a data source error. This county should have appeared in the high CCL category.
July 14, 2022: COVID-19 Community Level (CCL) data released for Massachusetts counties for the week of July 14, 2022 should be interpreted with caution due to a reporting cadence change that resulted in lower than expected case rates and CCL categories in the state.
July 28, 2022: COVID-19 Community Level (CCL) data released for all Montana counties for the week of July 21, 2022 had case rates of 0 due to a reporting issue. The case rates have been corrected in this update.
July 28, 2022: COVID-19 Community Level (CCL) data released for Alaska for all weeks prior to July 21, 2022 included non-resident cases. The case rates for the time series have been corrected in this update.
July 28, 2022: A laboratory in Nevada reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate will be inflated in Clark County, NV for the week of July 28, 2022.
August 4, 2022: COVID-19 Community Level (CCL) data was updated on August 2, 2022 in error during performance testing. Data for the week of July 28, 2022 was changed during this update due to additional case and hospital data as a result of late reporting between July 28, 2022 and August 2, 2022. Since the purpose of this data set is to provide point-in-time views of COVID-19 Community Levels on Thursdays, any changes made to the data set during the August 2, 2022 update have been reverted in this update.
August 4, 2022: COVID-19 Community Level (CCL) data for the week of July 28, 2022 for 8 counties in Utah (Beaver County, Daggett County, Duchesne County, Garfield County, Iron County, Kane County, Uintah County, and Washington County) case data was missing due to data collection issues. CDC and its partners have resolved the issue and the correction is reflected in this update.
August 4, 2022: Due to a reporting cadence change, case rates for all Alabama counties will be lower than expected. As a result, the CCL levels published on August 4, 2022 should be interpreted with caution.
August 11, 2022: COVID-19 Community Level (CCL) data for the week of August 4, 2022 for South Carolina have been updated to correct a data collection error that resulted in incorrect case data. CDC and its partners have resolved the issue and the correction is reflected in this update.
August 18, 2022: COVID-19 Community Level (CCL) data for the week of August 11, 2022 for Connecticut have been updated to correct a data ingestion error that inflated the CT case rates. CDC, in collaboration with CT, has resolved the issue and the correction is reflected in this update.
August 25, 2022: A laboratory in Tennessee reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate may be inflated in many counties and the CCLs published on August 25, 2022 should be interpreted with caution.
August 25, 2022: Due to a data source error, the 7-day case rate for St. Louis County, Missouri, is reported as zero in the COVID-19 Community Level data released on August 25, 2022. Therefore, the COVID-19 Community Level for this county should be interpreted with caution.
September 1, 2022: Due to a reporting issue, case rates for all Nebraska counties will include 6 days of data instead of 7 days in the COVID-19 Community Level (CCL) data released on September 1, 2022. Therefore, the CCLs for all Nebraska counties should be interpreted with caution.
September 8, 2022: Due to a data processing error, the case rate for Philadelphia County, Pennsylvania,
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
During my senior year at Shandong University, my tutor gave me the research direction for my thesis: bitcoin transaction data analysis. So I crawled all of the bitcoin transaction data from January 2009 to February 2018 and performed statistical and quantitative analysis. I hope this data will be of some help to you. Data mining is interesting and helpful, not only as a skill but also in our lives.
I crawled these data from https://www.blockchain.com/explorer. Each file contains many blocks; the range of blocks is reflected in the file name. For example, the file 0-68732.csv runs from block zero (also called the genesis block) up to block 68732. Blocks that have no inputs are not in this file. As for the columns and rows: there are five columns. The Height column is the block height, the Input column holds the input addresses of the block's transactions, the Output column holds the output addresses, the Sum column is the bitcoin transaction amount corresponding to the Output, and the Time column is the generation time of the block. A block contains many transactions.
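A minimal sketch for loading one of these files with pandas, using the column names described above:

import pandas as pd

# Columns as described above: Height, Input, Output, Sum, Time.
blocks = pd.read_csv("0-68732.csv")

# e.g. total bitcoin amount recorded per block
print(blocks.groupby("Height")["Sum"].sum().head())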
The page is part four of all data, others can be found here https://www.kaggle.com/shiheyingzhe/datasets
https://creativecommons.org/publicdomain/zero/1.0/
This dataset got a lot of love from the community and I saw many people asking for an updated version, so I have uploaded the latest scraped and processed data (as of 21/03/2021). Now it's super easy for anyone to get the latest dataset (just use a single command), so in case you need bleeding-edge data, or you want to see the code, you can look here. Hope this solves all problems! If there are any issues with the data, please forgive me and write about it in the comments or raise an issue on GitHub. I will pick it up 👍 Thank you everyone for the emails and messages. As usual, have fun! ❤️ 😁
This is a list of every UFC fight in the history of the organisation. Every row contains information about both fighters, fight details and the winner. The data was scraped from the ufcstats website. After FightMetric ceased to exist, this site came into the picture. I saw that there was a lot of information on the website about every fight and every event, and there were no existing ways of capturing all of it. I used beautifulsoup to scrape the data and pandas to process it. It was a long and arduous process, so please forgive any mistakes. I have provided the raw files in case anybody wants to process them differently. This is my first time creating a dataset, so any suggestions and corrections are welcome! In case anyone wants to check out the work, I have uploaded all the code files, including the scraping module, here
Have fun!
Each row is a compilation of both fighters' stats. Fighters are represented by 'red' and 'blue' (for the red and blue corner). So, for instance, the red fighter has the compiled average stats of all their fights except the current one. The stats include damage done by the red fighter on the opponent and the damage done by the opponent on the fighter (represented by 'opp' in the columns) in all the fights this particular red fighter has had, except this one, as it has not occurred yet (in the data). The same information exists for the blue fighter. The target variable is 'Winner', which is the only column that tells you what happened. Here are some column definitions:
R_ and B_ prefixes signify red and blue corner fighter stats respectively
_opp_ columns contain the average damage done by the opponent on the fighter
KD is number of knockdowns
SIG_STR is no. of significant strikes 'landed of attempted'
SIG_STR_pct is significant strikes percentage
TOTAL_STR is total strikes 'landed of attempted'
TD is no. of takedowns
TD_pct is takedown percentage
SUB_ATT is no. of submission attempts
PASS is no. of times the guard was passed
REV is the no. of reversals landed
HEAD is no. of significant strikes to the head 'landed of attempted'
BODY is no. of significant strikes to the body 'landed of attempted'
CLINCH is no. of significant strikes in the clinch 'landed of attempted'
GROUND is no. of significant strikes on the ground 'landed of attempted'
win_by is method of win
last_round is last round of the fight (e.g. if it was a KO in the 1st, then this will be 1)
last_round_time is when the fight ended in the last round
Format is the format of the fight (3 rounds, 5 rounds etc.)
Referee is the name of the referee
date is the date of the fight
location is the location in which the event took place
Fight_type is which weight class and whether it's a title bout or not
Winner is the winner of the fight
Stance is the stance of the fighter (orthodox, southpaw, etc.)
Height_cms is the height in centimeters
Reach_cms is the reach of the fighter (arm span) in centimeters
Weight_lbs is the weight of the fighter in pounds (lbs)
age is the age of the fighter
title_bout is a Boolean value of whether it is a title fight or not
weight_class is which weight class the fight is in (Bantamweight, Heavyweight, Women's Flyweight, etc.)
no_of_rounds is the number of rounds the fight was scheduled for
current_lose_streak is the count of current consecutive losses of the fighter
current_win_streak is the count of current consecutive wins of the fighter
draw is the number of draws in the fighter's UFC career
wins is the number of wins in the fighter's UFC career
losses is the number of losses in the fighter's UFC career
total_rounds_fought is the average of total rounds fought by the fighter
total_time_fought(seconds) is the total time spent fighting in seconds
total_title_bouts is the total number of title bouts taken part in by the fighter
win_by_Decision_Majority is the number of wins by majority judges' decision in the fighter's UFC career
win_by_Decision_Split is the number of wins by split judges' decision in the fighter's UFC career
win_by_Decision_Unanimous is the number of wins by unanimous judges' decision in the fighter's UFC career
win_by_KO/TKO is the number of wins by knockout in the fighter's UFC career
win_by_Submission is the number of wins by submission in the fighter's UFC career
win_by_TKO_Doctor_Stoppage is the number of wins by doctor stoppage in the fighter's UFC career
Inspiration: https://github.com/Hitkul/UFC_Fight_Prediction provided ideas on how to store per-fight data. Unfortunately, the entire UFC website and FightMetric website changed, so I couldn't reuse any of the code.
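As a minimal sketch, here is one way to load the fight data and inspect the target column; the file name is hypothetical, and the columns are those listed above:

import pandas as pd

# Hypothetical file name; use the CSV shipped with the dataset.
fights = pd.read_csv("data.csv")

# "Winner" is the target variable described above.
print(fights["Winner"].value_counts())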
Print Progress Bar: https://gist.github.com/aubricus/f91fb55dc6ba5557fbab06119420dd6a (used to display in the terminal how much of the download is complete)
You can check out who I am and what I do here
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
Splits: The first version of the MS COCO dataset was released in 2014. It contains 164K images split into training (83K), validation (41K) and test (41K) sets. In 2015, an additional test set of 81K images was released, including all the previous test images and 40K new images.
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
Annotations: The dataset has annotations for
object detection: bounding boxes and per-instance segmentation masks with 80 object categories;
captioning: natural language descriptions of the images (see MS COCO Captions);
keypoints detection: more than 200,000 images and 250,000 person instances labeled with keypoints (17 possible keypoints, such as left eye, nose, right hip, right ankle);
stuff image segmentation: per-pixel segmentation masks with 91 stuff categories, such as grass, wall, sky (see MS COCO Stuff);
panoptic: full scene segmentation, with 80 thing categories (such as person, bicycle, elephant) and a subset of 91 stuff categories (grass, sky, road);
dense pose: more than 39,000 images and 56,000 person instances labeled with DensePose annotations – each labeled person is annotated with an instance id and a mapping between image pixels that belong to that person's body and a template 3D model.
The annotations are publicly available only for training and validation images.
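A minimal sketch for browsing the detection annotations with the pycocotools library, assuming the standard file layout of the 2017 release:

from pycocotools.coco import COCO

# Path assumes the standard 2017 annotation layout.
coco = COCO("annotations/instances_val2017.json")

# The 80 object categories, then the annotations on one image.
cats = coco.loadCats(coco.getCatIds())
print([c["name"] for c in cats][:5])

img_id = coco.getImgIds()[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
print(len(anns), "annotations on image", img_id)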
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the St. Petersburg population over the last 20 plus years. It lists the population for each year, along with the year-on-year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of St. Petersburg across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing, and if there is a change, when the population peaked or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of the United States population over the same period of time.
Key observations
In 2023, the population of St. Petersburg was 263,553, a 0.70% increase year-over-year from 2022. Previously, in 2022, the St. Petersburg population was 261,722, an increase of 0.83% compared to a population of 259,578 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of St. Petersburg increased by 14,900. In this period, the peak population was 265,463, in the year 2019. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
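As a quick worked check of the year-over-year percentages quoted above:

# Populations quoted above for St. Petersburg.
pop_2021, pop_2022, pop_2023 = 259_578, 261_722, 263_553

print(f"{(pop_2023 - pop_2022) / pop_2022:.2%}")  # ~0.70% (2022 -> 2023)
print(f"{(pop_2022 - pop_2021) / pop_2021:.2%}")  # ~0.83% (2021 -> 2022)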
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for St. Petersburg Population by Year. You can refer to the same here
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for UltraChat 200k
Dataset Description
This is a heavily filtered version of the UltraChat dataset and was used to train Zephyr-7B-β, a state-of-the-art 7B chat model. The original dataset consists of 1.4M dialogues generated by ChatGPT, spanning a wide range of topics. To create UltraChat 200k, we applied the following logic:
Selection of a subset of data for faster supervised fine-tuning. Truecasing of the dataset, as we observed around 5% of… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k.
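As a minimal sketch, the dataset can be pulled with the Hugging Face datasets library (the train_sft split and the messages field are assumed from the dataset card):

from datasets import load_dataset

# Split and field names assumed from the dataset card.
ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
print(ultrachat[0]["messages"][:2])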
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Red Bank Hispanic or Latino population. It includes the distribution of the Hispanic or Latino population of Red Bank by their ancestries, as identified by the Census Bureau. The dataset can be utilized to understand the origin of the Hispanic or Latino population of Red Bank.
Key observations
Among the Hispanic population in Red Bank, regardless of the race, the largest group is of Mexican origin, with a population of 2,993 (75.07% of the total Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Origins for the Hispanic or Latino population include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for any of your research projects, reports or presentations, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Red Bank Population by Race & Ethnicity. You can refer to the same here
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recommended citation
Gütschow, J.; Busch, D.; Pflüger, M. (2024): The PRIMAP-hist national historical emissions time series v2.6 (1750-2023). zenodo. doi:10.5281/zenodo.13752654.
Gütschow, J.; Jeffery, L.; Gieseke, R.; Gebel, R.; Stevens, D.; Krapp, M.; Rocha, M. (2016): The PRIMAP-hist national historical emissions time series, Earth Syst. Sci. Data, 8, 571-603, doi:10.5194/essd-8-571-2016
Content
Abstract
The PRIMAP-hist dataset combines several published datasets to create a comprehensive set of greenhouse gas emission pathways for every country and Kyoto gas, covering the years 1750 to 2023, and almost all UNFCCC (United Nations Framework Convention on Climate Change) member states as well as most non-UNFCCC territories. The data resolve the main IPCC (Intergovernmental Panel on Climate Change) 2006 categories. For CO2, CH4, and N2O, subsector data for Energy, Industrial Processes and Product Use (IPPU), and Agriculture are available. The "country reported data priority" (CR) scenario of the PRIMAP-hist dataset prioritizes data that individual countries report to the UNFCCC. For developed countries (Annex I in terms of the UNFCCC), this is the data submitted annually in the "common reporting format" (CRF). For developing countries (non-Annex I in terms of the UNFCCC), this is the data available through the UNFCCC DI portal (di.unfccc.int), with additional country submissions read from PDF and, where available, xls(x) or csv files. For a list of these submissions please see below. For South Korea, the 2023 official GHG inventory has not yet been submitted to the UNFCCC but is included in PRIMAP-hist. PRIMAP-hist also includes official data for Taiwan, which is not recognized as a party to the UNFCCC.
Gaps in the country reported data are filled using third party data such as CDIAC, EI (fossil CO2), Andrew cement emissions data (cement), FAOSTAT (agriculture), and EDGAR v8.0 (all sectors for CO2, CH4, N2O, except energy CO2), and EDGAR v7.0 (IPPU, f-gases). Lower priority data are harmonized to higher priority data in the gap-filling process.
For the third-party priority time series, gaps in the third-party data are filled from country-reported data sources.
Data for earlier years that are not available in the above-mentioned sources are taken from EDGAR-HYDE, CEDS, and RCP (N2O only) historical emissions.
The v2.4 release of PRIMAP-hist reduced the time lag from two years to one year for the October release, so the present version 2.6 includes data for 2023. For energy CO2, growth rates from the EI Statistical Review of World Energy are used to extend the country reported (CR) or CDIAC (TP) data to 2023. For CO2 from cement production, Andrew cement data are used. For other gases and sectors we have to rely on numerical methods to estimate emissions for 2023.
Version 2.6 of the PRIMAP-hist dataset does not include emissions from Land Use, Land-Use Change, and Forestry (LULUCF) in the main file. LULUCF data are included in the file with an increased number of significant digits and have to be used with care, as they are constructed from different sources using different methodologies and are not harmonized.
The PRIMAP-hist v2.6 dataset is an updated version of
Gütschow, J.; Pflüger, M.; Busch, D. (2024): The PRIMAP-hist national historical emissions time series v2.5.1 (1750-2022). zenodo. doi:10.5281/zenodo.10705513.
The Changelog indicates the most important changes. You can also check the issue tracker on github.com/JGuetschow/PRIMAP-hist for additional information on issues found after the release of the dataset. Detailed per country information is available from the detailed changelog which is available on the primap.org website and on zenodo.
Use of the dataset and full description
Before using the dataset, please read this document and the article describing the methodology, especially the section on uncertainties and the section on limitations of the method and use of the dataset.
Gütschow, J.; Jeffery, L.; Gieseke, R.; Gebel, R.; Stevens, D.; Krapp, M.; Rocha, M. (2016): The PRIMAP-hist national historical emissions time series, Earth Syst. Sci. Data, 8, 571-603, doi:10.5194/essd-8-571-2016
Please notify us (mail@johannes-guetschow.de) if you use the dataset so that we can keep track of how it is used and take that into consideration when updating and improving the dataset.
When using this dataset or one of its updates, please cite the DOI of the precise version of the dataset used and also the data description article which this dataset is supplement to (see above). Please consider also citing the relevant original sources when using the PRIMAP-hist dataset. See the full citations in the References section further below.
Since version 2.3 we use the data formats developed for the PRIMAP2 climate policy analysis suite: PRIMAP2 on GitHub. The data are published both in the interchange format, which consists of a csv file with the data and a yaml file with additional metadata, and in the native NetCDF-based format. For a detailed description of the data format we refer to the PRIMAP2 documentation.
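A minimal sketch for loading either representation; the file names are hypothetical placeholders for the actual release files:

import pandas as pd
import xarray as xr

# Hypothetical file names; substitute the csv/nc files from the release.
df = pd.read_csv("primap-hist_v2.6.csv")     # interchange format (csv + yaml metadata)
ds = xr.open_dataset("primap-hist_v2.6.nc")  # native NetCDF-based format
print(df.columns.tolist()[:10])
print(ds)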
We have also included files with more than three significant digits. These files are mainly aimed at people doing policy analysis using the country reported data scenario (HISTCR). Using the high precision data they can avoid questions on discrepancies with the reported data. The uncertainties of emissions data do not justify the additional significant digits and they might give a false sense of accuracy, so please use this version of the dataset with extra care.
Support
If you encounter possible errors or other things that should be noted, please check our issue tracker at github.com/JGuetschow/PRIMAP-hist and report your findings there. Please use the tag "v2.6" in any issue you create regarding this dataset.
If you need support in using the dataset or have any other questions regarding the dataset, please contact johannes.guetschow@climate-resource.com.
Climate Resource makes this data available under a CC BY 4.0 licence. Free support is limited to simple questions and non-commercial users. We also provide additional data and data support services to clients wanting more frequent updates, additional metadata, or integration of these datasets into their workflows. Get in touch at contact@climate-resource.com if you are interested.
Sources
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This resource was created by publishing the backend database of the Belgian Species List website (www.species.be). This publishing work has been done by the Belgian Biodiversity Platform (http://www.biodiversity.be) in the framework of the "GBIF award for evaluating checklist publication format" during May 2011.
Data extracted by Francis Strobbe on May 27th 2011.
Abstract of the Belgian Species List project: For almost a year now, people of all ages have been able to access the Belgian Species List for an extensive overview of the biodiversity in Belgium. Animals, plants, fungi: you can make your way through a total of more than 32,000 species!
The Belgian species list was set up by the RBINS – the research institution behind the Museum of Natural Sciences – in cooperation with different Belgian and international institutions and organizations.
The purpose of the species list is to become the online reference for the naming and occurrence of species in Belgium, easy to consult in just one website that centralizes and standardizes the information. For every species that has been described, information is given on, among other things, the conservation status, the distribution, the habitat and much more. More and more species index cards are fitted with pictures and interesting links.
The website is aimed at a broad audience: researchers, decision makers, students, journalists, nature conservationists and all nature lovers.
Want to know more? Log on to www.species.be!
API operated by Louisville Metro that returns AQI information from local sensors operated by APCD. Shows the latest hourly data in a JSON feed.
The Air Quality Index (AQI) is an easy way to tell you about air quality without having to know a lot of technical details. The “Metropolitan Air Quality Index” shows the AQI from the monitor in Kentuckiana that is currently detecting the highest level of air pollution. See: https://louisvilleky.gov/government/air-pollution-control-district/servi...
See the air quality map (Louisville Air Watch) for more details: airqualitymap.louisvilleky.gov/#
Read the FAQ for more information about the AQI data: https://louisvilleky.gov/government/air-pollution-control-district/louis...
If you'd prefer air quality forecast data (raw data, maps, API) instead, please see AIRNow: https://www.airnow.gov/index.cfm?action=airnow.local_city&zipcode=40204&...
See the Data Dictionary section below for information about what the AQI numbers mean, their corresponding colors, recommendations, and more info and links.
To download daily snapshots of AQI for the last 25 years, visit the EPA website, set your year range, and choose Louisville, KY. Then download with the CSV link at the bottom of the page.
There is also an IFTTT integration trigger that fires after retrieving air quality from Louisville Metro air sensors via the API. It gives a forecast instead of the current conditions, so you can take action before the air quality gets bad.
The U.S. EPA AirNow program (www.AirNow.gov) protects public health by providing forecast and real-time observed air quality information across the United States, Canada, and Mexico. AirNow receives real-time air quality observations from over 2,000 monitoring stations and collects forecasts for more than 300 cities. Sign up for a free account and get started using the RSS data feed for Louisville: https://docs.airnowapi.org/feeds
AQI Level - Value and Related Health Concerns Legend:
Good, 0-50, Green: Air quality is considered satisfactory, and air pollution poses little or no risk.
Moderate, 51-100, Yellow: Air quality is acceptable; however, for some pollutants there may be a moderate health concern for a very small number of people who are unusually sensitive to air pollution.
Unhealthy for Sensitive Groups, 101-150, Orange: Members of sensitive groups may experience health effects. The general public is not likely to be affected.
Unhealthy, 151-200, Red: Everyone may begin to experience health effects; members of sensitive groups may experience more serious health effects.
Very Unhealthy, 201-300, Purple: Health alert: everyone may experience more serious health effects.
Hazardous, > 300, Dark Purple: Health warnings of emergency conditions. The entire population is more likely to be affected.
Here are citizen actions APCD recommends on air quality alert days, that is, days when the forecast is for the air quality to reach or exceed the “unhealthy for sensitive groups” (orange) level:
Don’t idle your car. (Recommended all the time; see the second link below.)
Put off mowing grass with a gas mower until the alert ends.
“Refuel when it’s cool” (pump gasoline only in the evening or night).
Avoid driving if possible. Share rides or take TARC.
Check on neighbors with breathing problems.
Here are some links in relation to the recommendations:
KAIRE: www.helptheair.org/
Idle Free Louisville: www.helptheair.org/idle-free
TARC Ticket to Ride: tickettoride.org/
Lawn Care for Cleaner Air (rebates)
Contact: Bryan Frazer, Bryan.Frazar@louisvilleky.gov
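As a small illustration of the legend above, a sketch mapping an AQI value to its category and color, with the ranges taken directly from the legend:

def aqi_category(aqi):
    # Ranges and colors per the AQI legend above.
    bands = [
        (50, "Good", "Green"),
        (100, "Moderate", "Yellow"),
        (150, "Unhealthy for Sensitive Groups", "Orange"),
        (200, "Unhealthy", "Red"),
        (300, "Very Unhealthy", "Purple"),
    ]
    for upper, category, color in bands:
        if aqi <= upper:
            return category, color
    return "Hazardous", "Dark Purple"

print(aqi_category(120))  # -> ('Unhealthy for Sensitive Groups', 'Orange')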
Abstract copyright UK Data Service and data collection copyright owner.
The English Longitudinal Study of Ageing (ELSA) is a longitudinal survey of ageing and quality of life among older people that explores the dynamic relationships between health and functioning, social networks and participation, and economic position as people plan for, move into and progress beyond retirement. The main objectives of ELSA are to:
construct waves of accessible and well-documented panel data;
provide these data in a convenient and timely fashion to the scientific and policy research community;
describe health trajectories, disability and healthy life expectancy in a representative sample of the English population aged 50 and over;
examine the relationship between economic position and health;
investigate the determinants of economic position in older age;
describe the timing of retirement and post-retirement labour market activity; and
understand the relationships between social support, household structure and the transfer of assets.
Further information may be found on the ELSA project website or the NatCen Social Research ELSA web pages.
Health conditions research with ELSA - June 2021
The ELSA data team have found some issues with historical data measuring health conditions. If you intend to analyse any of the following health conditions, please contact elsadata@natcen.ac.uk for advice on how to approach your analysis. The affected conditions are: eye conditions (glaucoma; diabetic eye disease; macular degeneration; cataract), CVD conditions (high blood pressure; angina; heart attack; congestive heart failure; heart murmur; abnormal heart rhythm; diabetes; stroke; high cholesterol; other heart trouble) and chronic health conditions (chronic lung disease; asthma; arthritis; osteoporosis; cancer; Parkinson's disease; emotional, nervous or psychiatric problems; Alzheimer's disease; dementia; malignant blood disorder; multiple sclerosis or motor neurone disease).
Secure Access data:
Secure Access versions of ELSA have more restrictive access conditions than versions available under the standard End User Licence or Special Licence (see 'Access' section below). Secure Access versions of ELSA include:
Primary data from Wave 8 onwards (SN 8444), which includes all the variables in the SL primary dataset (SN 8346) as well as day of birth, combined SIC 2003 code (5 digit), combined SOC 2000 code (4 digit), and the long version of NS-SEC both including and excluding unclassifiable and non-workers.
Pension age data from Wave 8 onwards (SN 8445), which includes all the variables in the SL pension age data (SN 8375) as well as a year-reached-pension-age variable.
Detailed geographical identifier files for each wave, grouped by identifier and held under SN 8423 (Index of Multiple Deprivation Score), SN 8424 (Local Authority District Pre-2009 Boundaries), SN 8438 (Local Authority District Post-2009 Boundaries), SN 8425 (Census 2001 Lower Layer Super Output Areas), SN 8434 (Census 2011 Lower Layer Super Output Areas), SN 8426 (Census 2001 Middle Layer Super Output Areas), SN 8435 (Census 2011 Middle Layer Super Output Areas), SN 8427 (Population Density for Postcode Sectors), SN 8428 (Census 2001 Rural-Urban Indicators), and SN 8436 (Census 2011 Rural-Urban Indicators).
Where boundary changes have occurred, the geographic identifier has been split into two separate studies to reduce the risk of disclosure.
Users are also only allowed one version of each identifier:
either SN 8424 (Local Authority District Pre-2009 Boundaries) or SN 8438 (Local Authority District Post-2009 Boundaries);
either SN 8425 (Census 2001 Lower Layer Super Output Areas) or SN 8434 (Census 2011 Lower Layer Super Output Areas);
either SN 8426 (Census 2001 Middle Layer Super Output Areas) or SN 8435 (Census 2011 Middle Layer Super Output Areas);
either SN 8428 (Census 2001 Rural-Urban Indicators) or SN 8436 (Census 2011 Rural-Urban Indicators).
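As a purely illustrative sketch (not part of the ELSA deposit or the UK Data Service tooling), a requested set of geographic identifiers could be checked against these pairings like this:

```python
# Hypothetical helper illustrating the "one version of each identifier" rule.
CONFLICTING_PAIRS = [
    ("SN 8424", "SN 8438"),  # LAD pre-2009 vs post-2009 boundaries
    ("SN 8425", "SN 8434"),  # Census 2001 vs 2011 Lower Layer Super Output Areas
    ("SN 8426", "SN 8435"),  # Census 2001 vs 2011 Middle Layer Super Output Areas
    ("SN 8428", "SN 8436"),  # Census 2001 vs 2011 rural-urban indicators
]

def conflicts(requested: set[str]) -> list[tuple[str, str]]:
    """Return the identifier pairs for which both versions were requested."""
    return [pair for pair in CONFLICTING_PAIRS if set(pair) <= requested]

print(conflicts({"SN 8423", "SN 8424", "SN 8438"}))
# -> [('SN 8424', 'SN 8438')]: pick one of the two boundary versions
```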
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
A dataset providing information about local council services in Leeds. Leeds City Council uses this information to populate the Knowledge Panels on the Google search website. The dataset includes the type of service, contact information and opening times.
What is a Knowledge Panel? When people search for a business on Google, they may see information about that business in a box that appears to the right of their search results. The information in this box, called the Knowledge Panel, can help customers discover and contact your business.
Is the information correct? If you spot any information which you believe to be incorrect, please contact us at webmaster@leeds.gov.uk. We can then investigate and update both this dataset and the Google Knowledge Panel.
Automated update: this dataset is automatically updated on a fortnightly basis.
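A minimal sketch of working with the published file; the file name and column labels below are assumptions for illustration, so check the actual download for the real schema.

```python
import pandas as pd

# Hypothetical file name and column labels; the published dataset
# may use different ones.
services = pd.read_csv("leeds_council_services.csv")

# e.g. list contact details and opening times for one service type
libraries = services[services["service_type"] == "Library"]
print(libraries[["service_type", "contact_information", "opening_times"]])
```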
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Non-Hispanic population of Red Bank by race, showing its distribution across the race categories identified by the Census Bureau. It can be used to understand how Red Bank's Non-Hispanic population is distributed across these racial categories.
Key observations
Of the Non-Hispanic population in Red Bank, the largest racial group is White alone with a population of 7,497 (84.45% of the total Non-Hispanic population).
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
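As a quick sanity check on the headline figure, the reported share implies a total Non-Hispanic population of roughly 8,877:

```python
# Reproduce the headline share from the key observations above.
white_alone = 7_497   # Non-Hispanic White alone population
share = 0.8445        # 84.45% of the total Non-Hispanic population

total_non_hispanic = white_alone / share
print(round(total_non_hispanic))  # ~8877
```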
Good to know
Margin of Error
Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Red Bank Population by Race & Ethnicity; refer to that dataset for the full set of tables.