The Arlington Profile combines countywide data sources to provide a comprehensive overview of the most current data on population, housing, employment, development, transportation, and community services. These datasets are used to understand the community, plan future services and needs, guide policy decisions, and secure grant funding. A PDF version of the Arlington Profile can be accessed on the Arlington County website.
How many people use social media?
Social media usage is one of the most popular online activities. In 2024, over five billion people were using social media worldwide, a number projected to increase to over six billion in 2028.
Who uses social media?
Social networking is one of the most popular digital activities worldwide and it is no surprise that social networking penetration across all regions is constantly increasing. As of January 2023, the global social media usage rate stood at 59 percent. This figure is anticipated to grow as lesser developed digital markets catch up with other regions when it comes to infrastructure development and the availability of cheap mobile devices. In fact, most of social media’s global growth is driven by the increasing usage of mobile devices. Mobile-first market Eastern Asia topped the global ranking of mobile social networking penetration, followed by established digital powerhouses such as the Americas and Northern Europe.
How much time do people spend on social media?
Social media is an integral part of daily internet usage. On average, internet users spend 151 minutes per day on social media and messaging apps, an increase of 40 minutes since 2015. Internet users in Latin America spent the most time per day on social media.
What are the most popular social media platforms?
Market leader Facebook was the first social network to surpass one billion registered accounts and currently boasts approximately 2.9 billion monthly active users, making it the most popular social network worldwide. In June 2023, the top social media apps in the Apple App Store included mobile messaging apps WhatsApp and Telegram Messenger, as well as the ever-popular app version of Facebook.
https://creativecommons.org/publicdomain/zero/1.0/
The United States Census is a decennial census mandated by Article I, Section 2 of the United States Constitution, which states: "Representatives and direct Taxes shall be apportioned among the several States ... according to their respective Numbers."
Source: https://en.wikipedia.org/wiki/United_States_Census
The United States census count (also known as the Decennial Census of Population and Housing) is a count of every resident of the US. The census occurs every 10 years and is conducted by the United States Census Bureau. Census data is publicly available through the census website, but much of it is published only as summary tables and graphs. The raw data is often difficult to obtain, is typically divided by region, and must be processed and combined to provide information about the nation as a whole.
The United States census dataset includes nationwide population counts from the 2000 and 2010 censuses. Data is broken out by gender, age and location using zip code tabular areas (ZCTAs) and GEOIDs. ZCTAs are generalized representations of zip codes, and often, though not always, are the same as the zip code for an area. GEOIDs are numeric codes that uniquely identify all administrative, legal, and statistical geographic areas for which the Census Bureau tabulates data. GEOIDs are useful for correlating census data with other censuses and surveys.
Fork this kernel to get started.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:census_bureau_usa
https://cloud.google.com/bigquery/public-data/us-census
Dataset Source: United States Census Bureau
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by Steve Richey from Unsplash.
What are the ten most populous zip codes in the US in the 2010 census?
What are the top 10 zip codes that experienced the greatest change in population between the 2000 and 2010 censuses?
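As a hedged illustration of how such questions can be answered against the BigQuery copy of this dataset, the sketch below uses the google-cloud-bigquery Python client. The table and column names (population_by_zip_2010, zipcode, population, minimum_age, maximum_age, gender) and the filter used to pick out the "total population" rows are assumptions about the public dataset's schema and should be verified against the table before relying on the results.

```python
# Minimal sketch: top ten most populous zip codes in the 2010 census.
# Table/column names and the total-population filter are assumed from the
# public dataset's schema; verify them in the BigQuery console before use.
from google.cloud import bigquery

client = bigquery.Client()  # requires GCP credentials with BigQuery access

query = """
SELECT zipcode, SUM(population) AS total_population
FROM `bigquery-public-data.census_bureau_usa.population_by_zip_2010`
WHERE minimum_age IS NULL AND maximum_age IS NULL AND gender = ''
GROUP BY zipcode
ORDER BY total_population DESC
LIMIT 10
"""

for row in client.query(query).result():
    print(row.zipcode, row.total_population)
```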
https://cloud.google.com/bigquery/images/census-population-map.png
The data was acquired from the Yelp website.
The data can help people find companies/organizations by their ratings and reviews, making it easier to choose or recommend the best services.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
Motivation: creating a challenging dataset for testing Named-Entity Linking. The Namesakes dataset consists of three closely related datasets: Entities, News and Backlinks. Entities were collected as Wikipedia text chunks corresponding to highly ambiguous entity names. The News were collected as random news text chunks, containing mentions that either belong to the Entities dataset or can be easily confused with them. Backlinks were obtained from Wikipedia dump data with the intention of having mentions linked to the entities of the Entities dataset. The Entities and News are human-labeled, resolving the mentions of the entities.
Methods
Entities were collected as Wikipedia text chunks corresponding to highly ambiguous entity names: the most popular people names, the most popular locations, and organizations with name ambiguity. In each Entities text chunk, the named entities with a name similar to the chunk's Wikipedia page name are labeled. For labeling, these entities were suggested to human annotators (odetta.ai) to tag as "Same" (same as the page entity) or "Other". The labeling was done by 6 experienced annotators who passed a preliminary trial task. The only accepted tags are those assigned in agreement by at least 5 annotators, which were then passed through reconciliation with an experienced reconciliator.
The News were collected as random news text chunks, containing mentions which either belong to the Entities dataset or can be easily confused with them. In each News text chunk one mention was selected for labeling, and 3-10 Wikipedia pages from Entities were suggested as the labels for an annotator to choose from. The labeling was done by 3 experienced annotators (odetta.ai), after the annotators passed a preliminary trial task. The results were reconciled by an experienced reconciliator. All the labeling was done using Lighttag (lighttag.io).
Backlinks were obtained from Wikipedia dump data (dumps.wikimedia.org/enwiki/20210701) with the intention of having mentions linked to the entities of the Entities dataset. The backlinks were filtered to leave only mentions in good-quality text; each text was cut 1000 characters after the last mention.
Usage Notes
Entities:
File: Namesakes_entities.jsonl
The Entities dataset consists of 4148 Wikipedia text chunks containing human-tagged mentions of entities. Each mention is tagged either as "Same" (meaning that the mention is of this Wikipedia page entity) or "Other" (meaning that the mention is of some other entity, just having the same or similar name). The Entities dataset is a jsonl list; each item is a dictionary with the following keys and values:
- ‘pagename’: page name of the Wikipedia page.
- ‘pageid’: page id of the Wikipedia page.
- ‘title’: title of the Wikipedia page.
- ‘url’: URL of the Wikipedia page.
- ‘text’: the text chunk from the Wikipedia page.
- ‘entities’: list of the mentions in the page text; each entity is represented by a dictionary with the keys:
  - ‘text’: the mention as a string from the page text.
  - ‘start’: start character position of the entity in the text.
  - ‘end’: end (one-past-last) character position of the entity in the text.
  - ‘tag’: annotation tag given as a string, either ‘Same’ or ‘Other’.
News:
File: Namesakes_news.jsonl
The News dataset consists of 1000 news text chunks, each one with a single annotated entity mention. The annotation either points to the corresponding entity from the Entities dataset (if the mention is of that entity), or indicates that the mentioned entity does not belong to the Entities dataset. The News dataset is a jsonl list; each item is a dictionary with the following keys and values:
- ‘id_text’: id of the sample.
- ‘text’: the text chunk.
- ‘urls’: list of URLs of Wikipedia entities suggested to labelers for identification of the entity mentioned in the text.
- ‘entity’: a dictionary describing the annotated entity mention in the text:
  - ‘text’: the mention as a string found by an NER model in the text.
  - ‘start’: start character position of the mention in the text.
  - ‘end’: end (one-past-last) character position of the mention in the text.
  - ‘tag’: this key exists only if the mentioned entity is annotated as belonging to the Entities dataset; if so, the value is a dictionary identifying the Wikipedia page assigned by annotators to the mentioned entity:
    - ‘pageid’: Wikipedia page id.
    - ‘pagetitle’: page title.
    - ‘url’: page URL.
Backlinks dataset: The Backlinks dataset consists of two parts: dictionary Entity-to-Backlinks and Backlinks documents. The dictionary points to backlinks for each entity of the Entity dataset (if any backlinks exist for the entity). The Backlinks documents are the backlinks Wikipedia text chunks with identified mentions of the entities from the Entities dataset.
Each mention is identified by surrounded double square brackets, e.g. "Muir built a small cabin along [[Yosemite Creek]].". However, if the mention differs from the exact entity name, the double square brackets wrap both the exact name and, separated by '|', the mention string to the right, for example: "Muir also spent time with photographer [[Carleton E. Watkins | Carleton Watkins]] and studied his photographs of Yosemite.".
The Entity-to-Backlinks is a jsonl with 1527 items. File: Namesakes_backlinks_entities.jsonl
Each item is a tuple: entity name, entity Wikipedia page id, backlinks ids (a list of pageids of backlink documents).
The Backlinks documents is a jsonl with 26903 items. File: Namesakes_backlinks_texts.jsonl
Each item is a dictionary:
- ‘pageid’: id of the Wikipedia page.
- ‘title’: title of the Wikipedia page.
- ‘content’: text chunk from the Wikipedia page, with all mentions in double brackets; the text is cut 1000 characters after the last mention, and the cut is denoted as '...[CUT]'.
- ‘mentions’: list of the mentions from the text, for convenience. Each mention is a tuple: entity name, entity Wikipedia page id, sorted list of all character indexes at which the mention occurrences start in the text.
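To make the file layout concrete, here is a minimal Python sketch that loads the Entities file and extracts the double-bracket mentions from a Backlinks text chunk. It uses only the file names and keys described above and is an illustration, not official tooling for the dataset.

```python
# Minimal sketch for loading Namesakes jsonl files and parsing the
# [[entity name | mention]] markup used in the Backlinks texts.
import json
import re

def read_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

entities = read_jsonl("Namesakes_entities.jsonl")
# Flatten all human-tagged mentions into (mention text, tag) pairs.
tagged = [(e["text"], e["tag"]) for item in entities for e in item["entities"]]

# Mentions in Backlinks texts are wrapped in double square brackets,
# optionally as [[exact entity name | mention string]].
MENTION_RE = re.compile(r"\[\[([^\]|]+?)(?:\s*\|\s*([^\]]+?))?\]\]")

def parse_mentions(content):
    """Return (entity_name, mention_string) pairs from a Backlinks text chunk."""
    return [(name.strip(), (mention or name).strip())
            for name, mention in MENTION_RE.findall(content)]

print(parse_mentions("Muir built a small cabin along [[Yosemite Creek]]."))
```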
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Page by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Page. The dataset can be utilized to understand the population distribution of Page by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Page. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Page.
Key observations
Largest age group (population): Male # 5-9 years (505) | Female # 5-9 years (466). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Page Population by Gender. You can refer to the same here.
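As an illustration of the gender-ratio use case described above, here is a minimal pandas sketch. The file and column names (page-population-by-gender.csv, age_group, male_population, female_population) are hypothetical placeholders, since the Variables / Data Columns section above does not spell them out; substitute the actual names from the downloaded file.

```python
# Hypothetical sketch: male-to-female ratio per age group.
# File and column names are placeholders, not the dataset's actual labels.
import pandas as pd

df = pd.read_csv("page-population-by-gender.csv")
df["males_per_100_females"] = 100 * df["male_population"] / df["female_population"]
print(df[["age_group", "males_per_100_females"]])
```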
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository was created for my Master's thesis in Computational Intelligence and Internet of Things at the University of Córdoba, Spain. The purpose of this repository is to store the datasets found that were used in some of the studies that served as research material for this Master's thesis. Also, the datasets used in the experimental part of this work are included.
The datasets are specified below, along with the details of their references, authors, and download sources.
----------- STS-Gold Dataset ----------------
The dataset consists of 2026 tweets. The file consists of 3 columns: id, polarity, and tweet. The three columns denote the unique id, polarity index of the text and the tweet text respectively.
Reference: Saif, H., Fernandez, M., He, Y., & Alani, H. (2013). Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold.
File name: sts_gold_tweet.csv
----------- Amazon Sales Dataset ----------------
This dataset contains ratings and reviews for more than 1,000 Amazon products, as listed on the official Amazon website. The data was scraped in January 2023 from the official website of Amazon.
Owner: Karkavelraja J., Postgraduate student at Puducherry Technological University (Puducherry, Puducherry, India)
Features:
License: CC BY-NC-SA 4.0
File name: amazon.csv
----------- Rotten Tomatoes Reviews Dataset ----------------
This rating inference dataset is a sentiment classification dataset containing 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews. On average, these reviews consist of 21 words. The first 5,331 rows contain only negative samples and the last 5,331 rows contain only positive samples, so the data should be shuffled before usage (see the sketch below).
This data is collected from https://www.cs.cornell.edu/people/pabo/movie-review-data/ as a txt file and converted into a csv file. The file consists of 2 columns: reviews and labels (1 for fresh (good) and 0 for rotten (bad)).
Reference: Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pages 115–124, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics
File name: data_rt.csv
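Since the rows are ordered by class as noted above, a small pandas sketch like the following (an illustration, not part of the dataset) can be used to shuffle data_rt.csv before splitting it.

```python
# Shuffle the Rotten Tomatoes reviews so negative and positive samples are mixed.
import pandas as pd

df = pd.read_csv("data_rt.csv")  # columns: reviews, labels (1 = fresh, 0 = rotten)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)  # shuffled copy
```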
----------- Preprocessed Dataset Sentiment Analysis ----------------
Preprocessed Amazon product review data of the Gen3EcoDot (Alexa), scraped entirely from amazon.in.
Stemmed and lemmatized using nltk.
Sentiment labels are generated using TextBlob polarity scores.
The file consists of 4 columns: index, review (stemmed and lemmatized review using nltk), polarity (score) and division (categorical label generated using polarity score).
DOI: 10.34740/kaggle/dsv/3877817
Citation: @misc{pradeesh arumadi_2022, title={Preprocessed Dataset Sentiment Analysis}, url={https://www.kaggle.com/dsv/3877817}, DOI={10.34740/KAGGLE/DSV/3877817}, publisher={Kaggle}, author={Pradeesh Arumadi}, year={2022} }
This dataset was used in the experimental phase of my research.
File name: EcoPreprocessed.csv
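For context, the sketch below shows how TextBlob polarity scores of the kind mentioned above are typically turned into categorical labels. The thresholds are assumptions for illustration; the dataset author's exact cut-offs for the division column are not stated here.

```python
# Illustrative sketch: derive a polarity score and a categorical label with
# TextBlob. The cut-offs below are assumptions, not the dataset's actual rules.
from textblob import TextBlob

def label_review(text, pos_cutoff=0.0, neg_cutoff=0.0):
    polarity = TextBlob(text).sentiment.polarity  # float in [-1.0, 1.0]
    if polarity > pos_cutoff:
        return polarity, "positive"
    if polarity < neg_cutoff:
        return polarity, "negative"
    return polarity, "neutral"

print(label_review("The Echo Dot works great and sounds surprisingly good."))
```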
----------- Amazon Earphones Reviews ----------------
This dataset consists of 9,930 Amazon reviews and star ratings for the 10 latest (as of mid-2019) Bluetooth earphone devices, intended for learning how to train machine learning models for sentiment analysis.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 5 columns: ReviewTitle, ReviewBody, ReviewStar, Product and division (manually added - categorical label generated using ReviewStar score)
License: U.S. Government Works
Source: www.amazon.in
File name (original): AllProductReviews.csv (contains 14337 reviews)
File name (edited - used for my research) : AllProductReviews2.csv (contains 9930 reviews)
----------- Amazon Musical Instruments Reviews ----------------
This dataset contains 7137 comments/reviews of different musical instruments coming from Amazon.
This dataset was employed in the experimental phase of my research. To align it with the objectives of my study, certain reviews were excluded from the original dataset, and an additional column was incorporated into this dataset.
The file consists of 10 columns: reviewerID, asin (ID of the product), reviewerName, helpful (helpfulness rating of the review), reviewText, overall (rating of the product), summary (summary of the review), unixReviewTime (time of the review - unix time), reviewTime (time of the review, raw), and division (manually added - categorical label generated using the overall score).
Source: http://jmcauley.ucsd.edu/data/amazon/
File name (original): Musical_instruments_reviews.csv (contains 10261 reviews)
File name (edited - used for my research) : Musical_instruments_reviews2.csv (contains 7137 reviews)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Page township by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Page township. The dataset can be utilized to understand the population distribution of Page township by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Page township. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Page township.
Key observations
Largest age group (population): Male # 10-14 years (75) | Female # 10-14 years (72). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Page township Population by Gender. You can refer to the same here.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the ground truth data used to evaluate the musical pitch, tempo and key estimation algorithms developed during the AudioCommons H2020 EU project and which are part of the Audio Commons Audio Extractor tool. It also includes ground truth information for the single-eventness audio descriptor also developed for the same tool. This ground truth data has been used to generate the following documents:
Deliverable D4.4: Evaluation report on the first prototype tool for the automatic semantic description of music samples
Deliverable D4.10: Evaluation report on the second prototype tool for the automatic semantic description of music samples
Deliverable D4.12: Release of tool for the automatic semantic description of music samples
All these documents are available in the materials section of the AudioCommons website. All ground truth data in this repository is provided in the form of CSV files. Each CSV file corresponds to one of the individual datasets used in one or more evaluation tasks of the aforementioned deliverables. This repository does not include the audio files of each individual dataset, but includes references to the audio files. The following paragraphs describe the structure of the CSV files and give some notes about how to obtain the audio files in case these are needed.
Structure of the CSV files
All CSV files in this repository (with the sole exception of SINGLE EVENT - Ground Truth.csv) feature the following 5 columns:
Audio reference: reference to the corresponding audio file. This will either be a string with the filename, or the Freesound ID (for one dataset based on Freesound content). See below for details about how to obtain those files.
Audio reference type: will be one of Filename or Freesound ID, and specifies how the previous column should be interpreted.
Key annotation: tonality information as a string with the form "RootNote minor/major". Audio files with no ground truth annotation for tonality are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10.
Tempo annotation: tempo information as an integer representing beats per minute. Audio files with no ground truth annotation for tempo are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10. Note that integer values are used here because we only have tempo annotations for music loops, which typically feature only integer tempo values.
Pitch annotation: pitch information as an integer representing the MIDI note number corresponding to the annotated pitch's frequency. Audio files with no ground truth annotation for pitch are left blank. Ground truth annotations are parsed from the original data source as described in the text of deliverables D4.4 and D4.10.
The remaining CSV file, SINGLE EVENT - Ground Truth.csv, has only the following 2 columns:
Freesound ID: sound ID used in Freesound to identify the audio clip.
Single Event: boolean indicating whether the corresponding sound is considered to be a single event or not. Single event annotations were collected by the authors of the deliverables as described in deliverable D4.10.
How to get the audio data
In this section we provide some notes about how to obtain the audio files corresponding to the ground truth annotations provided here. Note that due to licensing restrictions we are not allowed to re-distribute the audio data corresponding to most of these ground truth annotations.
Apple Loops (APPL): This dataset includes some of the music loops included in Apple's music software such as Logic or GarageBand. Access to these loops requires owning a license for the software. Detailed instructions about how to set up this dataset are provided here.
Carlos Vaquero Instruments Dataset (CVAQ): This dataset includes single instrument recordings carried out by Carlos Vaquero as part of his master's thesis. Sounds are available as Freesound packs and can be downloaded at this page: https://freesound.org/people/Carlos_Vaquero/packs
Freesound Loops 4k (FSL4): This dataset includes a selection of music loops taken from Freesound. Detailed instructions about how to set up this dataset are provided here.
Giant Steps Key Dataset (GSKY): This dataset includes a selection of previews from Beatport annotated by key. Audio and original annotations are available here.
Good-sounds Dataset (GSND): This dataset contains monophonic recordings of instrument samples. Full description, original annotations and audio are available here.
University of IOWA Musical Instrument Samples (IOWA): This dataset was created by the Electronic Music Studios of the University of IOWA and contains recordings of instrument samples. The dataset is available upon request by visiting this website.
Mixcraft Loops (MIXL): This dataset includes some of the music loops included in Acoustica's Mixcraft music software. Access to these loops requires owning
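A minimal pandas sketch for working with one of the ground-truth CSV files described above follows. The specific file name is an assumption; the column names follow the description in this record, and the MIDI-to-frequency conversion is the standard A4 = 440 Hz reference formula.

```python
# Minimal sketch: load a ground-truth CSV (file name assumed) and use the
# tempo and pitch annotations described above.
import pandas as pd

gt = pd.read_csv("FSL4 - Ground Truth.csv")  # assumed file name and header row

# Rows with a tempo annotation, as integer beats per minute.
tempo = gt.dropna(subset=["Tempo annotation"]).copy()
tempo["Tempo annotation"] = tempo["Tempo annotation"].astype(int)

# Convert annotated MIDI note numbers to frequency in Hz (A4 = 440 Hz).
pitch = gt.dropna(subset=["Pitch annotation"]).copy()
pitch["pitch_hz"] = 440.0 * 2.0 ** ((pitch["Pitch annotation"] - 69) / 12.0)
```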
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
THIS IS STILL WIP, PLEASE DO NOT CIRCULATE
About
This dataset contains counts of (referer, article) pairs extracted from the request logs of English Wikipedia. When a client requests a resource by following a link or performing a search, the URI of the webpage that linked to the resource is included in the request in an HTTP header called the "referer". This data captures 22 million (referer, article) pairs from a total of 4 billion requests collected during the month of January 2015.
Data Preparation
- The dataset only includes requests to articles in the main namespace of the desktop version of English Wikipedia (see https://en.wikipedia.org/wiki/Wikipedia:Namespace)
- Requests to MediaWiki redirects are excluded
- Spider traffic was excluded using the ua-parser library (https://github.com/tobie/ua-parser)
- Referers were mapped to a fixed set of values corresponding to internal traffic or external traffic from one of the top 5 global traffic sources of English Wikipedia, based on this scheme:
  - an article in the main namespace of English Wikipedia -> the article title
  - any Wikipedia page that is not in the main namespace of English Wikipedia -> 'other-wikipedia'
  - an empty referer -> 'other-empty'
  - a page from any other Wikimedia project -> 'other-internal'
  - Google -> 'other-google'
  - Yahoo -> 'other-yahoo'
  - Bing -> 'other-bing'
  - Facebook -> 'other-facebook'
  - Twitter -> 'other-twitter'
  - anything else -> 'other'
  For the exact mapping see https://github.com/ewulczyn/wmf/blob/master/mc/oozie/hive_query.sql#L30-L48
- (referer, article) pairs with 10 or fewer observations were removed from the dataset
Note: When a user requests a page through the search bar, the page the user searched from is listed as a referer. Hence, the data contains '(referer, article)' pairs for which the referer does not contain a link to the article. For an example, consider the '(Wikipedia, Chris_Kyle)' pair. Users went to the 'Wikipedia' article to search for Chris Kyle within English Wikipedia.
Applications
This data can be used for various purposes:
- determining the most frequent links people click on for a given article
- determining the most common links people followed to an article
- determining how much of the total traffic to an article clicked on a link in that article
- generating a Markov chain over English Wikipedia
Format
- prev_id: if the referer does not correspond to an article in the main namespace of English Wikipedia, this value will be empty. Otherwise, it contains the unique MediaWiki page ID of the article corresponding to the referer, i.e. the previous article the client was on
- curr_id: the MediaWiki unique page ID of the article the client requested
- n: the number of occurrences of the '(referer, article)' pair
- prev_title: the result of mapping the referer URL to the fixed set of values described above
- curr_title: the title of the article the client requested
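As an illustration of the first application listed above (finding the most common referers for a given article), the following pandas sketch loads the pair counts. The file name and the assumption that the dump is a tab-separated file with a header row are mine; the column names follow the Format section.

```python
# Minimal sketch: most common referers for one article in the clickstream data.
# File name and tab-separated layout are assumptions; column names follow the
# Format description above.
import pandas as pd

cols = ["prev_id", "curr_id", "n", "prev_title", "curr_title"]
clicks = pd.read_csv("2015_01_clickstream.tsv", sep="\t", names=cols, header=0)

article = "Chris_Kyle"
top_referers = (clicks[clicks["curr_title"] == article]
                .sort_values("n", ascending=False)
                .head(10)[["prev_title", "n"]])
print(top_referers)
```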
License: All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/
Source code: https://github.com/ewulczyn/wmf/blob/master/mc/oozie/hive_query.sql (MIT license)
In a previous paper (AMI Consortium 2011, MNRAS, 415, 2699: Paper I), the observational, mapping and source-extraction techniques used for the Tenth Cambridge (10C) Survey of Radio Sources were described. Here, the first results from the survey, carried out using the Arcminute Microkelvin Imager (AMI) Large Array (LA) at an observing frequency of 15.7 GHz, are presented. The survey fields cover an area of ~ 27 deg2 to a flux-density completeness of 1 mJy. Results for some deeper areas, covering ~ 12 deg2, which are wholly contained within the total areas and complete to 0.5 mJy, are also presented. The completeness for both areas is estimated to be at least 93 per cent. The 10C survey is the deepest radio survey of any significant extent (>~ 0.2 deg2) above 1.4 GHz. The 10C source catalogue contains 1897 entries detected above a flux density threshold of > 4.62 sigma, and is available here and at the authors' web site http://www.mrao.cam.ac.uk/surveys/10C. The source catalog has been combined with that of the Ninth Cambridge Survey to calculate the 15.7-GHz source counts. A broken power law is found to provide a good parametrization of the differential count between 0.5 mJy and 1 Jy. The measured source count has been compared with that predicted by de Zotti et al. (2005, A&A, 431, 893, and the model is found to display good agreement with the data at the highest flux densities. However, over the entire flux-density range of the measured count (0.5 mJy to 1 Jy), the model is found to underpredict the integrated count by ~ 30 per cent. Entries from the source catalog have been matched with those contained in the catalogues of the NRAO VLA Sky Survey and the Faint Images of the Radio Sky at Twenty-cm survey (both of which have observing frequencies of 1.4 GHz). This matching provides evidence for a shift in the typical 1.4-GHz spectral index to 15.7-GHz spectral index of the 15.7-GHz-selected source population with decreasing flux density towards sub-mJy levels - the spectra tend to become less steep. Automated methods for detecting extended sources, developed in Paper I, have been applied to the data; ~ 5 per cent of the sources are found to be extended relative to the LA-synthesized beam of ~ 30 arcsec. Investigations using higher resolution data showed that most of the genuinely extended sources at 15.7 GHz are classical doubles, although some nearby galaxies and twin-jet sources were also identified. This table was created by the HEASARC in August 2011 based on an electronic version of Table 1 of the reference paper which was obtained from the 10C Survey web site http://www.mrao.cam.ac.uk/surveys/10C/. This is a service provided by NASA HEASARC .
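For readers who want to reproduce the kind of spectral-index comparison described above, a minimal two-point calculation between 1.4 GHz and 15.7 GHz is sketched below. The sign convention (flux density proportional to frequency to the power alpha versus minus alpha) differs between catalogues, so the convention used here is stated explicitly and should be matched to the 10C papers before comparing numbers.

```python
# Illustrative two-point spectral index between 1.4 GHz and 15.7 GHz, using the
# convention S proportional to nu**alpha (some radio catalogues instead use
# S proportional to nu**(-alpha); check the convention before comparing values).
import math

def spectral_index(s_low_jy, s_high_jy, nu_low_ghz=1.4, nu_high_ghz=15.7):
    return math.log(s_high_jy / s_low_jy) / math.log(nu_high_ghz / nu_low_ghz)

# A source fainter at 15.7 GHz than at 1.4 GHz has a negative index here,
# i.e. a "steep" spectrum under this convention.
print(spectral_index(0.010, 0.004))
```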
https://www.usa.gov/government-works
Reporting of new Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. This dataset will receive a final update on June 1, 2023, to reconcile historical data through May 10, 2023, and will remain publicly available.
Aggregate Data Collection Process
Since the start of the COVID-19 pandemic, data have been gathered through a robust process with the following steps:
Methodology Changes
Several differences exist between the current, weekly-updated dataset and the archived version:
Confirmed and Probable Counts
In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions report probable cases and deaths to CDC.* Confirmed and probable case definition criteria are described here:
Council of State and Territorial Epidemiologists (ymaws.com).
Deaths
CDC reports death data on other sections of the website: CDC COVID Data Tracker: Home, CDC COVID Data Tracker: Cases, Deaths, and Testing, and NCHS Provisional Death Counts. Information presented on the COVID Data Tracker pages is based on the same source (total case counts) as the present dataset; however, NCHS Death Counts are based on death certificates that use information reported by physicians, medical examiners, or coroners in the cause-of-death section of each certificate. Data from each of these pages are considered provisional (not complete and pending verification) and are therefore subject to change. Counts from previous weeks are continually revised as more records are received and processed.
Number of Jurisdictions Reporting
There are currently 60 public health jurisdictions reporting cases of COVID-19. This includes the 50 states, the District of Columbia, New York City, the U.S. territories of American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, Puerto Rico, and the U.S. Virgin Islands, as well as three independent countries in compacts of free association with the United States: the Federated States of Micronesia, the Republic of the Marshall Islands, and the Republic of Palau. New York State’s reported case and death counts do not include New York City’s counts as they separately report nationally notifiable conditions to CDC.
CDC COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths, available by state and by county. These and other data on COVID-19 are available from multiple public locations, such as:
https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/cases-in-us.html
https://www.cdc.gov/covid-data-tracker/index.html
https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html
https://www.cdc.gov/coronavirus/2019-ncov/php/open-america/surveillance-data-analytics.html
Additional COVID-19 public use datasets, including line-level (patient-level) data, are available at: https://data.cdc.gov/browse?tags=covid-19.
Archived Data Notes:
November 3, 2022: Due to a reporting cadence issue, case rates for Missouri counties are calculated based on 11 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 3, 2022, instead of the customary 7 days’ worth of data.
November 10, 2022: Due to a reporting cadence change, case rates for Alabama counties are calculated based on 13 days’ worth of case count data in the Weekly United States COVID-19 Cases and Deaths by State data released on November 10, 2022, instead of the customary 7 days’ worth of data.
November 10, 2022: Per the request of the jurisdiction, cases and deaths among non-residents have been removed from all Hawaii county totals throughout the entire time series. Cumulative case and death counts reported by CDC will no longer match Hawaii’s COVID-19 Dashboard, which still includes non-resident cases and deaths.
November 17, 2022: Two new columns, weekly historic cases and weekly historic deaths, were added to this dataset on November 17, 2022. These columns reflect case and death counts that were reported that week but were historical in nature and not reflective of the current burden within the jurisdiction. These historical cases and deaths are not included in the new weekly case and new weekly death columns; however, they are reflected in the cumulative totals provided for each jurisdiction. These data are used to account for artificial increases in case and death totals due to batched reporting of historical data.
December 1, 2022: Due to cadence changes over the Thanksgiving holiday, case rates for all Ohio counties are reported as 0 in the data released on December 1, 2022.
January 5, 2023: Due to North Carolina’s holiday reporting cadence, aggregate case and death data will contain 14 days’ worth of data instead of the customary 7 days. As a result, case and death metrics will appear higher than expected in the January 5, 2023, weekly release.
January 12, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0. As a result, case and death metrics will appear lower than expected in the January 12, 2023, weekly release.
January 19, 2023: Due to a reporting cadence issue, Mississippi’s aggregate case and death data will be calculated based on 14 days’ worth of data instead of the customary 7 days in the January 19, 2023, weekly release.
January 26, 2023: Due to a reporting backlog of historic COVID-19 cases, case rates for two Michigan counties (Livingston and Washtenaw) were higher than expected in the January 19, 2023 weekly release.
January 26, 2023: Due to a backlog of historic COVID-19 cases being reported this week, aggregate case and death counts in Charlotte County and Sarasota County, Florida, will appear higher than expected in the January 26, 2023 weekly release.
January 26, 2023: Due to data processing delays, Mississippi’s aggregate case and death data will be reported as 0 in the weekly release posted on January 26, 2023.
February 2, 2023: As of the data collection deadline, CDC observed an abnormally large increase in aggregate COVID-19 cases and deaths reported for Washington State. In response, totals for new cases and new deaths released on February 2, 2023, have been displayed as zero at the state level until the issue is addressed with state officials. CDC is working with state officials to address the issue.
February 2, 2023: Due to a decrease reported in cumulative case counts by Wyoming, case rates will be reported as 0 in the February 2, 2023, weekly release. CDC is working with state officials to verify the data submitted.
February 16, 2023: Due to data processing delays, Utah’s aggregate case and death data will be reported as 0 in the weekly release posted on February 16, 2023. As a result, case and death metrics will appear lower than expected and should be interpreted with caution.
February 16, 2023: Due to a reporting cadence change, Maine’s
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides a comprehensive overview of online sales transactions across different product categories. Each row represents a single transaction with detailed information such as the order ID, date, category, product name, quantity sold, unit price, total price, region, and payment method.
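A short, hypothetical pandas sketch of the kind of aggregation this dataset supports (total revenue by category) follows; the file name and exact column labels are placeholders and should be replaced with those in the actual file.

```python
# Hypothetical sketch: total revenue per product category.
# "online_sales.csv", "Category" and "Total Price" are placeholder names.
import pandas as pd

sales = pd.read_csv("online_sales.csv")
revenue = sales.groupby("Category")["Total Price"].sum().sort_values(ascending=False)
print(revenue)
```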
https://spdx.org/licenses/CC0-1.0.html
Policies requiring biodiversity no net loss or net gain as an outcome of environmental planning have become more prominent worldwide, catalysing interest in biodiversity offsetting as a mechanism to compensate for development impacts on nature. Offsets rely on credible and evidence-based methods to quantify biodiversity losses and gains. Following the introduction of the United Kingdom’s Environment Act in November 2021, all new developments requiring planning permission in England are expected to demonstrate a 10% biodiversity net gain from 2024, calculated using the statutory biodiversity metric framework (Defra, 2023). The metric is used to calculate both baseline and proposed post-development biodiversity units, and is set to play an increasingly prominent role in nature conservation nationwide. The metric has so far received limited scientific scrutiny. This dataset comprises a database of statutory biodiversity metric unit values for terrestrial habitat samples across England. For each habitat sample, we present biodiversity units alongside five long-established single-attribute proxies for biodiversity (species richness, individual abundance, number of threatened species, mean species range or population, mean species range or population change). Data were compiled for species from three taxa (vascular plants, butterflies, birds), from sites across England. The dataset includes 24 sites within grassland, wetland, woodland and forest, sparsely vegetated land, cropland, heathland and shrub, i.e. all terrestrial broad habitats except urban and individual trees. Species data were reused from long-term ecological change monitoring datasets (mostly in the public domain), whilst biodiversity units were calculated following field visits. Fieldwork was carried out in April-October 2022 to calculate biodiversity units for the samples. Sites were initially assessed using metric version 3.1, which was current at the time of survey, and were subsequently updated to the statutory metric for analysis using field notes and species data. Species data were derived from 24 long-term ecological change monitoring sites across the Environmental Change Network (ECN), Long Term Monitoring Network (LTMN) and Ecological Continuity Trust (ECT), collected between 2010 and 2020. Methods Study sites We studied 24 sites across the Environmental Change Network (ECN), Long Term Monitoring Network (LTMN) and Ecological Continuity Trust (ECT). Biodiversity units were calculated following field visits by the authors, whilst species data (response variables) were derived from long-term ecological change monitoring datasets collected by the sites and mostly held in the public domain (Table S1). We used all seven ECN sites in England. We selected a complementary 13 LTMN sites to give good geographic and habitat representation across England. We included four datasets from sites supported by the ECT where 2 x 2m vascular plant quadrat data were available for reuse. The 24 sites included samples from all terrestrial broad habitats (sensu Defra 2023) in England, except urban and individual trees: grassland (8), wetland (6), woodland and forest (5), sparsely vegetated land (2), cropland (2), heathland and shrub (1). Non-terrestrial broad habitats (rivers and lakes, marine inlets and transitional waters) were excluded. Our samples ranged in biodiversity unit scores from 2 to 24, the full range of the metric. 
Not all 24 sites had long-term datasets from all taxa: 23 had vascular plant data, 8 had bird data, and 13 had butterfly data. We chose these three taxa as they are the most comprehensively surveyed taxa in England’s long-term biological datasets. Together they represent a taxonomically broad, although by no means representative, sample of English nature.
Biodiversity unit calculation
Baseline biodiversity units were attributed to each vegetation quadrat using the statutory biodiversity metric (Defra, 2023) (Equation 1). Sites were visited by the authors between April and October 2022, i.e. within the optimal survey period indicated in the metric guidance. Sites were assessed initially using metric version 3.1 (Panks et al., 2022), which was current at the time of survey, and were subsequently updated to the statutory metric for analysis using field notes and species data. Following the biodiversity metric guidance, we calculated biodiversity units at the habitat parcel scale, such that polygons with consistent habitat type and condition are the unit of assessment. We assigned habitat type and condition score to all quadrats falling within the parcel. Where the current site conditions (2022) and quadrat data (2010 to 2020) differed from each other in habitat or condition, e.g. the % bracken cover, we deferred to the quadrat data in order to match our response and explanatory variables more fairly. Across all samples, area was set to 1 ha arbitrarily, and strategic significance set to 1 (no strategic significance), to allow comparison between sites. To assign biodiversity units to the bird and butterfly transects, we averaged the biodiversity units of plant quadrats within the transect routes plus a buffer of 500 m (birds) or 100 m (butterflies). Quadrats were positioned to represent the habitats present at each site proportionally, and transect routes were also positioned to represent the habitats present across each site. Although units have been calculated as precisely as possible for all taxa, we recognize that biodiversity units are calculated more precisely for the plant dataset than the bird and butterfly dataset: the size of transect buffer is subjective, and some transects run adjacent to offsite habitat that could not be accessed. Further detail about biodiversity unit calculation can be found in the Supporting Information.
Equation 1. Biodiversity unit calculation following the statutory biodiversity metric (Defra, 2023):
Size of habitat parcel × Distinctiveness × Condition × Strategic Significance = Biodiversity Units
Species response variable calculation
We reused species datasets for plants, birds and butterflies recorded by the sites to calculate our response variables (Table S1). Plant species presence data were recorded using 2 x 2 m quadrats of all vascular plant species at approximately 50 sample locations per site (mean 48.1, sd 3.7), stratified to represent all habitat types on site. If the quadrat fell within woodland or scrub, trees and shrubs rooted within a 10 x 10 m plot centred on the quadrat were also counted and added to the quadrat species records, with any duplicate species records removed. We treated each quadrat as a sample point, and the most recent census year was analysed (ranging between 2011-2021). Bird data were collected annually using the Breeding Birds Survey method of the British Trust for Ornithology: two approximately parallel 1 km long transects were routed through representative habitat on each site.
The five most recent census years were analysed (all fell between 2006-2019), treating each year as a sample point (Bateman et al., 2013). Butterfly data were collected annually using the Pollard Walk method of the UK Butterfly Monitoring Scheme: a fixed transect route taking 30 to 90 minutes to walk (c. 1-2 km) was established through representative habitat on each site. The five most recent census years were analysed (all fell between 2006-2019), treating each year as a sample point. Full detail of how these datasets were originally collected in the field can be found in Supporting Information. For species richness estimates we omitted any records with vague taxon names not resolved to species level. Subspecies records were put back to the species level, as infraspecific taxa were recorded inconsistently across sites. Species synonyms were standardised across all sites prior to analysis. For bird abundance we used the maximum count of individuals recorded per site per year for each species as per the standard approach (Bateman et al., 2013). For butterfly abundance we used sum abundance over 26 weekly visits each year for each species at each site, using a GAM to interpolate missing weekly values (Dennis et al., 2013). Designated taxa were identified using the Great Britain Red List data held by JNCC (2022); species with any Red List designation other than Data Deficient or Least Concern were summed. Plant species range and range change index data followed PLANTATT (Hill et al., 2004). Range was measured as the number of 10x10 km cells across Great Britain that a species is found in. The change index measures the relative magnitude of range size change in standardised residuals, comparing 1930-1960 with 1987-1999. For birds, species mean population size across Great Britain followed Musgrove et al., 2013. We used the breeding season population size estimates to match field surveys. Bird long-term population percentage change (generally 1970-2014) followed Defra (2017). For butterflies, range and change data followed Fox et al., 2015. Range data was occupancy of UK 10 km squares 2010-2014. Change was percent abundance change 1976-2014. For all taxa, mean range and mean change were averaged from all the species present in the sample, not weighted by the species’ abundance in the sample. · Bateman, I. J., Harwood, A. R., Mace, G. M., Watson, R. T., Abson, D. J., Andrews, B., et al. (2013). Bringing ecosystem services into economic decision-making: Land use in the United Kingdom. Science (80-. ). 341, 45–50. doi: 10.1126/science.1234379. · British Trust for Ornithology (BTO), 2022. Breeding Bird methodology and survey design. Available online at https://www.bto.org/our-science/projects/breeding-bird-survey/research-conservation/methodology-and-survey-design · Defra, 2023. Statutory biodiversity metric tools and guides. https://www.gov.uk/government/publications/statutory-biodiversity-metric-tools-and-guides. · Dennis, E. B., Freeman, S. N., Brereton, T., and
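To make Equation 1 above concrete, here is a small Python sketch of the unit calculation. The distinctiveness and condition values in the example call are illustrative inputs only, not scores taken from the statutory metric tables.

```python
# Worked example of Equation 1 (statutory biodiversity metric).
def biodiversity_units(area_ha, distinctiveness, condition, strategic_significance=1.0):
    """Size of habitat parcel x Distinctiveness x Condition x Strategic Significance."""
    return area_ha * distinctiveness * condition * strategic_significance

# As in the study, area is fixed at 1 ha and strategic significance at 1,
# so the result depends only on habitat distinctiveness and condition.
print(biodiversity_units(1.0, 6, 2))  # illustrative scores, prints 12.0
```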
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Page by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Page. The dataset can be utilized to understand the population distribution of Page by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Page. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Page.
Key observations
Largest age group (population): Male # 5-9 years (14) | Female # 30-34 years (17). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Page Population by Gender. You can refer to the same here.
https://spdx.org/licenses/CC0-1.0.html
Refugia can facilitate the persistence of species through long-term environmental change, but it is not clear if Pleistocene refugia will remain functional under anthropogenic climate change. Dieback of species within refugia therefore raises concerns about their long-term persistence. Using repeat field surveys, we investigate dieback patterns of an isolated population of Eucalyptus macrorhyncha during two droughts and discuss prospects for its continued persistence in a Pleistocene refugium. We first confirm that the Clare Valley in South Australia has constituted a long-term refugium for the species, with the population being genetically highly distinct from other conspecific populations. However, the population lost >40% of individuals and biomass through the two droughts, with mortality being just below 20% after the Millennium Drought (2000–2009) and almost 25% after the Big Dry (2017-2019). The best predictors of mortality differed after each drought. While the north-facing aspect of a sampling location was a significant positive predictor after both droughts, biomass density and slope were significant negative predictors only after the Millennium Drought, and distance to the north-west corner of the park, which intercepts hot, dry winds, was significant after the Big Dry only. This suggests that more marginal sites with low biomass and located on exposed, flat plateaus were more vulnerable initially, but that heat-stress was an important driver of dieback during the Big Dry. Therefore, the causative drivers of dieback may change during population decline. Regeneration occurred predominantly on southern and eastern aspects, which would receive the least solar radiation. Occurrence in a refugium did not protect this population from dieback. However, gullies with lower solar radiation are continuing to support relatively healthy, regenerating stands of red stringybark, providing hope for persistence in small pockets. Monitoring and managing these pockets during future droughts will be essential to ensure the persistence of this isolated and genetically unique population. Methods The data contains three datasets derived from analysing data from multiple surveys of a red stringybark population (Eucalyptus macrorhyncha) in Spring Gully Conservation Park (SGCP), Clare Valley, Australia. These are the Tree Health Index (THI), Biomass and Drivers datasets, which are used in the analyses of the associated paper. Below I explain how each dataset was obtained. The South Australian Department of Environment and Water (DEW) initiated a tree health monitoring program in 2009, during which four North-South oriented transects were established in SGCP. Each transect (between 1.2 and 1.8 km long) had sampling sites every 50 m. At each sampling site, the four closest canopy trees within a 10 m radius were marked with a permanent aluminum tag, their location recorded with a handheld GPS (brand and model unknown), and various measurements relating to their health status taken (see below). In total, 471 trees were surveyed, 30 of which were South Australian blue gums (Eucalyptus leucoxylon F.Muell.) and the remainder were red stringbark. Transects were surveyed in January and February 2009, March 2010, November 2011, August 2012, November 2013, and September 2014. 
Parameters recorded included tree status (dead/ alive; trees with dead stems but with living basal sprouts were scored as alive), crown extent (percentage area of assessable crown with live leaves), and crown density (percentage of skylight blocked by the leafy crown). Percentage values were recorded as eight categories: 0 (0%), 1 (1–10%), 2 (11–20%), 3 (21–40%), 4 (41–60%), 5 (61–80%), 6 (81–90%), and 7 (91–100%). The assessable crown was defined as consisting of all living and dead branches of the crown. In addition, epicormic growth and extent of reproductive activity (presence of flowers and/or fruits) were classified into four categories: 0 (absent, not visible), 1 (scarce, present but not readily visible), 2 (common, clearly visible throughout the assessable crown), 3 (abundant, dominates the appearance of the assessable crown). We calculated a summative index consisting of canopy extent, canopy density, and epicormic growth to indicate tree health – hereafter referred to as the tree health index (THI). Because crown extent and density are considered the most important indicators of tree health, we retained them at their larger scale (0–7, compared to 0–3 for epicormic growth), giving a maximum value of 17 for the THI. Trees that appeared dead at some surveys but that later resprouted (i.e., epicormic growth or basal sprouts), were retrospectively awarded a THI score of 1 (instead of zero). To get an indication of the health status of the red stringybark population, the proportion of dead trees and the average THI of all 441 stringybark trees surveyed repeatedly since 2009 were determined and this data is available in the THI dataset. In September and December 2021, we revisited all trees that had been surveyed and tagged previously. Relocation of trees was achieved with high confidence because of the availability of GPS locations for each tree and because tags remained on, or had fallen directly beneath, at least two of the four trees at each site. Six sites consisting entirely of blue gum were not resurveyed (sites T1S02, T1S03, T1S04, T1S05, T1S30, T3S09). Methods for the resurvey focused on replicating the methods used in the earlier surveys (to facilitate comparisons) and on collecting additional information to provide area-based estimates of dieback. To achieve area-based estimates, we determined a center point for each site so that each of the four trees at a site was in a different quarter (delineated using the four cardinal directions). For sites with one or more blue gum trees among the four surveyed trees, blue gums were replaced by the nearest stringybark in the relevant quarter. In one instance, no nearest stringybark neighbour was present within 10 m and this site was excluded from analyses including biomass density. Additional trees were added to four sites that had less than four surveyed trees. This resulted in a total of 112 sites with 448 trees of red stringybark. This allowed estimating the tree density per hectare at each site by measuring, averaging, and squaring the distance of each tree to the center point. The inverse of this average distance was then multiplied by the value of the desired area (in this case 1 ha) to obtain an estimate of tree density, following the point-centered quarter method. To estimate biomass, we recorded diameter at breast height (DBH) and tree height for each stem of a tree (trees regularly had multiple stems), living or dead. A tree with dead stems was considered alive if there was any epicormic or basal growth present. 
A stem was considered alive if epicormic growth was present above 1.3 m in height. Height (Ht) was estimated to the nearest meter using a 1.5 m range pole held vertically overhead to provide a reference of approximately 3.5 m. DBH was measured with a diameter tape 1.3 m above the ground. A wood density (WD) of 795 (± 19) kg m⁻³ was assumed for all trees. We used these values to calculate the above-ground biomass (AGB) as AGB = 0.0673 × (WD × DBH² × Ht)^0.976. AGB was determined for every stem and then aggregated per tree, meaning a single individual could comprise both living and dead biomass. We multiplied the estimated number of individuals per hectare by the mean AGB per tree to obtain area-based estimates of biomass, i.e., biomass density. These calculations were done for each site (to obtain estimates of biomass density per site) and for all 112 sites combined (to obtain a parkwide estimate); these data are available in the Biomass dataset.

As an estimate of regeneration, the occurrence of seedlings (< 1 m tall, woody growth lacking) and saplings (< 1 m tall, woody growth present) within a 3 m radius of the center point was recorded. Seedling and sapling numbers for each site were combined to provide an indicator of recruitment. In addition, aspect (in degrees, rounded to 10° intervals and determined with a compass) and slope (in degrees, measured with a clinometer) were recorded for each site. We calculated 'northness' and 'eastness' as the cosine and sine of the aspect (in radians), respectively. Where trees within a site were located on different slopes in a valley, the aspect and slope were recorded for each slope and then averaged. Distance to the north-west corner of the park (the area most affected by hot, dry summer winds) was calculated as the planar distance between this location and each sampling location using the "Near (Analysis)" geoprocessing tool in ArcGIS Pro.

We calculated the proportion of dead trees per site in 2011 (Mortality 2011) and 2021 (Mortality 2021), and regeneration, as indicators of dieback and persistence (response variables). These variables, computed for the 112 sites of the 2021 survey but including only trees that were also surveyed in 2011 (441 trees in total), are presented in the Drivers dataset.
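To make these calculations concrete, the R sketch below reproduces them on made-up numbers. It is an illustration only, not the survey scripts: all object and column names are placeholders, and the allometric equation is applied assuming wood density expressed as 0.795 g cm⁻³ (equivalent to 795 kg m⁻³), DBH in cm, and height in m, the units this form of equation conventionally expects.

```r
# Tree health index (THI): crown extent (0-7) + crown density (0-7) + epicormic growth (0-3),
# maximum 17; trees scored dead but later seen resprouting get a retrospective score of 1.
compute_thi <- function(crown_extent, crown_density, epicormic, resprouted_later = FALSE) {
  thi <- crown_extent + crown_density + epicormic
  ifelse(thi == 0 & resprouted_later, 1, thi)
}

# hypothetical example: three trees from one survey
compute_thi(crown_extent     = c(5, 0, 2),
            crown_density    = c(4, 0, 3),
            epicormic        = c(1, 0, 3),
            resprouted_later = c(FALSE, TRUE, FALSE))
#> 10  1  8

# Above-ground biomass per stem: AGB = 0.0673 * (WD * DBH^2 * Ht)^0.976,
# with WD in g cm^-3 (0.795 assumed), DBH in cm, height in m, AGB in kg.
agb_kg <- function(dbh_cm, height_m, wd = 0.795) {
  0.0673 * (wd * dbh_cm^2 * height_m)^0.976
}

# Point-centered quarter method: density = area of interest / (mean tree-to-center distance)^2.
density_per_ha <- function(dist_m) {
  10000 / mean(dist_m)^2            # 10,000 m^2 = 1 ha
}

# Hypothetical site with one tree per quarter (single-stemmed for simplicity)
dists  <- c(4.2, 6.8, 3.5, 5.1)     # m, distance of each tree to the site center point
dbh    <- c(23, 31, 18, 27)         # cm
height <- c(11, 14,  9, 12)         # m

site_density <- density_per_ha(dists)                     # trees per ha
site_biomass <- site_density * mean(agb_kg(dbh, height))  # kg per ha (biomass density)

# 'Northness' and 'eastness' from an aspect recorded in degrees
aspect_deg <- 340
northness  <- cos(aspect_deg * pi / 180)
eastness   <- sin(aspect_deg * pi / 180)
```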
A dataset of fashion keywords, including their definitions, synonyms, antonyms, search volume and costs.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
analyze the health and retirement study (hrs) with r

the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
this new github repository contains five scripts:

1992 - 2010 download HRS microdata.R
- loop through every year and every file, download, then unzip everything in one big party

import longitudinal RAND contributed files.R
- create a SQLite database (.db) on the local disk
- load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)

longitudinal RAND - analysis examples.R
- connect to the sql database created by the 'import longitudinal RAND contributed files' program
- create two database-backed complex sample survey objects, using a taylor-series linearization design
- perform a mountain of analysis examples with wave weights from two different points in the panel

import example HRS file.R
- load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html)
- parse through the IF block at the bottom of the sas importation script, blank out a number of variables
- save the file as an R data file (.rda) for fast loading later

replicate 2002 regression.R
- connect to the sql database created by the 'import longitudinal RAND contributed files' program
- create a database-backed complex sample survey object, using a taylor-series linearization design
- exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document.

click here to view these five scripts

for more detail about the health and retirement study (hrs), visit:
- michigan's hrs homepage
- rand's hrs homepage
- the hrs wikipedia page
- a running list of publications using hrs

notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
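purely as an illustration of what a 'database-backed complex sample survey object' looks like (this is not one of the five scripts above, and the variable names below are placeholders - check the rand hrs codebook for the actual design variables and wave weights):

```r
# a bare-bones sketch, assuming the import script has already built hrs.db
# containing a table called rand_hrs; raehsamp / raestrat / r10wtresp are
# placeholder names for the psu, stratum, and one wave's respondent weight
library(survey)
library(RSQLite)

hrs.design <-
    svydesign(
        ids = ~raehsamp ,       # sampling error psu (placeholder name)
        strata = ~raestrat ,    # sampling error stratum (placeholder name)
        weights = ~r10wtresp ,  # respondent-level weight for one wave (placeholder name)
        nest = TRUE ,
        data = "rand_hrs" ,     # table name inside the sqlite file
        dbtype = "SQLite" ,
        dbname = "hrs.db"
    )

# taylor-series linearized estimates then work as usual, for example:
# svymean( ~some_variable , hrs.design , na.rm = TRUE )
```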
https://choosealicense.com/licenses/odc-by/
🍷 FineWeb
15 trillion tokens of the finest data the 🌐 web has to offer
What is it?
The 🍷 FineWeb dataset consists of more than 18.5T tokens (originally 15T tokens) of cleaned and deduplicated English web data from CommonCrawl. The data processing pipeline is optimized for LLM performance and ran on the 🏭 datatrove library, our large-scale data processing library. 🍷 FineWeb was originally meant to be a fully open replication of 🦅 RefinedWeb, with a release… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceFW/fineweb.
RTB Maps is a cloud-based electronic Atlas. We used ArcGIS 10 for Desktop with the Spatial Analyst extension, ArcGIS 10 for Server on-premise, the ArcGIS API for JavaScript, IIS web services based on .NET, and ArcGIS Online, combining data on the cloud with data and applications on our local server, to develop an Atlas that brings together many of the map themes related to development of roots, tubers and banana crops. The Atlas is structured to allow our participating scientists to understand the distribution of the crops and observe the spatial distribution of many of the obstacles to production of these crops. The Atlas also includes an application that allows our partners to evaluate the importance of different factors when setting priorities for research and development. The application uses weighted overlay analysis within a multi-criteria decision analysis framework to rate the importance of factors when establishing geographic priorities for research and development (a toy sketch of this weighted overlay follows this entry). Datasets of crop distribution maps, agroecology maps, biotic and abiotic constraints to crop production, poverty maps and other demographic indicators are used as key inputs to the multi-objective criteria analysis.

Further metadata/references can be found here: http://gisweb.ciat.cgiar.org/RTBmaps/DataAvailability_RTBMaps.html

DISCLAIMER, ACKNOWLEDGMENTS AND PERMISSIONS:
This service is provided by the Roots, Tubers and Bananas CGIAR Research Program as a public service. Use of this service to retrieve information constitutes your awareness of and agreement to the following conditions of use. This online resource displays GIS data and query tools subject to continuous updates and adjustments. The GIS data have been taken from various, mostly public, sources and are supplied in good faith.

RTBMaps GIS Data Disclaimer
• The data used to show the base maps are supplied by ESRI.
• The data used to show the photos over the map are supplied by Flickr.
• The data used to show the videos over the map are supplied by YouTube.
• The population map is supplied by CIESIN, Columbia University and CIAT.
• The accessibility map is provided by the Global Environment Monitoring Unit - Joint Research Centre of the European Commission. Accessibility maps are made for a specific purpose and cannot be used as a generic dataset to represent "the accessibility" of a given study area.
• Harvested area and yield for banana, cassava, potato, sweet potato and yam for the year 2000 are provided by EarthSat (University of Minnesota's Institute on the Environment-Global Landscapes initiative and McGill University's Land Use and the Global Environment lab). Dataset from Monfreda C., Ramankutty N., and Foley J.A. 2008.
• Agroecology dataset: global edapho-climatic zones for cassava based on mean growing-season temperature, number of dry-season months, daily temperature range and seasonality. Dataset from CIAT (Carter et al. 1992).
• Demography indicators: total and rural population from the Center for International Earth Science Information Network (CIESIN) and CIAT, 2004.
• The FGGD prevalence of stunting map is a global raster datalayer with a resolution of 5 arc-minutes. The percentage of stunted children under five years old is reported according to the lowest available sub-national administrative units: all pixels within the unit boundaries have the same value. Data have been compiled by FAO from different sources: Demographic and Health Surveys (DHS), UNICEF MICS, the WHO Global Database on Child Growth and Malnutrition, and national surveys. Data provided by FAO – GIS Unit, 2007.
• Poverty dataset: global poverty headcount and absolute number of poor (number of people living on less than $1.25 or $2.00 per day). Dataset from IFPRI and CIAT.

THE RTBMAPS GROUP MAKES NO WARRANTIES OR GUARANTEES, EITHER EXPRESSED OR IMPLIED, AS TO THE COMPLETENESS, ACCURACY, OR CORRECTNESS OF THE DATA PORTRAYED IN THIS PRODUCT, NOR ACCEPTS ANY LIABILITY ARISING FROM ANY INCORRECT, INCOMPLETE OR MISLEADING INFORMATION CONTAINED THEREIN. ALL INFORMATION, DATA AND DATABASES ARE PROVIDED "AS IS" WITH NO WARRANTY, EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO FITNESS FOR A PARTICULAR PURPOSE. By accessing this website and/or data contained within the databases, you hereby release the RTB group and CG Centers, their employees, agents, contractors, sponsors and suppliers from any and all responsibility and liability associated with its use. In no event shall the RTB Group or its officers or employees be liable for any damages arising in any way out of the use of the website, or use of the information contained in the databases herein, including, but not limited to, the RTBMaps online Atlas product.

APPLICATION DEVELOPMENT:
• Desktop and web development - Ernesto Giron E. (GeoSpatial Consultant) e.giron.e@gmail.com
• GIS Analyst - Elizabeth Barona (Independent Consultant) barona.elizabeth@gmail.com

Collaborators: Glenn Hyman, Bernardo Creamer, Jesus David Hoyos, Diana Carolina Giraldo, Soroush Parsa, Jagath Shanthalal, Herlin Rodolfo Espinosa, Carlos Navarro, Jorge Cardona and Beatriz Vanessa Herrera at CIAT; Tunrayo Alabi and Joseph Rusike from IITA; Guy Hareau, Reinhard Simon, Henry Juarez, Ulrich Kleinwechter, Greg Forbes and Adam Sparks from CIP; and David Brown and Charles Staver from Bioversity International.

Please note these services may be unavailable at times due to maintenance work. Please feel free to contact us with any questions or problems you may be having with RTBMaps.
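As a toy illustration of the weighted overlay idea mentioned above (not the RTBMaps code; the layer names and weights are invented), each criterion layer is rescaled to a common 0–1 range and combined as a weighted sum:

```r
# Stand-ins for criterion rasters (e.g. poverty headcount, harvested area, an
# abiotic constraint), each on an arbitrary 5 x 5 grid of random values.
set.seed(1)
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

poverty    <- rescale01(matrix(runif(25), 5, 5))
crop_area  <- rescale01(matrix(runif(25), 5, 5))
constraint <- rescale01(matrix(runif(25), 5, 5))

# User-chosen importance weights, summing to 1
w <- c(poverty = 0.5, crop_area = 0.3, constraint = 0.2)

# Weighted overlay: cells with the highest combined score would be flagged as
# geographic priorities for research and development.
priority <- w["poverty"] * poverty + w["crop_area"] * crop_area + w["constraint"] * constraint
```

In practice the same arithmetic would be applied to georeferenced raster layers (for example with an R raster package) rather than plain matrices, but the weighted-sum logic is the same.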