Oracle Tables To Provide Boat and Shore Data. The object of this system is to provide an inventory of vessels that answers two fundamental questions: How many vessels are fishing commercially? What are the characteristics of these vessels? The vessel information (e.g., length, age, horsepower) is needed to accurately identify the universe of vessels and to facilitate scientific assessments of annual fishing effort. The vessel information is also useful for designing a statistically robust data collection program to canvass or randomly sample the activities of fishing vessels.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
These two datasets provide the responses to a survey on food, covering: what influences people's decisions about what to eat; what is important to people when selecting food (for example price, animal welfare, and origin); knowledge of the food system; use of technology when purchasing food; and key concerns about food. The total sample includes all age groups 16+ and has a sample size of 2,475. The Gen Z sample covers Generation Z only (16-25 year olds) and has a sample size of 619.
In order to practice writing SQL queries against a semi-realistic database, I discovered and imported Microsoft's AdventureWorks sample database into Microsoft SQL Server Express. The fictitious Adventure Works company represents a bicycle manufacturer that sells bicycles and accessories to global markets. Queries were written for developing and testing a Tableau dashboard.
The dataset presented here represents a fraction of the entire manufacturing relational database. Tables within the dataset include product, purchasing, work order, and transaction data.
The full database sample can be found on Microsoft SQL Docs website: https://learn.microsoft.com/en-us/sql/samples/ and additionally on Github: https://github.com/microsoft/sql-server-samples
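A practice query of the kind described might look like the following sketch. The table and column names are simplified stand-ins modeled on AdventureWorks naming, and sqlite3 substitutes for SQL Server Express so the example is self-contained:

```python
import sqlite3

# Toy stand-in for part of the AdventureWorks product table (schema heavily
# simplified; the real database uses schemas like Production.Product).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Name TEXT, ProductLine TEXT, ListPrice REAL)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [("Road-150", "R", 3578.27),
     ("Mountain-100", "M", 3399.99),
     ("Touring-1000", "T", 2384.07)],
)

# A typical dashboard-style practice query: average list price per product line
rows = conn.execute(
    "SELECT ProductLine, AVG(ListPrice) FROM Product "
    "GROUP BY ProductLine ORDER BY ProductLine"
).fetchall()
print(rows)
```

The same GROUP BY / aggregate pattern carries over directly to SQL Server, where an aggregate like this would typically feed a Tableau worksheet.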
U.S. Government Works: https://www.usa.gov/government-works
CVS_PSUDATA_TBL:
The Primary Sample Unit data table contains mainly map location, political/administrative boundary and establishment data. This is the highest order table for CVS data (level 1). It stores a single row of information for each unique combination of project identifier (ProjectID) and PSU number (PSUNr). The subordinate data tables may be logically linked to this table using these two columns.
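As a sketch of that linkage, assuming a hypothetical subordinate table named CVS_SUBPLOT_TBL (only the key columns ProjectID and PSUNr come from the description above; sqlite3 stands in for the actual database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal mock of the level-1 PSU table and one hypothetical subordinate
# table; only the composite key (ProjectID, PSUNr) is taken from the
# description above, all other columns are invented for illustration.
conn.execute("CREATE TABLE CVS_PSUDATA_TBL (ProjectID TEXT, PSUNr INTEGER, Boundary TEXT)")
conn.execute("CREATE TABLE CVS_SUBPLOT_TBL (ProjectID TEXT, PSUNr INTEGER, SubplotNr INTEGER)")
conn.execute("INSERT INTO CVS_PSUDATA_TBL VALUES ('P1', 101, 'CountyA')")
conn.executemany("INSERT INTO CVS_SUBPLOT_TBL VALUES (?, ?, ?)",
                 [("P1", 101, 1), ("P1", 101, 2)])

# Subordinate rows link back to the PSU table on the two key columns
rows = conn.execute("""
    SELECT p.Boundary, s.SubplotNr
    FROM CVS_PSUDATA_TBL p
    JOIN CVS_SUBPLOT_TBL s
      ON s.ProjectID = p.ProjectID AND s.PSUNr = p.PSUNr
    ORDER BY s.SubplotNr
""").fetchall()
print(rows)  # each subordinate row resolves to its parent PSU
```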
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
A mixed sampling methodology was implemented (Figure 1) to collect journals and articles. First, a selection filter was applied within the Institute for Scientific Information (ISI) Journal Citation Report (https://jcr.clarivate.com) database to generate a list of 504 life science journals. Then, exclusion criteria were applied to the journal list and 245 periodicals were removed. Filters and exclusion criteria are given in Table 1. Using a pseudo-random sequence of 20 numbers between 1 and 259 generated with GraphPad QuickCalc (https://www.graphpad.com/quickcalcs/randMenu), a final shortlist of 20 journals was selected from the 259 preselected periodicals, ordered by decreasing 2018 Impact Factor (the latest impact factor available at the time of designing this study). Four additional journals were then excluded, either because they were eventually found to be too clinical or because the author's institution had no online access, leading to a final list of 16 periodicals (Table 3). Clinical journals were excluded even though they may include publications with some preclinical experiments. This was justified to prevent the possible bias created both by the presumed small proportion of such articles in clinical periodicals, which would have required a larger sample, and by the supposed compliance of these studies with clinical guidelines, whose standards may differ [29,30]. Fifteen articles per journal were collected by sampling the online contents of each journal, starting from the last issue released in 2019 and browsing backward. This time window was selected to avoid the abundant literature on Coronavirus disease 2019 (Covid-19) published since January 2020, which might show unusual statistical standards. Article inclusion and exclusion criteria are presented in Table 2. Studies using human data were acceptable when they used ex-vivo/in-vitro approaches for extracting tissues, cells or samples.
From this intermediate list of 240 articles, 17 were excluded during the analysis due to previously unnoticed violations of inclusion criteria or congruity with exclusion criteria, resulting in a final sample set of 223 articles. Assessment of reporting: Each article was explored, and three types of statistical attributes were quantified (Table 4). Indicators of the transparency of study protocols were binary items coded as 0 (presence of all needed information in the text) or 1 (absence of information in the text for at least one figure or table) and were aggregated as proportions of articles with an insufficiency (non-disclosure) for the given item. The indicators were chosen as the minimum set of information a reader needs to replicate the statistical protocol: a precise sample size (experimental units), a well identified test, the software used, and no contradictions. A contradiction is defined as a mismatch between pieces of information provided in different parts of the manuscript although they refer to the same object, such as the disclosure of dissimilar statistical tests (in methods and figure legends) describing the analysis in one figure, or the disclosure of multiple sample sizes for one single set of data. The article structure was assessed using quantitative items, specified as total counts of given items, as well as one binary outcome (presence of a statistical paragraph). Qualitative items represented the article content and have been summarised as an inventory of information of interest. In the sampled articles, supplemental methods and information were considered full-fledged methodological information, but supplementary figures and tables presenting results were not eligible for the quantification of statistical insufficiencies, even if they were used to report location tests.
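The pseudo-random shortlisting step can be sketched as follows; random.sample stands in for GraphPad QuickCalc, and the seed is fixed only so this illustration is reproducible (it is not the sequence used in the study):

```python
import random

# Draw 20 distinct ranks from the 259 preselected journals, mimicking the
# pseudo-random sequence described above. The seed is an arbitrary choice
# for reproducibility of this sketch only.
rng = random.Random(2019)
ranks = sorted(rng.sample(range(1, 260), k=20))

# The journals at these positions (ordered by decreasing 2018 Impact Factor)
# would form the shortlist of 20.
print(ranks)
```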
These detailed tables show standard errors for sample sizes and population estimates from the 2012 National Survey on Drug Use and Health (NSDUH). Standard errors for sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, geographic area, pregnancy status, college enrollment status, and probation/parole status.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
The Simulacrum (https://simulacrum.healthdatainsight.org.uk) is a synthetic dataset populated entirely with simulated cancer patients. In order to enhance public understanding of the methodology used to create the Simulacrum, a sample of the underlying data tables was released, along with an explanatory document (see additional resources). Data within these tables shows simulated values for the underlying cause of death code (coded in ICD-10), the stage of the tumour (coded in UICC TNM) and the dose of chemotherapy prescribed to the patient (doses are recorded in mg or other applicable units depending on the drug), for some patients whose first tumours were breast and lung. No specific disclosure controls are applied, as all counts are at the all-England level for three years of grouped data (2013-2015), and the denominator populations are large enough to pass the anonymisation standard for Health and Social Care data.
analyze the current population survey (cps) annual social and economic supplement (asec) with r. the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census - about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state - consider pooling multiple years. county-level is a no-no. despite the american community survey's larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance - and can be trended back to harry truman's presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population. the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function uses nber's sas code to determine how to import the fixed-width file, then RSQLite to put everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it's a bit of a proc freak show.
this new github repository contains three scripts:

2005-2012 asec - download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge 'em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable - one - in the data table

2012 asec - analysis examples.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples

replicate census estimates - 2011.R
- connect to the sql database created by the 'download all microdata' program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below

2011 asec replicate weight sas output.png: statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.

click here to view these three scripts. for more detail about the current population survey - annual social and economic supplement (cps-asec), visit: the census bureau's current population survey page, the bureau of labor statistics' current population survey page, and the current population survey's wikipedia article. notes: interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name. as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we've invented the butane lighter? time to transition to r. :D
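the fixed-width idea behind those importation scripts boils down to slicing each record at known column positions. a minimal sketch in python (the column layout below is invented for illustration - the real asec layout spans hundreds of columns and lives in nber's sas code):

```python
# Hypothetical 3-field layout: (start, end) character positions per field.
# The real CPS-ASEC column positions must be taken from the NBER SAS script.
COLSPECS = {"record_type": (0, 1), "age": (1, 3), "income": (3, 9)}

def parse_record(line):
    """Slice one fixed-width record into a dict of typed fields."""
    raw = {name: line[a:b] for name, (a, b) in COLSPECS.items()}
    return {"record_type": raw["record_type"],
            "age": int(raw["age"]),
            "income": int(raw["income"])}

print(parse_record("342051500"))  # record type 3, age 42, income 51500
```

parse.SAScii automates exactly this mapping by reading the column positions straight out of the sas INPUT statement, which is why you never have to type a layout like COLSPECS by hand.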
The annual Retail store data CD-ROM is an easy-to-use tool for quickly discovering retail trade patterns and trends. The current product presents results from the 1999 and 2000 Annual Retail Store and Annual Retail Chain surveys. This product contains numerous cross-classified data tables using the North American Industry Classification System (NAICS). The data tables provide access to a wide range of financial variables, such as revenues, expenses, inventory, sales per square foot (chain stores only) and the number of stores. Most data tables contain detailed information on industry (as low as 5-digit NAICS codes), geography (Canada, provinces and territories) and store type (chains, independents, franchises). The electronic product also contains survey metadata, questionnaires, information on industry codes and definitions, and the list of retail chain store respondents.
This is the sample database from sqlservertutorial.net, a great dataset for learning SQL and practicing queries against a relational database.
Database Diagram:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F4146319%2Fc5838eb006bab3938ad94de02f58c6c1%2FSQL-Server-Sample-Database.png?generation=1692609884383007&alt=media
The sample database is copyrighted and cannot be used for commercial purposes. For example, it cannot be used for purposes including, but not limited to: selling it, or including it in paid courses.
These detailed tables show sample sizes and population estimates from the 2012 National Survey on Drug Use and Health (NSDUH). Sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, geographic area, pregnancy status, college enrollment status, and probation/parole status.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
A description of each tab of the file is below.
- Stats: Each sample and the aggregated data from all other tabs. At the bottom are statistical values for each comparison.
- Gencode45: The transcripts-per-million matrix from Salmon alignment for each sample and every transcript.
- Sig Transcripts: The list of all transcripts that reached statistical significance in each of the four comparisons (A-D).
- STRING-based: STRING-based gene ontology enrichment terms for the significant transcripts in the four comparisons.
- CIBERSORTx: Normalized abundance values for each cell-type deconvolution of each sample.
- KRAKEN2_level: The number of reads in each sample that align uniquely to each species annotation. In brackets is the number of paired-end reads that align, such that a 1 means only one end of the alignment returned reads and a 2 means both ends had alignments to the species.
- MIXCR: The unique amino acid sequences of each chain and the number of times each sequence is detected in each sample.
- Heatmap: The raw values used to generate the heatmap within the paper.
This dataset tracks the updates made on the dataset "Sample Size and Population Estimates Tables (Standard Errors and P Values) - 8.1 to 8.13" as a repository for previous versions of the data and metadata.
These detailed tables show standard errors for sample sizes and population estimates from the 2011 National Survey on Drug Use and Health (NSDUH). Standard errors for sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, geographic area, pregnancy status, college enrollment status, and probation/parole status.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were selected as of October 22, 2020 (on the eve of the second wave of the pandemic), which are presented in the Global 500 ranking for 2020: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. Arithmetic averages were calculated, along with the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators across all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. The dataset is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, when values in the original table at the beginning of the dataset are added or changed, most of the subsequent tables are automatically recalculated and the graphs are updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020.
The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables, and the consequences (changes) for the characteristics of international entrepreneurship are calculated automatically. It is also possible to substitute the actual values identified during and following the second wave of the pandemic, to check the reliability of the earlier forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.
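The scenario logic of reading probabilities off a normal distribution of predicted values can be sketched with Python's standard library; the mean and standard deviation below are invented placeholders, not figures from the dataset:

```python
from statistics import NormalDist

# Illustrative sketch only: model the forecast as a normal distribution and
# read scenario probabilities off its cdf. mu and sigma are placeholder
# numbers, not values from the dataset.
forecast = NormalDist(mu=100_000, sigma=15_000)  # e.g. predicted daily new cases

# Probability that actual incidence exceeds a pessimistic scenario threshold
p_exceed = 1 - forecast.cdf(130_000)
print(round(p_exceed, 5))
```

Substituting a different threshold (or different mu/sigma) mirrors how the dataset's formula-driven tables recalculate scenario consequences automatically.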
These detailed tables show sample sizes and population estimates pertaining to mental health from the 2010 National Survey on Drug Use and Health (NSDUH). Sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, poverty level, geographic area, and insurance status.
These detailed tables show sample sizes and population estimates from the 2012 National Survey on Drug Use and Health (NSDUH) Mental Health Detailed Tables. Sample sizes and population estimates are provided by age group, gender, race/ethnicity, education level, employment status, county type, poverty level, insurance status, overall health, and geographic area.
The intention is to collect data for the calendar year 2009 (or the nearest year for which each business keeps its accounts). The survey is considered a one-off survey, although for accurate NAs, such a survey should be conducted at least every five years to enable regular updating of the ratios, etc., needed to adjust the ongoing indicator data (mainly VAGST) to NA concepts. The questionnaire will be drafted by FSD, largely following the previous BAS, updated to current accounting terminology where necessary. The questionnaire will be pilot tested, using some accountants who are likely to complete a number of the forms on behalf of their business clients, and a small sample of businesses. Consultations will also include the Ministry of Finance; the Ministry of Commerce, Industry and Labour; the Central Bank of Samoa (CBS); the Samoa Tourism Authority; the Chamber of Commerce; and other business associations (hotels, retail, etc.).
The questionnaire will collect a number of items of information about the business ownership, locations at which it operates and each establishment for which detailed data can be provided (in the case of complex businesses), contact information, and other general information needed to clearly identify each unique business. The main body of the questionnaire will collect data on income and expenses, to enable value added to be derived accurately. The questionnaire will also collect data on capital formation, and will contain supplementary pages for relevant industries to collect volume of production data for selected commodities and to collect information to enable an estimate of value added generated by key tourism activities.
The principal user of the data will be FSD which will incorporate the survey data into benchmarks for the NA, mainly on the current published production measure of GDP. The information on capital formation and other relevant data will also be incorporated into the experimental estimates of expenditure on GDP. The supplementary data on volumes of production will be used by FSD to redevelop the industrial production index which has recently been transferred under the SBS from the CBS. The general information about the business ownership, etc., will be used to update the Business Register.
Outputs will be produced in a number of formats, including a printed report containing a description of the survey design, data tables, and analysis of the results. The report will also be made available on the SBS website in “.pdf” format, and the tables will be available on the SBS website as Excel tables. Data by region may also be produced, although at a higher level of aggregation than the national data. All data will be fully confidentialised, to protect the anonymity of all respondents. Consideration may also be given to providing confidentialised unit record files (CURFs) for selected analytical users.
A high level of accuracy is needed because the principal purpose of the survey is to develop revised benchmarks for the NA. The initial plan was that the survey would be conducted as a stratified sample survey, with full enumeration of large establishments and a sample of the remainder.
v01: This is the first version of the documentation. Basic raw data, obtained from data entry.
The scope of the 2009 BAS is all employing businesses in the private sector other than those involved in agricultural activities.
Included are:
· Non-governmental organizations (NGOs, not-for profit organizations, etc.);
· Government Public Bodies
Excluded are:
· Non-employing units (e.g., market sellers);
· Government ministries, constitutional offices and those public bodies involved in public administration and included in the Central Government Budget Sector;
· Agricultural units (unless large scale/commercial - if the Agriculture census only covers household activities);
· “Non-resident” bodies such as international agencies, diplomatic missions (e.g., high commissions and embassies, UNDP, FAO, WHO).
The survey coverage is of all businesses in scope as defined above. Statistical units relevant to the survey are the enterprise and the establishment. The enterprise is an institutional unit and generally corresponds to legal entities such as a company, cooperative, partnership or sole proprietorship. The establishment is an institutional unit or part of an institutional unit, which engages in one, or predominantly one, type of economic activity. Sufficient data must be available to derive or meaningfully estimate value added in order to recognize an establishment. The main statistical unit from which data will be collected in the survey is the establishment. For most businesses there will be a one-to-one relationship between the enterprise and the establishment, i.e., simple enterprises will comprise only one establishment. The purpose of collecting data from establishments (rather than from enterprises) is to enable the most accurate industry estimates of value added possible.
This dataset tracks the updates made on the dataset "Sample Size and Population Estimates Tables (Prevalence Estimates) - 8.1 to 8.13" as a repository for previous versions of the data and metadata.
Citibike is a well-known bike sharing/renting system in New York City. It is privately owned by Lyft and operates in the boroughs of the Bronx, Brooklyn, Manhattan, and Queens, as well as in Jersey City and Hoboken, New Jersey. The system was proposed in 2008 but officially started in 2013 with 332 stations and 6,000 bikes. As of today, there are 17,000 bikes with approximately 50,000 rides on average daily.
The official Citibike data is available on their website (https://citibikenyc.com/system-data), but the files are huge and most of the data is divided into months. Thankfully, Google BigQuery has a unified Citibike dataset under bigquery-public-data, and it is freely available. But in this dataset, the trips table has a whopping 5 million rows! To get insights and practice data analysis, we usually do not need that much. So I queried the dataset with a little cleaning involved, randomly sampled 1% (0.01) of the original data, and exported it. The result is a sampled data table with about 470,000 rows that is representative of the 2013-2017 period. There are some Citibike datasets on Kaggle right now, but they do not have the time span of this data: they are either big tables covering only a few months, or the months are uploaded separately, which is somewhat tedious to work with.
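The sampling step can be sketched as follows. The actual sampling was done against the BigQuery public dataset (where the usual idiom is a WHERE RAND() < 0.01 filter); this shows the same idea in plain Python over a stand-in iterable of trip rows:

```python
import random

def sample_fraction(rows, fraction=0.01, seed=42):
    """Keep each row independently with the given probability."""
    rng = random.Random(seed)  # seeded only so this sketch is reproducible
    return [r for r in rows if rng.random() < fraction]

trips = range(1_000_000)   # stand-in for the much larger trips table
sample = sample_fraction(trips)
print(len(sample))         # close to 10,000, i.e. about 1% of the rows
```

Because each row is kept independently, the sample size is only approximately 1% of the input, which is why the exported table's row count is approximate rather than exact.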