Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions lead to subsequent, irreversible corruption of the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue.
Results: Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command-line Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web server, and a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and a simple non-Galaxy web server (http://apostl.moffitt.org:8000/).
Conclusions: Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon import into Excel. Examples of problematic strings include date-like strings, time-like strings, leading zeroes in front of numbers, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. It is hoped that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to import into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
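The escaping idea can be illustrated with a short Python sketch. It is not Escape Excel's code: the regular expressions are hypothetical approximations of the problem classes listed above (date-like, time-like, leading-zero, long-numeric, and scientific-notation-like strings), the 12-digit cutoff is an assumed threshold, and wrapping a risky field as ="..." (a formula that evaluates to the literal text) is assumed here as the escaping convention.

```python
import re

# Hypothetical approximations of the problem classes; not Escape Excel's own rules.
MONTH = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"
DATE_LIKE = re.compile(
    r"^(?:\d{1,4}[-/]\d{1,2}(?:[-/]\d{1,4})?"  # 2016-03-01, 3/1
    + r"|" + MONTH + r"[-/ ]?\d{1,4}"         # SEPT2, MARCH1, Oct-4
    + r"|\d{1,2}[-/ ]?" + MONTH               # 1-Mar, 2 Dec
    + r")$",
    re.IGNORECASE,
)
TIME_LIKE = re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$")              # 12:30, 7:05:59
LEADING_ZERO = re.compile(r"^0\d+$")                              # 00123 would lose its zeros
LONG_DIGITS = re.compile(r"^\d{12,}$")                            # assumed E-notation cutoff
SCI_LIKE = re.compile(r"^\d+(?:\.\d+)?e[-+]?\d+$", re.IGNORECASE)  # 2310009E13

PATTERNS = (DATE_LIKE, TIME_LIKE, LEADING_ZERO, LONG_DIGITS, SCI_LIKE)


def escape_field(value: str) -> str:
    """Wrap a risky field as ="..." so Excel keeps it as literal text."""
    if any(p.match(value.strip()) for p in PATTERNS):
        return '="' + value.replace('"', '""') + '"'
    return value


def escape_tsv_line(line: str) -> str:
    """Escape every tab-separated field of one line (trailing newline dropped)."""
    return "\t".join(escape_field(f) for f in line.rstrip("\r\n").split("\t"))
```

The matching is deliberately conservative: a false positive only means a harmless field is stored as text, whereas a missed date-like symbol is silently corrupted.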
Brain pericytes are one of the critical cell types that regulate endothelial barrier function and activity, thus ensuring adequate blood flow to the brain. The genetic pathways guiding undifferentiated cells into mature pericytes are not well understood. We show here that pericyte precursor populations from both the neural crest and the head mesoderm of zebrafish that express the transcription factor nkx3.1 develop into brain pericytes. We identify the gene signature of these precursors and show that an nkx3.1-, foxf2a-, and cxcl12b-expressing pericyte precursor population is present around the basilar artery prior to artery formation and pericyte recruitment. The precursors later spread throughout the brain and differentiate to express canonical pericyte markers. Cxcl12b-Cxcr4 signaling is required for pericyte attachment and differentiation. Further, both nkx3.1 and cxcl12b are necessary and sufficient to regulate pericyte number: loss of either decreases, and gain increases, pericyte number. Through genetic experiments, we have defined a precursor population for brain pericytes and identified genes critical for their differentiation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The hectares of habitat protected and the number of adults and children fed in one year were calculated for each of the six crop types for Canada and the United States. The calculations were based on the 50th centile of the cumulative frequency distributions of the change in crop yield due to pesticide treatment for each crop type. An editable, interactive table was created in Microsoft Excel that allows individuals to determine how pesticide treatment in their selected jurisdiction (province in Canada or state in the United States) and crop translates into habitat saved, calories produced, and mouths fed. The user chooses the country (Canada or United States), whether to include the organic agriculture correction factor, the state or province of interest, the crop, and whether a young child, an adolescent child, an adult woman, or an adult man is being fed. The table then calculates the hectares of habitat saved, the additional calories produced (kcal), the number of individuals fed for one day, and the number of individuals fed for one year. Because yield results vary between crops and studies, the Excel user form lets individuals either set the yield increase they anticipate or use the 50th centile of yield increase from the cumulative frequency distribution for each crop.
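The kind of arithmetic such a table performs can be sketched in Python. The formulas are assumptions, not the authors' published ones: habitat saved is taken as the extra land an untreated crop would need to match the treated harvest (area × yield gain), people fed for one day is extra kcal divided by one person's daily need, and people fed for a year is that figure divided by 365. The organic correction factor is not modeled, and all parameter names are hypothetical.

```python
from statistics import median


def pesticide_benefit(yield_changes_pct, treated_area_ha, untreated_yield_t_per_ha,
                      kcal_per_tonne, daily_kcal_need, yield_change_pct=None):
    """Hypothetical version of the table's arithmetic; formulas are assumptions.

    If yield_change_pct is None, the 50th centile (median) of the observed
    yield changes is used; otherwise the user-supplied value overrides it,
    as the Excel user form allows.
    """
    pct = median(yield_changes_pct) if yield_change_pct is None else yield_change_pct
    g = pct / 100.0
    extra_tonnes = treated_area_ha * untreated_yield_t_per_ha * g
    extra_kcal = extra_tonnes * kcal_per_tonne
    person_days = extra_kcal / daily_kcal_need  # individuals fed for one day
    return {
        "yield_change_pct": pct,
        # Extra land an untreated crop would need to match the treated harvest.
        "habitat_saved_ha": treated_area_ha * g,
        "extra_kcal": extra_kcal,
        "fed_one_day": person_days,
        "fed_one_year": person_days / 365,
    }
```

The daily requirement parameter is where the young child, adolescent, adult woman, or adult man choice would enter.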
https://creativecommons.org/publicdomain/zero/1.0/
Analyzing Coffee Shop Sales: Excel Insights
In my first data analytics project, I explore the secrets of a fictional coffee shop's success through data-driven analysis. By analyzing a five-sheet Excel dataset, I uncovered valuable sales trends, customer preferences, and insights that can guide future business decisions.
DATA CLEANING
• REMOVED DUPLICATES OR IRRELEVANT ENTRIES: Eliminated duplicate records and irrelevant data to refine the dataset for analysis.
• FIXED STRUCTURAL ERRORS: Rectified inconsistencies and structural issues within the data to ensure uniformity and accuracy.
• CHECKED FOR DATA CONSISTENCY: Verified the integrity and coherence of the dataset by identifying and resolving inconsistencies and discrepancies.
DATA MANIPULATION
• UTILIZED LOOKUPS: Used Excel's lookup functions for efficient data retrieval and analysis.
• IMPLEMENTED INDEX MATCH: Combined the INDEX and MATCH functions to perform advanced lookups and matches.
• APPLIED SUMIFS FUNCTIONS: Used SUMIFS to calculate totals based on specified criteria.
• CALCULATED PROFITS: Used relevant formulas to determine profit margins and derive insights from the data.
PIVOTING THE DATA
• CREATED PIVOT TABLES: Used Excel's PivotTable feature to pivot the data for in-depth analysis.
• FILTERED DATA: Used pivot tables to filter and analyze specific subsets of the data, enabling focused insights, especially for the "PEAK HOURS" and "TOP 3 PRODUCTS" charts.
VISUALIZATION
• KEY INSIGHTS: Reported the grand total sales revenue and the average bill per person, giving a comprehensive view of the coffee shop's performance and customers' spending habits.
• SALES TREND ANALYSIS: Used a line chart to plot total sales across various time intervals, revealing how sales trends evolve.
• PEAK HOUR ANALYSIS: Used a clustered column chart to identify peak sales hours, shedding light on optimal operating times and potential staffing needs.
• TOP 3 PRODUCTS IDENTIFICATION: Used a clustered bar chart to identify the top three coffee types, supporting strategic decisions about inventory management and marketing focus.
*I also used a Timeline to visualize chronological data trends and identify key patterns over specific periods.
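The SUMIFS and pivot-table steps above amount to grouped sums. As an illustration only, the following Python sketch mirrors them on a few toy transactions; the field names (hour, product, qty, unit_price) are hypothetical and not taken from the project's workbook.

```python
from collections import Counter, defaultdict


def sumifs(rows, sum_field, **criteria):
    """Sum one field over the rows that match every criterion, like Excel's SUMIFS."""
    return sum(r[sum_field] for r in rows
               if all(r[key] == value for key, value in criteria.items()))


def revenue_by_hour(rows):
    """Total revenue per hour of day, the data behind a peak-hours column chart."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["hour"]] += r["qty"] * r["unit_price"]
    return dict(totals)


def top_products(rows, n=3):
    """Products ranked by units sold, the data behind a top-3 bar chart."""
    units = Counter()
    for r in rows:
        units[r["product"]] += r["qty"]
    return [product for product, _ in units.most_common(n)]


transactions = [  # toy data, not from the project's dataset
    {"hour": 8, "product": "Latte", "qty": 3, "unit_price": 3.5},
    {"hour": 8, "product": "Espresso", "qty": 2, "unit_price": 2.0},
    {"hour": 9, "product": "Latte", "qty": 1, "unit_price": 3.5},
    {"hour": 9, "product": "Mocha", "qty": 3, "unit_price": 4.0},
    {"hour": 10, "product": "Tea", "qty": 1, "unit_price": 2.5},
]
by_hour = revenue_by_hour(transactions)
peak_hour = max(by_hour, key=by_hour.get)
```

Here the peak hour is 9 and the top three products by units are Latte, Mocha, and Espresso, the same answers a pivot table filtered on hour or product would give.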
While it's a significant milestone for me, I recognize that there's always room for growth and improvement. Your feedback and insights are invaluable to me as I continue to refine my skills and tackle future projects. I'm eager to hear your thoughts and suggestions on how I can make my next endeavor even more impactful and insightful.
THANKS TO: WsCube Tech, Mo Chen, Alex Freberg
TOOLS USED: Microsoft Excel
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population distribution of Excel, Alabama across 18 age groups. It lists the population in each age group along with that group's percentage of the total population of Excel. The dataset can be used to understand the population distribution of Excel by age; for example, it can be used to identify the largest age group in Excel.
Key observations
The largest age group in Excel, AL was 45 to 49 years, with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. The smallest age group was 85 years and over, with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
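The key observations follow from each group's share of the total, 100 × group count / total population. A minimal Python sketch of that calculation; the counts in `toy_counts` are made-up illustration values, not figures from this dataset, and the function name is hypothetical.

```python
def age_group_summary(counts):
    """Return the largest and smallest groups as (group, count, percent of total)."""
    total = sum(counts.values())

    def share(group):
        return group, counts[group], round(100 * counts[group] / total, 2)

    return share(max(counts, key=counts.get)), share(min(counts, key=counts.get))


toy_counts = {"Under 5 years": 30, "5 to 9 years": 50, "10 to 14 years": 20}  # not ACS data
largest, smallest = age_group_summary(toy_counts)
```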
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
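The description does not list the margins themselves, but ACS margins of error are published at the 90 percent confidence level, so a margin can be turned into a standard error by dividing by 1.645. The Python sketch below shows that conversion, the resulting interval, a coefficient of variation, and the Census Bureau's approximation for the margin of a sum of estimates (square root of the sum of squared margins). Function names are illustrative.

```python
import math

Z90 = 1.645  # ACS margins of error correspond to a 90 percent confidence level


def moe_to_se(moe):
    """Standard error implied by a published 90 percent margin of error."""
    return moe / Z90


def confidence_interval(estimate, moe):
    """90 percent interval around an estimate."""
    return estimate - moe, estimate + moe


def moe_of_sum(moes):
    """Approximate margin of error for a sum of estimates, e.g. merged age groups."""
    return math.sqrt(sum(m * m for m in moes))


def coefficient_of_variation(estimate, moe):
    """Relative reliability of an estimate, in percent."""
    return 100 * moe_to_se(moe) / estimate
```

For very small counts, such as a group of 2 people, the margin can exceed the estimate itself, which is one reason to read small-area age shares with caution.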
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Excel Population by Age.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a test
Raw data from real analytical use cases in many industries and companies is frequently provided in Excel form. Such files usually cannot be processed directly by machine learning models but must first be cleaned and preprocessed, a procedure in which many different kinds of pitfalls can occur. This makes data preprocessing a major time factor in the daily work of a data scientist.
The Excel spreadsheet presented here closely follows a real case but contains only simulated figures, to protect the underlying data and business results. The form and structure of the file correspond to a real case and could be encountered by a data scientist in a company in exactly this way. Such a file can be the result of a download from a financial controlling system, e.g. SAP.
The data includes information about sold goods (product units), the associated turnover, and hours worked, grouped by month, store, and department of the retailer. In addition, the sales area of each department and the opening hours of each store are provided.
The following goals of data cleansing might be addressed:
Furthermore, the data can be investigated with regard to correlations between different features and/or a regression model.
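A typical first cleansing step for such an export can be sketched in Python. The layout is an assumption, not taken from the published file: the grouping labels (month, store, department) appear only on the first row of each block, as often happens with merged cells, and figures use a German-style number format such as "1.234,50". Column positions and function names are hypothetical.

```python
def forward_fill(rows, key_cols=3):
    """Repeat the last non-empty label in the first key_cols columns of each row."""
    filled, last = [], [""] * key_cols
    for row in rows:
        keys = [cell if cell.strip() else prev
                for cell, prev in zip(row[:key_cols], last)]
        last = keys
        filled.append(keys + list(row[key_cols:]))
    return filled


def parse_number(text, thousands=".", decimal=","):
    """Convert a formatted figure such as '1.234,50' to a float."""
    return float(text.strip().replace(thousands, "").replace(decimal, "."))


def turnover_per_hour(rows, key_cols=3, turnover_col=3, hours_col=4):
    """(labels..., turnover / hours worked) per row; rows without hours are skipped."""
    result = []
    for row in rows:
        hours = parse_number(row[hours_col])
        if hours > 0:
            result.append((*row[:key_cols], parse_number(row[turnover_col]) / hours))
    return result


raw = [  # hypothetical export layout, not the published file
    ["2023-01", "Store A", "Bakery", "1.234,50", "120"],
    ["", "", "Dairy", "980,00", "98"],
    ["", "Store B", "Bakery", "2.010,75", "150"],
]
clean = forward_fill(raw)
```

Once labels are filled and figures are numeric, ratios such as turnover per hour worked can feed the correlation or regression analysis mentioned above.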
GNU General Public License v3.0 - https://www.gnu.org/licenses/gpl-3.0.en.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Six samples were evaluated in unconfined and triaxial compression; their data are included in separate Excel spreadsheets and summarized in the Word document. Three samples were plugged along the axis of the core (presumed to be nominally vertical) and three samples were plugged perpendicular to the axis of the core. A designation of "V" indicates vertical, i.e. the long axis of the plugged sample is aligned with the axis of the core. Similarly, "H" indicates a sample that is nominally horizontal, cut orthogonal to the axis of the core. Stress-strain curves were made before and after the testing and are included in the Word document. The confining pressure for this test was 2800 psi. A series of tests is being carried out to define a failure envelope, to provide representative hydraulic fracture design parameters, and to support future geomechanical assessments. The samples are from well 52-21, which reaches a maximum depth of 3581 ft +/- 2 ft into a gneiss complex.
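The description says the tests are meant to define a failure envelope but not which criterion will be used. A linear Mohr-Coulomb fit is one common choice and is assumed here: peak axial stress σ1 is regressed on confining stress σ3 as σ1 = C0 + q·σ3, then sin φ = (q − 1)/(q + 1) gives the friction angle and C0 = 2c·√q gives the cohesion. The numbers in the usage test are synthetic, not results from well 52-21.

```python
import math

PSI_TO_MPA = 0.00689476  # 1 psi in megapascals; 2800 psi is about 19.3 MPa


def fit_mohr_coulomb(sigma3, sigma1):
    """Least-squares fit of sigma1 = C0 + q * sigma3 (peak stresses, same units).

    Returns (C0, q, friction angle in degrees, cohesion).
    """
    n = len(sigma3)
    mx, my = sum(sigma3) / n, sum(sigma1) / n
    sxx = sum((x - mx) ** 2 for x in sigma3)
    if sxx == 0:
        raise ValueError("need at least two different confining pressures")
    sxy = sum((x - mx) * (y - my) for x, y in zip(sigma3, sigma1))
    q = sxy / sxx
    c0 = my - q * mx
    phi_deg = math.degrees(math.asin((q - 1) / (q + 1)))
    cohesion = c0 / (2 * math.sqrt(q))
    return c0, q, phi_deg, cohesion
```

Because the "V" and "H" plugs may behave differently, it would likely make sense to fit each orientation separately once unconfined and confined results exist for both.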
This dataset was created by PB&J
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
NaiveBayes_R.xlsx: This Excel file shows how the probabilities of observed features given recidivism, P(x_ij | R), are calculated in the training data. Each cell contains an Excel formula that renders the appropriate figure. P(Xi|R): this tab contains probabilities of feature attributes among recidivated offenders. NIJ_Recoded: this tab contains the re-coded NIJ recidivism challenge data, following our coding schema described in Table 1. Recidivated_Train: this tab contains re-coded features of recidivated offenders. Tabs from [Gender] through [Condition_Other]: each tab contains probabilities of feature attributes given recidivism. We use these conditional probabilities to replace the raw values of each feature in the P(Xi|R) tab.
NaiveBayes_NR.xlsx: This Excel file shows how the probabilities of observed features given non-recidivism, P(x_ij | N), are calculated in the training data. Each cell contains an Excel formula that renders the appropriate figure. P(Xi|N): this tab contains probabilities of feature attributes among non-recidivated offenders. NIJ_Recoded: this tab contains the re-coded NIJ recidivism challenge data, following our coding schema described in Table 1. NonRecidivated_Train: this tab contains re-coded features of non-recidivated offenders. Tabs from [Gender] through [Condition_Other]: each tab contains probabilities of feature attributes given non-recidivism. We use these conditional probabilities to replace the raw values of each feature in the P(Xi|N) tab.
Training_LnTransformed.xlsx: Figures in each cell are log-transformed ratios of the probabilities in NaiveBayes_R.xlsx (P(Xi|R)) to the probabilities in NaiveBayes_NR.xlsx (P(Xi|N)).
TestData.xlsx: This Excel file includes the following tabs based on the test data: P(Xi|R), P(Xi|N), NIJ_Recoded, and Test_LnTransformed (log-transformed P(Xi|R)/P(Xi|N)).
Training_LnTransformed.dta: We transform Training_LnTransformed.xlsx into a Stata dataset, using the Stat/Transfer 13 software package to convert the file format.
StataLog.smcl: This file includes the results of the logistic regression analysis. Both the estimated intercept and the coefficient estimates in this Stata log correspond to the raw weights and standardized weights in Figure 1.
Brier Score_Re-Check.xlsx: This Excel file recalculates the Brier scores of the Relaxed Naïve Bayes Classifier in Table 3, providing evidence that the results displayed in Table 3 are correct.
Full list: NaiveBayes_R.xlsx; NaiveBayes_NR.xlsx; Training_LnTransformed.xlsx; TestData.xlsx; Training_LnTransformed.dta; StataLog.smcl; Brier Score_Re-Check.xlsx; Data for Weka (Training Set): Bayes_2022_NoID; Data for Weka (Test Set): BayesTest_2022_NoID; Weka output for machine learning models (conventional naïve Bayes, AdaBoost, Multilayer Perceptron, Logistic Regression, and Random Forest).
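The workbooks compute, for each recoded attribute, its relative frequency among recidivated offenders, P(Xi|R), and among non-recidivated offenders, P(Xi|N), and then take the log of their ratio; the Brier score check is the mean squared difference between predicted probabilities and observed outcomes. A minimal Python sketch of those steps follows. It uses plain relative frequencies without smoothing (the workbooks may handle empty cells differently), and the feature name and records are hypothetical.

```python
import math
from collections import Counter


def conditional_probs(values):
    """Relative frequency of each attribute value within one class, e.g. P(x | R)."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def log_ratio_features(record, probs_r, probs_n):
    """ln(P(x|R) / P(x|N)) for every feature of one record.

    No smoothing: a value that never occurs in one class raises KeyError.
    """
    return {feature: math.log(probs_r[feature][value] / probs_n[feature][value])
            for feature, value in record.items()}


def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)
```

The log-ratio features are what a logistic regression (here, in Stata) would then weight, which is why the fitted coefficients can be read as adjusted naïve Bayes weights.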
The ITEX experiment at Thingvellir was established in 1995, when control and OTC plots 1-10 were set up. Sampling of the plots was then repeated in 1996, 1998, and 2000. Sampling was limited to recording species. This dataset is in Excel format. For more information, please see the readme file.
This dataset contains cover community data from the US TOOL2 site, Alaska, in 1995. This dataset is in Excel format. For more information, please see the readme file.
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you'll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel, with your trusty captains Vivek and Lucia at the helm.
This dataset contains the species data from the US TOOL2 site, Alaska, in 1995. This dataset is in Excel format. For more information, please see the readme file.
The objective of this project was to develop system designs for programs that monitor travel time reliability and to prepare a guidebook that practitioners and others can use to design, build, operate, and maintain such systems. Generally, such travel time reliability monitoring systems will be built on top of existing traffic monitoring systems. The focus of this project was on travel time reliability. The data from the monitoring systems developed in this project, from both public and private sources, included, wherever cost-effective, information on the seven sources of non-recurring congestion. These data were used to construct performance measures or to perform various types of analyses useful for operations management as well as performance measurement, planning, and programming. The datasets in the attached ZIP file support SHRP 2 reliability project L38B, "Pilot testing of SHRP 2 reliability data and analytical products: Minnesota." The report can be accessed at https://rosap.ntl.bts.gov/view/dot/3608. This ZIP file package, which is 22.1 MB in size, contains 6 Microsoft Excel spreadsheet files (XLSX) and 3 comma-separated value (CSV) files. The XLSX and CSV files can be opened using Microsoft Excel 2010 and 2016. The CSV files can also be opened using most text editors.
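The description does not name the performance measures the project built. As an illustration only, three indices widely used in U.S. travel time reliability reporting can be computed from a sample of observed travel times for one segment and time period: the travel time index (mean / free-flow time), the planning time index (95th percentile / free-flow time), and the buffer index ((95th percentile − mean) / mean). Agencies define these slightly differently (some use the median instead of the mean), so treat the formulas below as one common variant.

```python
from statistics import mean, quantiles


def reliability_measures(travel_times, free_flow_time):
    """Common travel time reliability indices for one segment and time period."""
    p95 = quantiles(travel_times, n=100, method="inclusive")[94]  # 95th percentile
    avg = mean(travel_times)
    return {
        "travel_time_index": avg / free_flow_time,
        "planning_time_index": p95 / free_flow_time,
        "buffer_index": (p95 - avg) / avg,
    }
```

A buffer index of 0.45, for example, would mean a traveler should budget about 45 percent extra time over the average trip to arrive on time 95 percent of the time.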
This dataset contains Normalized Difference Vegetation Index (NDVI) images from the 1999 growing season at the Toolik Lake Field Station, documenting differences between control and treatment plots at the study site. For more information, please see the readme file. NOTE: This dataset contains the data in Excel format.
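NDVI is computed from near-infrared (NIR) and red reflectance as (NIR − Red) / (NIR + Red), giving values between −1 and 1, with denser green vegetation toward the high end. A minimal Python sketch, assuming the two bands are available as equally sized 2-D lists of reflectance values; the dataset's actual layout is described in its readme file.

```python
def ndvi(nir, red):
    """(NIR - Red) / (NIR + Red); None where both reflectances are zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Per-pixel NDVI for two equally sized 2-D lists of reflectance values."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]
```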
Complete annotations for the tabular data are presented below.
Tab Fig 1: (A) Heatmap data of G protein family members in the hippocampal tissue of 6-month-old Wildtype (n = 6) and 5xFAD (n = 6) mice; (B) Heatmap data of G protein family members in the cortical tissue of 6-month-old Wildtype (n = 6) and 5xFAD (n = 6) mice; (C) Data in the overlapping part of the Venn diagram (132 elements); (D) Data for creating the volcano plot; (E) Data for creating the heatmap of GPCR-related DEGs; (F) Expression of Gnb5 in the large sample dataset GSE44772; Control, n = 303; AD, n = 387; (H) Statistical analysis of Gnb5 protein levels from panel G; Wildtype, n = 4; 5xFAD, n = 4; (J) Statistical analysis of Gnb5 protein levels from panel I; Wildtype, n = 4; 5xFAD, n = 4; (L) Quantitative analysis of Gnb5 fluorescence intensity in 5xFAD and Wildtype groups; Wildtype, n = 4; 5xFAD, n = 4.
Tab Fig 2: (D) qPCR data of Gnb5 knockout in hippocampal tissue; Gnb5F/F, n = 6; Gnb5-CCKO, n = 6; (E–I, L–N) Animal behavioral tests in mice; Gnb5F/F, n = 22; Gnb5-CCKO, n = 16; (E) Total distance traveled in the open field experiment; (F) Training curve in the Morris water maze (MWM); (F-day6) Data from the sixth day of MWM training; (G) Percentage of time spent by the mouse in the target quadrant in the MWM; (H) Statistical analysis of the number of times the mouse crosses the target quadrant in the MWM; (I) Latency to first reach the target quadrant in the MWM; (L) Baseline freezing percentage of mice in an identical testing context; (M) Percentage of freezing time of mice during the Context phase; (N) Percentage of freezing time of mice during the Cue phase.
Tab Fig 3: (D–F, H) MWM tests in mice; Wildtype+AAV-GFP, n = 20; Wildtype+AAV-Gnb5-GFP, n = 23; 5xFAD + AAV-GFP, n = 23; 5xFAD + AAV-Gnb5-GFP, n = 26; (D) Training curve in the MWM; (D-day6) Data from the sixth day of MWM training; (E) Percentage of time spent in the target quadrant in the MWM; (F) Statistical analysis of the number of entries into the target quadrant in the MWM; (H) Movement speed of mice in the MWM; (I–K) Contextual fear conditioning test in mice; 5xFAD + AAV-GFP, n = 23; 5xFAD + AAV-Gnb5-GFP, n = 26; (I) Baseline freezing percentage of mice in an identical testing context; (J) Percentage of freezing time of mice during the Context phase; (K) Percentage of freezing time of mice during the Cue phase; (L) Total distance traveled in the open field test; (M) Percentage of time spent in the center area during the open field test.
Tab Fig 4: (B, C) Quantification of Aβ plaques in hippocampal sections from Wildtype and 5xFAD mice injected with either AAV-Gnb5 or AAV-GFP; Wildtype+AAV-GFP, n = 4; Wildtype+AAV-Gnb5-GFP, n = 4; 5xFAD + AAV-GFP, n = 4; 5xFAD + AAV-Gnb5-GFP, n = 4; (B) Quantification of Aβ plaque number; (C) Quantification of Aβ plaque size; (F, G) Quantification of Aβ plaques from the indicated mouse lines; WT&Gnb5F/F&CamKIIa-CreERT+Vehicle, n = 4; 5xFAD&Gnb5F/F&CamKIIa-CreERT+Vehicle, n = 4; 5xFAD&Gnb5F/F&CamKIIa-CreERT+Tamoxifen, n = 4; (F) Quantification of Aβ plaque size; (G) Quantification of Aβ plaque number.
Tab Fig 5: (B) Overexpression of Gnb5-AAV in 5xFAD mice affects the expression of proteins related to APP cleavage (BACE1, β-CTF, Nicastrin, and APP); statistical analysis of protein levels; n = 4, respectively; (D) Tamoxifen-induced Gnb5 knockdown in 5xFAD mice affects APP-cleaving proteins; statistical analysis of protein levels; n = 4, respectively; (F) Gnb5-CCKO mice show altered expression of APP-cleaving proteins; statistical analysis of protein levels; n = 6, respectively.
Tab Fig 7: (C, D) Quantification of Aβ plaques in 5xFAD mice overexpressing full-length Gnb5, truncated fragments, or the mutant truncated fragment via AAV; n = 4, respectively; (C) Quantification of Aβ plaque size; (D) Quantification of Aβ plaque number; (F) Effect of overexpressing full-length Gnb5, truncated fragments, and the mutant truncated fragment on the expression of proteins related to the APP cleavage process in 5xFAD mice; statistical analysis of protein levels; n = 3, respectively. (XLSX)
Government Equalities Office spend data July 2012 (Excel format)
Date: Thu Oct 04 14:17:24 BST 2012
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data analysis of the project Multimode Capable Passive BMS
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data outputs 1-18
Raw data output 1. Differentially expressed genes in AML CSCs compared with GTCs, as well as in TCGA AML cancer samples compared with normal ones. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 2. Commonly and uniquely differentially expressed genes in the AML CSC/GTC microarray and TCGA bulk RNA-seq datasets. This data was generated based on the results of AML microarray and TCGA data analysis.
Raw data output 3. Common differentially expressed genes between training and test set samples of the microarray dataset. This data was generated based on the results of AML microarray data analysis.
Raw data output 4. Detailed information on the samples of the breast cancer microarray dataset (GSE52327) used in this study.
Raw data output 5. Differentially expressed genes in breast CSCs compared with GTCs, as well as in TCGA BRCA cancer samples compared with normal ones.
Raw data output 6. Commonly and uniquely differentially expressed genes in the breast cancer CSC/GTC microarray and TCGA BRCA bulk RNA-seq datasets. This data was generated based on the results of breast cancer microarray and TCGA BRCA data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 7. Differential and common co-expression and protein-protein interactions of genes between CSC and GTC samples. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis. CSC and GTC are abbreviations of cancer stem cell and general tumor cell, respectively.
Raw data output 8. Differentially expressed genes between dormant and active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 9. Uniquely expressed genes in dormant or active AML CSCs. This data was generated based on the results of AML scRNA-seq data analysis.
Raw data output 10. Intersections between the targeting transcription factors of AML key CSC genes and the genes differentially expressed between AML CSCs and GTCs, the genes differentially expressed between dormant and active AML CSCs, or the genes uniquely expressed in either class of CSCs.
Raw data output 11. Targeting desirableness scores of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 12. CSC-specific targeting desirableness scores of AML key CSC genes and their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 13. The protein-protein interactions of AML key CSC genes with each other and with their targeting transcription factors. This data was generated based on the results of AML microarray and STRING database-based protein-protein interaction data analysis.
Raw data output 14. The previously confirmed associations of the genes with the highest targeting desirableness and CSC-specific targeting desirableness scores with AML or other cancers' (stem) cells, as well as with hematopoietic stem cells. These data were generated by PubMed database-based literature mining.
Raw data output 15. Drug scores of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 16. CSC-specific drug scores of available drugs and bioactive small molecules targeting AML key CSC genes and/or their targeting transcription factors. These scores were generated based on an in-house scoring function described in the Methods section.
Raw data output 17. Candidate drugs for experimental validation. These drugs were selected based on their respective (CSC-specific) drug scores. CSC is the abbreviation of cancer stem cell.
Raw data output 18. Detailed information on the samples of the AML microarray dataset GSE30375 used in this study.
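Outputs such as 2 and 6 (commonly and uniquely differentially expressed genes in two datasets) reduce to a set intersection and two set differences once each dataset's DEG list is fixed. A Python sketch follows; the fold-change and adjusted p-value cutoffs and the gene names are placeholders, since the thresholds used in the study are not stated in this description.

```python
def deg_set(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Genes passing assumed |log2 fold change| and adjusted p-value cutoffs.

    results: iterable of (gene, log2_fold_change, adjusted_p_value) tuples.
    """
    return {gene for gene, lfc, padj in results
            if abs(lfc) >= lfc_cutoff and padj < padj_cutoff}


def compare_deg_sets(first, second):
    """Genes common to both datasets and genes unique to each."""
    return {
        "common": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


microarray = deg_set([("GENE1", 2.1, 0.01), ("GENE2", -1.5, 0.03),
                      ("GENE3", 0.4, 0.001), ("GENE4", 3.0, 0.2)])  # hypothetical
rnaseq = deg_set([("GENE2", -2.0, 0.001), ("GENE5", 1.2, 0.04)])    # hypothetical
comparison = compare_deg_sets(microarray, rnaseq)
```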