License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Interactive data visualization has become a staple of modern data presentation. Yet, despite its growing popularity, we still lack a general framework for turning raw data into summary statistics that can be displayed by interactive graphics. This gap may stem from a subtle yet profound issue: while we would often like to treat graphics, statistics, and interaction in our plots as independent, they are in fact deeply connected. This article examines this interdependence in light of two fundamental concepts from category theory: groups and monoids. We argue that the knowledge of these algebraic structures can help us design sensible interactive graphics. Specifically, if we want our graphics to support interactive features which split our data into parts and then combine these parts back together (such as linked selection), then the statistics underlying our plots need to possess certain properties. By grounding our thinking in these algebraic concepts, we may be able to build more flexible and expressive interactive data visualization systems. Supplementary materials for this article are available online.
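To make the abstract's point concrete, here is a small illustrative sketch (not taken from the article or its supplementary materials): a summary statistic that forms a monoid, such as a (count, sum) pair under element-wise addition, can be computed separately for the selected and unselected parts of the data and then recombined, which is exactly the split-and-combine behaviour linked selection needs; a statistic like the median has no such combine operation and must be recomputed from the raw data.

```python
# Illustrative sketch only: a (count, sum) summary forms a monoid, so per-selection
# summaries can be combined back into the whole-plot statistic.
from dataclasses import dataclass

@dataclass(frozen=True)
class CountSum:
    n: int = 0           # CountSum(0, 0.0) is the identity element
    total: float = 0.0

    def combine(self, other: "CountSum") -> "CountSum":
        # Associative binary operation of the monoid.
        return CountSum(self.n + other.n, self.total + other.total)

def summarize(values):
    acc = CountSum()
    for v in values:
        acc = acc.combine(CountSum(1, v))
    return acc

# Split the data by an interactive selection and summarize each part...
selected = summarize([2.0, 4.0])
unselected = summarize([1.0, 3.0, 5.0])

# ...then combine the parts back into the statistic for the whole data.
whole = selected.combine(unselected)
assert whole == summarize([1.0, 2.0, 3.0, 4.0, 5.0])
print(whole.total / whole.n)  # mean of the full data, recovered from the parts
```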
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Slides and abstract from an invited talk at the International Workshop on Complex Systems and Networks, 5-7 October 2015, Perth.
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Presentation Date: Monday, October 22, 2018. Location: IEEE Viz 2018, Berlin, Germany. Abstract: Astronomers have a long history of visualization. Going back only as far as Galileo, discoveries were made using sketches of celestial objects moving over time. Today, Astronomy inquiries can, and often do, make use of petabytes of data at once. Huge surveys are analyzed statistically to understand tiny fluctuations that hint at the fundamental nature of the Universe, and myriad data sets, from telescopes across the globe and in space, are brought together to solve problems ranging from the nature of black holes to the structure of the Milky Way to the origins of planets like Earth. In this talk, I will summarize the state of partnerships between astronomical, physical, and computational approaches to gleaning insight from combinations of scientific and information visualization in Astrophysics. In particular, I will discuss how the “glue” linked-view visualization environment (glueviz.org), developed originally to facilitate high-dimensional data exploration in Astronomy and Medicine, can be extended to many other fields of data-driven inquiry. In addition, I will explain how the current open-source, plug-and-play approach to software facilitates the combination of powerful programs and projects such as glue, WorldWide Telescope, ESASky, and the Zooniverse Citizen Science platform. Throughout the talk, I will emphasize the commonalities amongst many fields of science that rely on high-dimensional data. Scientists do not draw distinctions between “scientific visualizations” that show literal spatial dimensions, and “information visualizations” that characterize non-spatial and/or statistical aspects of data. My goal will be to leave Vis attendees with an appreciation of the powerful discovery opportunities offered to scientists when visualization types developed by the InfoVis, SciVis, and VAST communities can be connected in the kind of flexible, linked-view high-dimensional visualization environment offered by glue.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: Even though data visualization is a common analytical tool in numerous disciplines, it has rarely been used in agricultural sciences, particularly in agronomy. In this paper, we discuss a study on employing data visualization to analyze a multiplicative model. This model is often used by agronomists, for example in the so-called yield component analysis. The multiplicative model in agronomy is normally analyzed by statistical or related methods. In practice, unfortunately, the usefulness of these methods is limited, since they help to answer only a few questions and do not allow for a complex view of the phenomena studied. We believe that data visualization could be used for such complex analysis and presentation of the multiplicative model. To that end, we conducted an expert survey. It showed that visualization methods could indeed be useful for analysis and presentation of the multiplicative model.
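The abstract does not spell out the model's exact form, but a hedged sketch of a generic multiplicative (yield component) model may help show what such an analysis operates on; the component names and values below are purely hypothetical.

```python
# Hypothetical sketch of a multiplicative (yield component) model; the paper does
# not specify its components, so the names and numbers below are illustrative only.
import math

components = {
    "plants_per_m2": 45.0,
    "pods_per_plant": 12.0,
    "seeds_per_pod": 4.5,
    "g_per_seed": 0.18,
}

# The multiplicative model: the response is the product of its components.
yield_g_per_m2 = math.prod(components.values())
print(f"yield: {yield_g_per_m2:.1f} g/m^2")

# On the log scale the model becomes additive, which is one reason visual
# summaries (e.g., stacked log-contributions) can present it compactly.
log_contributions = {k: math.log(v) for k, v in components.items()}
assert math.isclose(sum(log_contributions.values()), math.log(yield_g_per_m2))
```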
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Thematic maps about social inequalities can engage audiences, add context to policy debates, and change attitudes toward the issues. The field of communication has long compared the relative persuasiveness of this kind of abstract data versus concrete examples about individuals. While studies have compared the effectiveness of presenting both types of information alongside each other, the line between them is sometimes blurred in data visualization, which can incorporate individuals’ stories in innovative ways. One context in which incorporating examples within thematic maps may help is when discussing the social determinants of health, because the complex relationship between individual and community is central to how the determinants influence health, and communication on this can be challenging. In this study, we randomly presented the UK public (N = 389) with maps incorporating varying levels of “exemplification” for three different social determinants: public transport, air pollution, and youth service provision. We tested how this affected engagement, credibility, and perceptions about the issues. Between-group analysis found few significant differences and therefore limited persuasive power. However, within-subject analysis indicated that the maps with individual-centered stories may be more persuasive, but only among those less confident in their ability to interpret data visualizations. Maps of social inequalities that incorporate stories about individuals may be more engaging and persuasive to audiences less confident with statistics. In data visualization experiments, researchers should consider analyzing both differences between treatment groups and differences within subjects in their responses to different stimuli.
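As a rough illustration of the two analyses the abstract contrasts, the sketch below (synthetic data only, not the study's) compares ratings between independent treatment groups and, separately, paired responses from the same participants to two stimuli.

```python
# Hedged sketch with synthetic data: between-group vs. within-subject comparisons.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Between-group: each participant saw only one map version.
group_exemplar = rng.normal(5.2, 1.0, size=130)   # engagement ratings, arm A
group_plain = rng.normal(5.0, 1.0, size=130)      # engagement ratings, arm B
print(stats.ttest_ind(group_exemplar, group_plain))

# Within-subject: the same participants rated both stimuli, so responses are paired.
rating_stimulus_a = rng.normal(5.2, 1.0, size=130)
rating_stimulus_b = rating_stimulus_a - rng.normal(0.2, 0.5, size=130)
print(stats.ttest_rel(rating_stimulus_a, rating_stimulus_b))
```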
This is a fix-release for some broken links in the README. Thanks to @HenningTimm for the community-driven support.
Dorothea Strecker, Sama Majidian, Lukas C. Bossert, Évariste Demandt
This repository contains materials used during the workshop "Visualization of networks – analyzing and visualizing connections between (planned) NFDI consortia" at the NFDI4Ing Community Meeting (NFDI4Ing Konferenz) 2021 on September 28. During the workshop, the network of (planned) NFDI consortia was visualized and analyzed using the statistical software R and the library igraph.
Abstract: Currently, Germany's National Research Data Infrastructure spans a network of nine funded consortia from the first round and ten from the second round. This workshop enables you to visually display and analyze the network of consortia in your internet browser via a remote Jupyter Notebook. The workshop follows the tradition of literate programming. No prior experience in programming and no locally installed software is needed – let's weave and tangle!
Slides: The presentation slides for the workshop are stored in the file "NFDI4Ing_Community_Meeting_2021.pdf".
Jupyter Notebook for visualization of networks with R: In the interactive part of the workshop we worked with Jupyter Notebooks. The documented sample solution is stored in various formats in the folder Notebook. Direct exports from the Jupyter Notebook are provided in the following formats: Jupyter Notebook (R), PDF (via LuaLaTeX), org-mode, Markdown, R script, Webpage, WebSlides.
This repository is licensed under the MIT License.
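For readers without an R setup, a loosely analogous sketch in Python using networkx is shown below; it is not the workshop notebook (which used R and igraph), and the consortium names are placeholders.

```python
# Analogous sketch in Python with networkx (the workshop itself used R and igraph).
import networkx as nx
import matplotlib.pyplot as plt

# Toy network of consortia connected by shared participating institutions (placeholders).
edges = [
    ("ConsortiumA", "ConsortiumB"),
    ("ConsortiumA", "ConsortiumC"),
    ("ConsortiumB", "ConsortiumD"),
    ("ConsortiumC", "ConsortiumD"),
    ("ConsortiumD", "ConsortiumE"),
]
g = nx.Graph(edges)

# Simple structural analysis: which nodes hold the network together?
print(nx.degree_centrality(g))
print(nx.betweenness_centrality(g))

# Basic visual display, as in the interactive part of the workshop.
nx.draw_networkx(g, with_labels=True, node_color="lightsteelblue")
plt.axis("off")
plt.show()
```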
The typical result of a microarray experiment is a list of tens or hundreds of genes found to be differentially regulated in the condition under study. Independently of the methods used to select these genes, the common task faced by any researcher is to translate these lists of genes into a better understanding of the biological phenomena involved. Currently, this is done through a tedious combination of searches through the literature and a number of public databases. We developed Onto-Express (OE) as a novel tool able to automatically translate such lists of differentially regulated genes into functional profiles characterizing the impact of the condition studied. OE constructs functional profiles (using Gene Ontology terms) for the following categories: biochemical function, biological process, cellular role, cellular component, molecular function and chromosome location. Statistical significance values are calculated for each category. We demonstrated the validity and the utility of this comprehensive global analysis of gene function by analyzing two breast cancer data sets from two separate laboratories. OE was able to identify correctly all biological processes postulated by the original authors, as well as discover novel relevant mechanisms (Draghici et al., Genomics, 81(2), 2003). Other results obtained with Onto-Express can be found in Khatri et al., Genomics, 79(2), 2002. Custom level of abstraction of the Gene Ontology. User account required. Platform: Online tool.
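The abstract does not state which test OE uses to compute its significance values, but a hypergeometric test is a common way to score the over-representation of a GO category in a gene list; the sketch below illustrates that generic idea only, with made-up counts.

```python
# Hedged sketch of a generic GO enrichment score (not claimed to be OE's implementation).
from scipy.stats import hypergeom

M = 14_000   # annotated genes on the array (made-up count)
K = 250      # annotated genes in the GO category, e.g. "cell cycle" (made-up)
N = 120      # differentially regulated genes submitted (made-up)
k = 12       # submitted genes falling in the category (made-up)

# P(observing >= k category genes when drawing N genes from M that contain K)
p_value = hypergeom.sf(k - 1, M, K, N)
print(f"enrichment p-value: {p_value:.3g}")
```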
License: GNU General Public License v3.0, https://www.gnu.org/licenses/gpl-3.0.html
Supplementary material for the paper entitled "One-step ahead forecasting of geophysical processes within a purely statistical framework".
Abstract: The simplest way to forecast geophysical processes, an engineering problem with a widely recognised challenging character, is the so-called “univariate time series forecasting” that can be implemented using stochastic or machine learning regression models within a purely statistical framework. Regression models are in general fast-implemented, in contrast to the computationally intensive Global Circulation Models, which constitute the most frequently used alternative for precipitation and temperature forecasting. For their simplicity and easy applicability, the former have been proposed as benchmarks for the latter by forecasting scientists. Herein, we assess the one-step ahead forecasting performance of 20 univariate time series forecasting methods, when applied to a large number of geophysical and simulated time series of 91 values. We use two real-world annual datasets, a dataset composed of 112 time series of precipitation and another composed of 185 time series of temperature, as well as their respective standardized datasets, to conduct several real-world experiments. We further conduct large-scale experiments using 12 simulated datasets. These datasets contain 24 000 time series in total, which are simulated using stochastic models from the families of Autoregressive Moving Average and Autoregressive Fractionally Integrated Moving Average. We use the first 50, 60, 70, 80 and 90 data points for model-fitting and model-validation, and make predictions corresponding to the 51st, 61st, 71st, 81st and 91st points respectively. The total number of forecasts produced herein is 2 177 520, among which 47 520 are obtained using the real-world datasets. The assessment is based on eight error metrics and accuracy statistics. The simulation experiments reveal the most and least accurate methods for long-term forecasting applications, also suggesting that the simple methods may be competitive in specific cases. Regarding the results of the real-world experiments using the original (standardized) time series, the minimum and maximum medians of the absolute errors are found to be 68 mm (0.55) and 189 mm (1.42) respectively for precipitation, and 0.23 °C (0.33) and 1.10 °C (1.46) respectively for temperature. Since there is an absence of relevant information in the literature, the numerical results obtained using the standardized real-world datasets could be used as rough benchmarks for the one-step ahead predictability of annual precipitation and temperature.
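As a minimal, hedged illustration of the paper's one-step ahead setup (fit on the first n values of a 91-value series, predict value n+1), the sketch below fits an AR(1) model to a simulated series; it is not claimed to be one of the 20 benchmarked methods or the authors' code.

```python
# Illustrative one-step-ahead forecasting loop on a simulated AR(1) series of 91 values.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)

# Simulate 91 "annual" values from an AR(1) process (illustrative, not the paper's data).
series = np.zeros(91)
for t in range(1, 91):
    series[t] = 0.6 * series[t - 1] + rng.normal()

errors = []
for n_fit in (50, 60, 70, 80, 90):
    fit = ARIMA(series[:n_fit], order=(1, 0, 0)).fit()
    forecast = fit.forecast(steps=1)[0]      # prediction for point n_fit + 1
    errors.append(abs(forecast - series[n_fit]))

print("absolute one-step-ahead errors:", np.round(errors, 3))
```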
Introduction
The Annual Survey of Industries (ASI) is the principal source of industrial statistics in India. It provides statistical information to assess and evaluate, objectively and realistically, the changes in the growth, composition and structure of the organized manufacturing sector, comprising activities related to manufacturing processes, repair services, gas and water supply and cold storage. The survey has so far been conducted annually under the statutory provisions of the Collection of Statistics (COS) Act, 1953 and the rules framed thereunder in 1959, except in the State of Jammu & Kashmir where it is conducted under the J&K Collection of Statistics Act, 1961 and the rules framed thereunder in 1964. From ASI 2010-11 onwards, the survey is to be conducted annually under the statutory provisions of the Collection of Statistics (COS) Act, 2008 and the rules framed thereunder in 2011, except in the State of Jammu & Kashmir where it is to be conducted under the J&K Collection of Statistics Act, 1961 and the rules framed thereunder in 1964.
The ASI extends its coverage to the entire country up to the state level.
The primary unit of enumeration in the survey is a factory in the case of manufacturing industries, a workshop in the case of repair services, an undertaking or a licensee in the case of electricity, gas & water supply undertakings and an establishment in the case of bidi & cigar industries. The owner of two or more establishments located in the same State and pertaining to the same industry group and belonging to the same scheme (census or sample) is, however, permitted to furnish a single consolidated return. Such consolidated returns are a common feature in the case of bidi and cigar establishments, electricity and certain public sector undertakings.
The survey covers factories registered under the Factories Act, 1948.
Sample survey data [ssd]
The sampling design adopted in ASI has undergone considerable changes from time to time, taking into account the technical and other requirements. The present sampling design has been adopted from ASI 2007-08. All the factories in the updated frame are divided into two sectors, viz., Census and Sample.
For ASI 2007-2008, the Census Sector has been defined as follows:
a) All industrial units belonging to the five less industrially developed states/UTs, viz. Manipur, Meghalaya, Nagaland, Tripura and Andaman & Nicobar Islands.
b) For the rest of the twenty-six states/UTs: (i) units having 100 or more workers, and (ii) all factories covered under Joint Returns.
c) After excluding the Census Sector units as defined above, all units belonging to the strata (State by 4-digit of NIC-08) having 4 or fewer units are also considered as Census Sector units.
Sample Sector: From the remaining units excluding those of the Census Sector, called the sample sector, samples are drawn circular systematically with a sampling fraction of 20% within each stratum (State X Sector X 4-digit NIC) for all the states. An even number of units, with a minimum of 4, is selected and evenly distributed in two sub-samples. The sectors considered here are Bidi, Manufacturing and Electricity.
Selection of State Samples: After selecting the central sample in the way mentioned above, the remaining units in the sample sector are treated as a residual frame for the selection of sample units for the States/UTs. Note that for the purpose of selecting samples from the residual frame for the States/UTs, stratification is done afresh by grouping units belonging to District X 3-digit NIC for each state to form strata. The sample units are then drawn circular systematically from each stratum. The basic purpose of introducing the residual sample was to increase the sample size for the sample sector of the states so as to get more reliable estimates at the district level. Validated state-wise unit-level data of the central sample are also sent to the states for pooling with their surveyed data to get a combined estimate at the sub-state level.
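As an illustration of the circular systematic selection described above, the sketch below draws roughly 20% of a toy stratum from a random start, wrapping around the frame, and splits the draw into two sub-samples; it is a simplified sketch, not the official ASI selection procedure.

```python
# Simplified sketch of circular systematic sampling within one stratum (illustrative only).
import random

def circular_systematic_sample(frame, fraction=0.20, min_units=4, seed=None):
    rng = random.Random(seed)
    n = max(min_units, round(len(frame) * fraction))
    if n % 2:                      # keep an even number of units for two sub-samples
        n += 1
    k = len(frame) / n             # sampling interval
    start = rng.randrange(len(frame))
    # Walk around the frame circularly, picking every k-th unit.
    picks = [frame[int(start + i * k) % len(frame)] for i in range(n)]
    # Distribute the draw evenly into two sub-samples.
    return picks[0::2], picks[1::2]

stratum = [f"unit_{i:03d}" for i in range(37)]   # one toy State x NIC stratum
sub1, sub2 = circular_systematic_sample(stratum, seed=1)
print(sub1, sub2, sep="\n")
```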
Statutory returns submitted by factories, as well as face-to-face collection.
The Annual Survey of Industries questionnaire is divided into different blocks:
BLOCK A: IDENTIFICATION BLOCK - This block has been designed to collect the descriptive identification of the sample enterprise. The items are mostly self-explanatory.
BLOCK B: TO BE FILLED BY OWNER OF THE FACTORY - This block has been designed to collect the particulars of the sample enterprise. From this point onwards, all the facts and figures in this return are to be filled in by the owner of the factory.
BLOCK C: FIXED ASSETS - Fixed assets are of a permanent nature, have a productive life of more than one year, and are meant for earning revenue directly or indirectly, not for sale in the ordinary course of business. They include assets used for production, transportation, living or recreational facilities, hospital, school, etc. Intangible fixed assets like goodwill and preliminary expenses (including drawing and design, etc.) are excluded for the purpose of ASI. The fixed assets have, at the start of their functioning, a definite value, which decreases with wear and tear. The original cost less depreciation indicates that part of the value of fixed assets which has not yet been transferred to the output; this value is called the residual value. The value of a fixed asset which has completed its theoretical working life should always be recorded as Re. 1/-. The revalued value is considered now, but depreciation will be taken on the original cost and not on the revalued cost.
BLOCK D: WORKING CAPITAL & LOANS - Working capital represents the excess of total current assets over total current liabilities.
BLOCK E: EMPLOYMENT AND LABOUR COST - Particulars in this block should relate to all persons who work in and for the establishment, including working proprietors, active business partners and unpaid family workers. However, Directors of incorporated enterprises who are paid solely for their attendance at meetings of the Board of Directors are to be excluded.
BLOCK F: OTHER EXPENSES - This block covers the cost of other inputs, both industrial and non-industrial services rendered by others, which are paid for by the factory and most of which are reflected in the ex-factory value of its production during the accounting year.
BLOCK G: OTHER INCOMES - In this block, information on other output/receipts is to be reported.
BLOCK H: INPUT ITEMS (indigenous items consumed) - This block covers all those goods (raw materials, components, chemicals, packing material, etc.) which entered into the production process of the factory during the accounting year. Any material used in the production of fixed assets (including construction work) for the factory's own use should also be included. All intermediate products consumed during the year are to be excluded. Intermediate products are those which are produced by the factory but are subjected to further manufacture. For example, in a cotton textile mill, yarn is produced from raw cotton and the same yarn is again used for the manufacture of cloth. An intermediate product may also be a final product in the same factory. For example, if the yarn produced by the factory is sold as yarn, it becomes a final product and not an intermediate product. If, however, a part of the yarn produced by a factory is consumed by it for the manufacture of cloth, that part of the yarn so used will be an intermediate product.
BLOCK I: INPUT ITEMS – directly imported items only (consumed) - Information in this block is to be reported for all imported items consumed, whether imported by the factory directly or otherwise. The instructions for filling up this block are the same as those for Block H. All imported goods, irrespective of whether they are imported directly by the unit or not, should be recorded in Block I. Moreover, any imported item, irrespective of whether it is a basic item for manufacturing or not, should be recorded in Block I. Hence 'consumable stores' or 'packing items', if imported, should be recorded in Block I and not in Block H.
BLOCK J: PRODUCTS AND BY-PRODUCTS (manufactured by the unit) - In this block information like quantity manufactured, quantity sold, gross sale value, excise duty, sales tax paid and other distributive expenses, per unit net sale value and ex-factory value of output will be furnished by the factory item by item. If the distributive expenses are not available product-wise, the details may be given on the basis of reasonable estimation.
Data submitted by the factories undergo manual scrutiny at different stages.
1) They are verified by field staff of NSSO from factory records.
2) Verified returns are manually scrutinized by senior level staff before being sent to the data processing centre.
3) At the data processing centre these are scrutinized before data entry.
4) The entered data are subjected to computer editing and corrections.
5) Tabulated data are checked for anomalies and consistency with previous results.
Relative Standard Error (RSE) is calculated in terms of workers and wages to workers.
In an effort to make adequate and reliable data and information available for scientific policy formulation, planning and implementation of various programs and projects in education in Ghana, the Ministry of Education (MoE) launched the Education Management Information System (EMIS) Project in January 1997, with technical support during the first and second phases from Harvard University and funds from the World Bank and the Government of Ghana. As an integral part of the Free, Compulsory and Universal Basic Education (FCUBE) Program, the EMIS Project was planned to build on the already existing EMIS established in 1988 in the Ministry as part of the Education Reforms. Currently, technical support is being given by the UNESCO Institute for Statistics (UIS). Through the EMIS, a strong database has been established within the Ministry of Education. Twenty-four basic school censuses have so far been conducted since 1988 and the reports on them are available in the Ministry. This is the twelfth senior high school census in recent times. This report is presented to provide and upgrade basic data and planning parameters on enrolment, teaching staff, school facilities and examination results. This year's information on Senior High Schools has been produced at national and regional levels. The survey is presented in four sections as follows.
National level, Region, District
Questionnaire administered to all basic schools in Ghana to collect data on type of school (Public or Private), location, locality type (Rural or Urban), details of teachers, textbooks, enrolment, facilities, room conditions, etc.
Basic schools level
Census/enumeration data [cen]
14,800 basic schools were selected for the Ghana Annual Schools Census.
Other [oth]
The questionnaire consists of the following:
100% coverage for public schools; 85% coverage for private schools.
No sampling error
No other forms of appraisal reported.