Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We are submitting the updated data and codes for replicating the analysis in the revised manuscript, "Does Average Skewness Matter? Evidence from the Taiwanese Stock Market".
https://spdx.org/licenses/CC0-1.0.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains raw data files and the base codes to analyze them.
A. The 'powerx_y.xlsx' files are the data files with the one-dimensional trajectory of optically trapped probes modulated by an Ornstein-Uhlenbeck noise of given 'x' amplitude. For the corresponding diffusion amplitude A = 0.1×(0.6×10⁻⁶)² m²/s, x is labelled as '1'.
B. The codes are of three types. The skewness codes calculate the skewness of the trajectory. The error_in_fit codes calculate deviations from arcsine behavior. The sigma_exp codes quantify the deviation of the mean from 0.5. All the codes are written three times, to look at T+, Tlast and Tmax.
C. More information can be found in the manuscript.
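As a hedged sketch of the quantity the skewness codes compute (the actual scripts ship with the repository; the function name here is illustrative), the sample skewness of a trajectory is the third standardized moment:

```python
def sample_skewness(x):
    """Third standardized moment g1 = m3 / m2^(3/2) of a trajectory."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n  # third central moment
    return m3 / m2 ** 1.5

# A symmetric trajectory has zero skewness; a long right tail gives g1 > 0.
print(sample_skewness([1.0, 2.0, 3.0]))                  # 0.0
print(sample_skewness([1.0, 2.0, 3.0, 4.0, 100.0]) > 1)  # True
```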
https://spdx.org/licenses/CC0-1.0.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This section presents a discussion of the research data. The data were received as secondary data; however, they were originally collected using time study techniques. Data validation is a crucial step in the data analysis process to ensure that the data are accurate, complete, and reliable. Descriptive statistics were used to validate the data: the mean, mode, standard deviation, variance and range provide a summary of the data distribution and assist in identifying outliers or unusual patterns.
The dataset presents the measures of central tendency: the mean, the median and the mode. The mean is the average value of each of the factors presented in the tables; it is the balance point of the dataset and describes its typical value and behaviour. The median is the middle value of the dataset for each of the factors presented: half of the values lie below it and half lie above it, which makes it especially important for skewed distributions. The mode is the most common value in the dataset and was used to describe the most typical observation. Together these values describe the central value around which the data are distributed. Because the mean, mode and median are neither equal nor close to one another, they indicate a skewed distribution.
The dataset also presents the results and their discussion. This section focuses on the customisation of the DMAIC (Define, Measure, Analyse, Improve, Control) framework to address the specific concerns outlined in the problem statement. To gain a comprehensive understanding of the current process, value stream mapping was employed, further enhanced by measuring the factors that contribute to inefficiencies. These factors are then analysed and ranked by impact using factor analysis.
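The central-tendency checks described above can be reproduced with Python's standard library; the sample values below are illustrative, not taken from the dataset:

```python
import statistics

observations = [12, 15, 15, 18, 21, 45]  # illustrative time-study readings (minutes)

mean = statistics.mean(observations)
median = statistics.median(observations)
mode = statistics.mode(observations)

# A mean well above the median and mode hints at a right-skewed distribution.
print(mean, median, mode)  # 21 16.5 15
```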
To mitigate the impact of the most influential factor on project inefficiencies, a solution is proposed using the EOQ (Economic Order Quantity) model. The implementation of the 'CiteOps' software facilitates improved scheduling, monitoring, and task delegation in the construction project through digitalisation. Furthermore, project progress and efficiency are monitored remotely and in real time. In summary, the DMAIC framework was tailored to suit the requirements of the specific project, incorporating techniques from inventory management, project management, and statistics to effectively minimise inefficiencies within the construction project.
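The classical EOQ formula behind the proposed solution balances ordering cost against holding cost; a minimal sketch with illustrative figures (the project's actual inputs are in the dataset):

```python
import math

def economic_order_quantity(annual_demand, order_cost, holding_cost):
    """Classical EOQ: order size minimising total ordering-plus-holding cost,
    Q* = sqrt(2 * D * S / H)."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)

# Illustrative values: 1,000 units/year demand, 50 per order, 4 per unit-year to hold.
print(round(economic_order_quantity(1000, 50, 4), 2))  # 158.11
```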
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COVID-19 prediction has been essential in the aid of prevention and control of the disease. The motivation of this case study is to develop predictive models for COVID-19 cases and deaths based on a cross-sectional data set with a total of 28,955 observations and 18 variables, compiled from 5 data sources from Kaggle. A two-part modeling framework, in which the first part is a logistic classifier and the second part includes machine learning or statistical smoothing methods, is introduced to model the highly skewed distribution of COVID-19 cases and deaths. We also aim to understand which factors are most relevant to COVID-19’s occurrence and fatality. Evaluation criteria such as root mean squared error (RMSE) and mean absolute error (MAE) are used. We find that the two-part XGBoost model performs best at predicting the entire distribution of COVID-19 cases and deaths. The most important factors relevant to either COVID-19 cases or deaths include population and the rate of primary care physicians.
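The evaluation criteria named above are standard; as a brief pure-Python sketch (illustrative values, not the study's data):

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

y_true = [10, 0, 300]  # skewed counts, e.g. cases per county (illustrative)
y_pred = [12, 1, 250]

# RMSE penalises the large miss on the tail observation more heavily than MAE.
print(round(rmse(y_true, y_pred), 2))
print(round(mae(y_true, y_pred), 2))
```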
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper evaluates the claim that Welch’s t-test (WT) should replace the independent-samples t-test (IT) as the default approach for comparing sample means. Simulations involving unequal and equal variances, skewed distributions, and different sample sizes were performed. For normal distributions, we confirm that the WT maintains the false positive rate close to the nominal level of 0.05 when sample sizes and standard deviations are unequal. However, the WT was found to yield inflated false positive rates under skewed distributions, even with relatively large sample sizes, whereas the IT avoids such inflation. A complementary empirical study based on gender differences in two psychological scales corroborates these findings. Finally, we contend that the null hypothesis of unequal variances together with equal means lacks plausibility, and that empirically, a difference in means typically coincides with differences in variance and skewness. An additional analysis using the Kolmogorov-Smirnov and Anderson-Darling tests demonstrates that examining entire distributions, rather than just their means, can provide a more suitable alternative when facing unequal variances or skewed distributions. Given these results, researchers should remain cautious with software defaults, such as R favoring Welch’s test.
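For reference, the WT differs from the IT only in its standard error and degrees of freedom; a minimal pure-Python sketch of the Welch statistic and the Welch–Satterthwaite degrees of freedom (illustrative samples, not the paper's simulation code):

```python
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = v1 / n1 + v2 / n2
    t = (statistics.mean(a) - statistics.mean(b)) / se2 ** 0.5
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Unequal variances shrink df below the pooled n1 + n2 - 2 used by the IT.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 2))
```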
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database includes simulated data showing the accuracy of estimated probability distributions of project durations when limited data are available for the project activities. The base project networks are taken from PSPLIB. Then, various stochastic project networks are synthesized by changing the variability and skewness of project activity durations.
Number of variables: 20
Number of cases/rows: 114240
Variable List:
• Experiment ID: The ID of the experiment
• Experiment for network: The ID of the experiment for each of the synthesized networks
• Network ID: ID of the synthesized network
• #Activities: Number of activities in the network, including start and finish activities
• Variability: Variance of the activities in the network (this value can be either high, low, medium or rand, where rand shows a random combination of low, high and medium variance in the network activities.)
• Skewness: Skewness of the activities in the network (Skewness can be either right, left, None or rand, where rand shows a random combination of right, left, and none skewed in the network activities)
• Fitted distribution type: Distribution type used to fit the sampled data
• Sample size: Number of sampled data points used for the experiment, resembling the limited-data condition
• Benchmark 10th percentile: 10th percentile of project duration in the benchmark stochastic project network
• Benchmark 50th percentile: 50th percentile of project duration in the benchmark stochastic project network
• Benchmark 90th percentile: 90th percentile of project duration in the benchmark stochastic project network
• Benchmark mean: Mean project duration in the benchmark stochastic project network
• Benchmark variance: Variance of project duration in the benchmark stochastic project network
• Experiment 10th percentile: 10th percentile of project duration distribution for the experiment
• Experiment 50th percentile: 50th percentile of project duration distribution for the experiment
• Experiment 90th percentile: 90th percentile of project duration distribution for the experiment
• Experiment mean: Mean of project duration distribution for the experiment
• Experiment variance: Variance of project duration distribution for the experiment
• K-S: Kolmogorov–Smirnov test comparing the benchmark distribution and the project duration distribution of the experiment
• P_value: the P-value based on the distance calculated in the K-S test
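The K-S column records the two-sample Kolmogorov–Smirnov distance, i.e. the largest gap between the two empirical CDFs; a compact sketch (illustrative samples, not the dataset's code):

```python
def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov distance: max gap between empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    for x in s1 + s2:  # the supremum is attained at an observed value
        cdf1 = sum(v <= x for v in s1) / n1
        cdf2 = sum(v <= x for v in s2) / n2
        d = max(d, abs(cdf1 - cdf2))
    return d

print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0, identical distributions
print(ks_statistic([1, 2, 3], [7, 8, 9]))  # 1.0, fully separated samples
```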
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Species interactions in food webs are usually recognized as dynamic, varying across species, space and time due to biotic and abiotic drivers. Yet food webs also show emergent properties that appear consistent, such as a skewed frequency distribution of interaction strengths (many weak, few strong). Reconciling these two properties requires an understanding of the variation in pairwise interaction strengths and its underlying mechanisms. We estimated stream sculpin feeding rates in three seasons at nine sites in Oregon to examine variation in trophic interaction strengths both across and within predator-prey pairs. Predator and prey densities, prey body mass, and abiotic factors were considered as putative drivers of within-pair variation over space and time. We hypothesized that consistently skewed interaction strength distributions could result if individual interaction strengths show relatively little variation, or alternatively, if interaction strengths vary but shift in ways that conserve their overall frequency distribution. Feeding rate distributions remained consistently and positively skewed across all sites and seasons. The mean coefficient of variation in feeding rates within each of 25 focal species pairs across surveys was less than half the mean coefficient of variation seen across species pairs within a survey. The rank order of feeding rates also remained conserved across streams, seasons and individual surveys. On average, feeding rates on each prey taxon nonetheless varied by a hundredfold, with some feeding rates showing more variation in space and others in time. In general, feeding rates increased with prey density and decreased with high stream flows and low water temperatures, although for nearly half of all species pairs, factors other than prey density explained the most variation. 
Our findings show that although individual interaction strengths exhibit considerable variation in space and time, they can nonetheless remain relatively consistent, and thus predictable, compared to the even larger variation that occurs across species pairs. These results highlight how the ecological scale of inference can strongly shape conclusions about interaction strength consistency and collectively help reconcile how the skewed nature of interaction strength distributions can persist in highly dynamic food webs.
This dataset reflects the recorded times that it took 72 participants to transcribe an Arabic text, and 78 participants to transcribe an English text, both by paper and by smartphone. (*Note that Participant 48 in the English subgroup was identified as an outlier, as times for smartphone entry were over 5 SD away from the mean.) All data points are times (in seconds).
It was hypothesized, based on precursor research, that handwriting would be faster than smartphone entry for participants writing in their second language. This hypothesis was supported by the data. Also, the non-normal distributions of the English subgroups (the second language of the participants) are typical of research based on self-paced actions (in this case, self-paced writing). Both subgroups of the English data were positively skewed.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘🍷 Alcohol vs Life Expectancy’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/alcohol-vs-life-expectancye on 13 February 2022.
--- Dataset description provided by original source is as follows ---
There is a surprising relationship between alcohol consumption and life expectancy. In fact, the data suggest that life expectancy and alcohol consumption are positively correlated - 1.2 additional years for every 1 liter of alcohol consumed annually. This is, of course, a spurious finding, because the correlation of this relationship is very low - 0.28. This indicates that other factors in those countries where alcohol consumption is comparatively high or low are contributing to differences in life expectancy, and further analysis is warranted.
Plot (LifeExpectancy_v_AlcoholConsumption_Plot.jpg): https://data.world/api/databeats/dataset/alcohol-vs-life-expectancy/file/raw/LifeExpectancy_v_AlcoholConsumption_Plot.jpg
The original drinks.csv file in the UNCC/DSBA-6100 dataset was missing values for The Bahamas, Denmark, and Macedonia for the wine, spirits, and beer attributes, respectively. Drinks_solution.csv shows these values filled in, for which I used the Mean of the rest of the data column.
Other methods were considered and ruled out:
Filling missing values with 0 - The missing values fall in the serving-count columns (beer_servings, spirit_servings, and wine_servings), and upon reviewing the Bahamas, Denmark, and Macedonia more closely, it is apparent that 0 would be a poor choice for the missing values, as all three countries clearly consume alcohol.
Filling missing values with MEAN - In the case of the drinks dataset, this is the best approach. The MEAN averages for the columns happen to be very close to the actual data from where we sourced this exercise. In addition, the MEAN will not skew the data, which the prior approaches would do.
The original drinks.csv dataset also had an empty data column, total_litres_of_pure_alcohol. This column needed to be calculated in order to do a simple 2D plot and trendline. It would have been possible to instead run a multi-variable regression on the data and therefore skip this step, but this adds an extra layer of complication to understanding the analysis - not to mention the point of the exercise is to go through an example of calculating new attributes (or "feature engineering") using domain knowledge.
The graphic found at the Wikipedia / Standard Drink page shows the following breakdown:
The conversion factor from fl oz to L is 1 fl oz : 0.0295735 L
Therefore, the following formula was used to compute the empty column:
total_litres_of_pure_alcohol = (beer_servings * 12 fl oz per serving * 0.05 ABV + spirit_servings * 1.5 fl oz * 0.4 ABV + wine_servings * 5 fl oz * 0.12 ABV) * 0.0295735 liters per fl oz
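In code, the same conversion is a direct transcription of the formula above:

```python
FL_OZ_TO_L = 0.0295735  # litres per fluid ounce

def total_litres_of_pure_alcohol(beer_servings, spirit_servings, wine_servings):
    """Pure alcohol per year: 12 fl oz beer at 5% ABV, 1.5 fl oz spirits at 40% ABV,
    5 fl oz wine at 12% ABV, converted from fluid ounces to litres."""
    fl_oz = (beer_servings * 12 * 0.05
             + spirit_servings * 1.5 * 0.4
             + wine_servings * 5 * 0.12)
    return fl_oz * FL_OZ_TO_L

print(round(total_litres_of_pure_alcohol(100, 100, 100), 2))  # 5.32
```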
The lifeexpectancy.csv datafile in the https://data.world/uncc-dsba/dsba-6100-fall-2016 dataset contains life expectancy data for each country. The following query will join this data to the cleaned drinks.csv data file:
# Life Expectancy vs Alcohol Consumption
PREFIX drinks: <http://data.world/databeats/alcohol-vs-life-expectancy/drinks_solution.csv/drinks_solution#>
PREFIX life: <http://data.world/uncc-dsba/dsba-6100-fall-2016/lifeexpectancy.csv/lifeexpectancy#>
PREFIX countries: <http://data.world/databeats/alcohol-vs-life-expectancy/countryTable.csv/countryTable#>
SELECT ?country ?alc ?years
WHERE {
  SERVICE <https://query.data.world/sparql/databeats/alcohol-vs-life-expectancy> {
    ?r1 drinks:total_litres_of_pure_alcohol ?alc .
    ?r1 drinks:country ?country .
    ?r2 countries:drinksCountry ?country .
    ?r2 countries:leCountry ?leCountry .
  }
  SERVICE <https://query.data.world/sparql/uncc-dsba/dsba-6100-fall-2016> {
    ?r3 life:CountryDisplay ?leCountry .
    ?r3 life:GhoCode ?gho_code .
    ?r3 life:Numeric ?years .
    ?r3 life:YearCode ?reporting_year .
    ?r3 life:SexDisplay ?sex .
  }
  FILTER ( ?gho_code = "WHOSIS_000001" && ?reporting_year = 2013 && ?sex = "Both sexes" )
}
ORDER BY ?country
The resulting joined data can then be saved to local disk and imported into any analysis tool like Excel, Numbers, R, etc. to make a simple scatterplot. A trendline and R^2 should be added to determine the relationship between Alcohol Consumption and Life Expectancy (if any).
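The trendline and R² mentioned above come from ordinary least squares; for readers who prefer code to a spreadsheet, a small pure-Python sketch (toy points, not the joined dataset):

```python
def linear_fit(xs, ys):
    """Ordinary least squares: slope, intercept, and R^2 for y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2

# Perfectly linear toy data recovers slope 2, intercept 0, R^2 = 1.
slope, intercept, r2 = linear_fit([1, 2, 3, 4], [2, 4, 6, 8])
print(slope, intercept, r2)
```

A low R² on the real joined data is exactly what flags the alcohol-life-expectancy correlation as spurious.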
This dataset was created by Jonathan Ortiz and contains around 200 samples along with Beer Servings, Spirit Servings, technical information and other features such as: - Total Litres Of Pure Alcohol - Wine Servings - and more.
- Analyze Beer Servings in relation to Spirit Servings
- Study the influence of Total Litres Of Pure Alcohol on Wine Servings
- More datasets
If you use this dataset in your research, please credit Jonathan Ortiz
--- Original source retains full ownership of the source dataset ---
These data represent mosquito trap site results in the District of Columbia from 2016 to 2018. Trap locations are considered approximate addresses and/or the “nearest” street address or block to the stated coordinates in the data. Visit Fight the Bite: Protecting the District of Columbia from Mosquitoes, a collection of the 2016-2018 Arbovirus Surveillance Program conducted annually by DC Health, Health Regulation & Licensing Admin., Animal Services Div.
Mosquitoes have the potential to spread harmful diseases. During the annual mosquito season in Washington DC, usually from April to October, DC Health deploys surveillance and mitigation methods to control the mosquito population in the District. DC Health (also known as the D.C. Department of Health or formerly DOH) has been trapping and testing mosquitoes for West Nile virus (WNV) for well over a decade. Starting in 2016, and in response to the Zika outbreak in Latin America and the Caribbean, DC Health substantially increased mosquito monitoring activities across the city. There were a total of 28 sites and 36 traps across the 8 wards. Data were submitted to the Centers for Disease Control MosquitoNET portal.
Note: the 2017 analysis does not include data for October, because October of 2017 would have skewed the results far too much based on a few variables that occurred, for example the number of traps which had failed by the end of the season.
Mosquito species in Washington, D.C.:
• Culex pipiens, Culex salinarius and Culex restuans: spread West Nile Virus.
• Aedes aegypti: according to the Centers for Disease Control (CDC), health experts have determined this species to be the most competent vector, capable of transmitting Zika to the human population. To date, none of the Aedes aegypti trapped in Washington, D.C. have been found to carry the Zika virus.
• Aedes albopictus: capable of spreading Zika to people. However, health experts are still learning whether it is likely to do so; at this time, it appears not to be as competent a vector for transmitting Zika as the Aedes aegypti. Just because a mosquito can carry the virus does not mean that it will cause disease. So far, none of the Aedes albopictus trapped in Washington, D.C. have been found to carry the Zika virus.
• Aedes japonicus: normally found in South Florida, is present in D.C. in small numbers. Presently there is no indication that they are competent vectors for spreading Zika to the human population.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A common descriptive statistic in cluster analysis is the $R^2$ that measures the overall proportion of variance explained by the cluster means. This note highlights properties of the $R^2$ for clustering. In particular, we show that generally the $R^2$ can be artificially inflated by linearly transforming the data by "stretching" and by projecting. Also, the $R^2$ for clustering will often be a poor measure of clustering quality in high-dimensional settings. We also investigate the $R^2$ for clustering for misspecified models. Several simulation illustrations are provided highlighting weaknesses in the clustering $R^2$, especially in high-dimensional settings. A functional data example is given showing how the $R^2$ for clustering can vary dramatically depending on how the curves are estimated.
https://spdx.org/licenses/CC0-1.0.html
As global climate continues to change, so too will phenology of a wide range of insects. Changes in flight season usually are characterised as shifts to earlier dates or means, with attention less often paid to flight season breadth or whether seasons are now skewed. We amassed flight season data for the insect order Odonata, the dragonflies and damselflies, for Norway over the past century and a half to examine the form of flight season change. By means of Bayesian analyses that incorporated uncertainty relative to annual variability in survey effort, we estimated shifts in flight season mean, breadth, and skew. We focussed on flight season breadth, positing that it will track documented growing season expansion. A specific mechanism explored was shifts in voltinism, the number of generations per year, which tends to increase with warming. We found strong evidence for an increase in flight season breadth but much less for a shift in mean, with any shift of the latter tending toward a later mean. Skew has become rightward for suborder Zygoptera, the damselflies, but not for Anisoptera, the dragonflies, or for the Odonata as a whole. We found weak support for voltinism as a predictor of broader flight season; instead, voltinism acted interactively with use of human-modified habitats, including decrease in shading (e.g., from timber extraction). Other potential mechanisms that link warming with broadening of flight season include protracted emergence and cohort splitting, both of which have been documented in the Odonata. It is likely that warming-induced broadening of flight seasons of these widespread insect predators will have wide-ranging consequences for freshwater ecosystems.
Methods: Data were extracted from Artsdatabanken, a public database for Norway. Data were cleaned, and useable records served as the basis for analyses.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Data Description: This dataset captures all traffic stops involving motor vehicles. Time of incident, officer assignment, race/sex of stop subject, and outcome of the stop ("Action taken") are also included in this data. Individual traffic stops may populate multiple data rows to account for multiple outcomes: "interview number" is the unique identifier for every one (1) traffic stop.
Data Creation: Cincinnati Police Department (CPD) officers record all traffic stops involving motor vehicles via Contact Cards. Contact Cards are completed every time a CPD officer stops vehicles or pedestrians. The use of Contact Cards came out of the Collaborative Agreement.
Data Created By: The source of this data is the Cincinnati Police Department.
Refresh Frequency: This data is updated daily.
CincyInsights: The City of Cincinnati maintains an interactive dashboard portal, CincyInsights in addition to our Open Data in an effort to increase access and usage of city data. This data set has an associated dashboard available here: https://insights.cincinnati-oh.gov/stories/s/h48j-wkz6
Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this dataset.
Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit the Office of Performance and Data Analytics facilitates standard processing to most raw data prior to publication. Processing includes but is not limited to: address verification, geocoding, decoding attributes, and addition of administrative areas (i.e. Census, neighborhoods, police districts, etc.).
Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad
Disclaimer: In compliance with privacy laws, all Public Safety datasets are anonymized and appropriately redacted prior to publication on the City of Cincinnati’s Open Data Portal. This means that for all public safety datasets: (1) the last two digits of all addresses have been replaced with “XX,” and in cases where there is a single digit street address, the entire address number is replaced with "X"; and (2) Latitude and Longitude have been randomly skewed to represent values within the same block area (but not the exact location) of the incident.
https://www.imf.org/external/terms.htm
The IMF-adapted ND-GAIN index is an adaptation of the original index, adjusted by IMF staff to replace the Doing Business (DB) Index, used as source data in the original ND-GAIN, because the DB database was discontinued by the World Bank in 2020 and is no longer allowed in IMF work. The IMF-adapted ND-GAIN is an interim solution offered by IMF staff until the ND-GAIN compilers review the methodology and replace the DB index.
Sources: ND-GAIN; Findex - The Global Findex Database 2021; Worldwide Governance Indicators; IMF staff calculations.
Category: Adaptation
Data series:
• IMF-Adapted ND-GAIN Index
• IMF-Adapted Readiness score
• Readiness score, Governance
• Readiness score, IMF-Adapted Economic
• Readiness score, Social
• Vulnerability score
• Vulnerability score, Capacity
• Vulnerability score, Ecosystems
• Vulnerability score, Exposure
• Vulnerability score, Food
• Vulnerability score, Habitat
• Vulnerability score, Health
• Vulnerability score, Sensitivity
• Vulnerability score, Water
• Vulnerability score, Infrastructure
Metadata: The IMF-adapted ND-GAIN Country Index uses 75 data sources to form 45 core indicators that reflect the vulnerability and readiness of 192 countries from 2015 to 2021. As in the original indicator, a country's IMF-adapted ND-GAIN score is composed of a Readiness score and a Vulnerability score. The Readiness score is measured using three sub-components: Economic, Governance and Social. In the original ND-GAIN database, the Economic score is built on the DB index, while in the IMF-adapted ND-GAIN, the DB Index is replaced with a composite index built using the arithmetic mean of “Borrowed from a financial institution (% age 15+)” from The Global Findex database (FINDEX_BFI) and “Government effectiveness” from the Worldwide Governance Indicators database (WGI_GE). The Vulnerability, Social and Governance scores do not contain any DB inputs and, hence, have been sourced from the original ND-GAIN database.
Methodology: The procedure for data conversion to index is the same as the original ND-GAIN and follows three steps:
Step 1. Select and collect data from the sources (called “raw” data), or compute indicators from underlying data. Some data errors (i.e., tabulation errors coming from the source) are identified and corrected at this stage. If some form of transformation is needed (e.g., expressing the measure in appropriate units, log transformation to better represent the real sensitivity of the measure, etc.), it also happens at this stage.
Step 2. At times some years of data may be missing for one or more countries; sometimes, all years of data are missing for a country. In the first instance, linear interpolation is adopted to make up for the missing data. In the second instance, the indicator is labeled as "missing" for that country, which means the indicator will not be considered in the averaging process.
Step 3. This step can be carried out after or before Step 2 above. Select baseline minimum and maximum values for the raw data. These encompass all or most of the observed range of values across countries, but in some cases the distribution of the observed raw data is highly skewed. In this case, ND-GAIN selects the 90th-percentile value if the distribution is right skewed, or the 10th-percentile value if the distribution is left skewed, as the baseline maximum or minimum.
Based on this procedure, the IMF-Adapted ND-GAIN Index is derived as follows:
i. Replace the original Economic score with a composite index based on the average of WGI_GE and the cubic root of FINDEX_BFI1, as follows:
IMF-Adapted Economic = ½ · (WGI_GE) + ½ · (FINDEX_BFI)^(1/3) (1)
The IMF-adapted Readiness and overall IMF-adapted ND-GAIN scores are then derived as:
IMF-Adapted ND-GAIN Readiness = 1/3 · (IMF-Adapted Economic + Governance + Social)
IMF-Adapted ND-GAIN = ½ · (IMF-Adapted ND-GAIN Readiness + ND-GAIN Vulnerability)
ii. In case of missing data for one of the indicators in (1), the IMF-Adapted Economic score would be based on the value of the available indicator. In case neither of the two indicators is available, the IMF-Adapted Economic score would not be produced, but the IMF-Adapted ND-GAIN Readiness would be computed as the average of the Governance and Social scores. This approach, which replicates the approach used to derive the original ND-GAIN indexes in case of missing data, ensures that the proposed indicator has the same coverage as the original ND-GAIN database.
1 Given that the FINDEX_BFI data are positively skewed, a cubic root transformation has been implemented to induce symmetry.
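Equation (1) and its missing-data fallback can be sketched directly; this is a minimal illustration of the published formula, with variable names and sample values of our own choosing:

```python
def imf_adapted_economic(wgi_ge=None, findex_bfi=None):
    """Equation (1): average of WGI_GE and the cube root of FINDEX_BFI.
    Falls back to the available indicator when one is missing; returns None if both are."""
    parts = []
    if wgi_ge is not None:
        parts.append(wgi_ge)
    if findex_bfi is not None:
        parts.append(findex_bfi ** (1 / 3))  # cube-root transform to induce symmetry
    if not parts:
        return None
    return sum(parts) / len(parts)

print(imf_adapted_economic(wgi_ge=0.5, findex_bfi=0.064))  # 0.5*0.5 + 0.5*0.4
print(imf_adapted_economic(wgi_ge=0.5))                    # only WGI_GE available
```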
The global surface climate values data contain monthly and annual surface climatological averages and met element check values for surface CLIMAT reports. Each message contains measurements of meteorological parameters; the count of months of actual data, mean, standard deviation, median, and skewness of the record are all given for a specified parameter. The data span a maximum of 30 years, for periods ending in December 1990 and December 2000.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual descriptive price statistics for each calendar year 2005 – 2023 for 11 Local Government Districts in Northern Ireland. The statistics include:
• Minimum sale price
• Lower quartile sale price
• Median sale price
• Simple mean sale price
• Upper quartile sale price
• Maximum sale price
• Number of verified sales
Prices are available where at least 30 sales were recorded in the area within the calendar year which could be included in the regression model, i.e. the following sales are excluded:
• Non arms-length sales
• Sales of properties where the habitable space is less than 30m2 or greater than 1000m2
• Sales less than £20,000
Annual median or simple mean prices should not be used to calculate the property price change over time. The quality (where quality refers to the combination of all characteristics of a residential property, both physical and locational) of the properties that are sold may differ from one time period to another. For example, sales in one quarter could be disproportionately skewed towards low-quality properties, therefore producing a biased estimate of average price. The median and simple mean prices are not ‘standardised’, and so the varying mix of properties sold in each quarter could give a false impression of the actual change in prices. In order to calculate the pure property price change over time it is necessary to compare like with like, and this can only be achieved if the ‘characteristics-mix’ of properties traded is standardised. To calculate pure property price change over time, please use the standardised prices in the NI House Price Index Detailed Statistics file.
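The quartile statistics listed above can be reproduced with Python's standard library; the prices below are illustrative, not NI data:

```python
import statistics

prices = [95_000, 120_000, 130_000, 150_000, 180_000, 240_000, 420_000, 650_000]

# Lower quartile, median, and upper quartile of the verified sale prices.
q1, q2, q3 = statistics.quantiles(prices, n=4, method="inclusive")

print(min(prices), q1, q2, q3, max(prices))
print(round(statistics.mean(prices)))  # the simple mean sits above the median here,
                                       # as expected for a right-skewed price mix
print(len(prices))                     # number of verified sales
```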
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Data Description: This dataset captures all Cincinnati Police Department Calls for Service. The City of Cincinnati's Computer Aided Dispatch (CAD) system records police incident response activity, which includes all calls for service to emergency operators, 911, alarms, police radio and non-emergency calls. CAD records all dispatch information, which is used by dispatchers, field supervisors, and on-scene officers to determine the priority, severity, and response needs surrounding the incident. Once an officer responds to a call, he/she updates the disposition to reflect findings on-scene.
This dataset includes both proactive and reactive police incident data.
Data Creation: This data is created through the City’s computer-aided dispatch (CAD) system.
Data Created By: The source of this data is the Cincinnati Police Department.
Refresh Frequency: This data is updated daily.
CincyInsights: The City of Cincinnati maintains an interactive dashboard portal, CincyInsights, in addition to its Open Data Portal, in an effort to increase access to and usage of city data. This dataset has an associated dashboard available here: https://insights.cincinnati-oh.gov/stories/s/xw7t-5phj
Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this dataset.
Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics applies standard processing to most raw data prior to publication. Processing includes, but is not limited to: address verification, geocoding, decoding of attributes, and addition of administrative areas (e.g., Census areas, neighborhoods, police districts).
Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad
Disclaimer: In compliance with privacy laws, all Public Safety datasets are anonymized and appropriately redacted prior to publication on the City of Cincinnati’s Open Data Portal. This means that for all public safety datasets: (1) the last two digits of all addresses have been replaced with “XX,” and in cases where there is a single digit street address, the entire address number is replaced with "X"; and (2) Latitude and Longitude have been randomly skewed to represent values within the same block area (but not the exact location) of the incident.
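The two redaction rules described in the disclaimer can be sketched as follows. This is an illustrative reconstruction, not the city's actual anonymization code: the leading-digits regex and the block-sized coordinate offset (`max_offset_deg`) are assumptions.

```python
import random
import re

def redact_address(addr):
    """Mask the street number per the stated rule: replace the last two
    digits with 'XX', or the whole number with 'X' if it is a single digit.
    Assumes the address string begins with the street number."""
    def mask(m):
        num = m.group(0)
        return "X" if len(num) == 1 else num[:-2] + "XX"
    return re.sub(r"^\d+", mask, addr)

def skew_point(lat, lon, max_offset_deg=0.0005):
    """Randomly offset coordinates so they fall within roughly the same
    block but not at the exact location. The offset magnitude is a
    hypothetical choice, not the city's actual parameter."""
    return (lat + random.uniform(-max_offset_deg, max_offset_deg),
            lon + random.uniform(-max_offset_deg, max_offset_deg))
```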
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
As global climate continues to change, so too will phenology of a wide range of insects. Changes in flight season usually are characterised as shifts to earlier dates or means, with attention less often paid to flight season breadth or whether seasons are now skewed. We amassed flight season data for the insect order Odonata, the dragonflies and damselflies, for Norway over the past century-and-a-half to examine the form of flight season change. By means of Bayesian analyses that incorporated uncertainty relative to annual variability in survey effort, we estimated shifts in flight season mean, breadth, and skew. We focussed on flight season breadth, positing that it will track documented growing season expansion. A specific mechanism explored was shifts in voltinism, the number of generations per year, which tends to increase with warming. We found strong evidence for an increase in flight season breadth but much less for a shift in mean, with any shift of the latter tending toward a later mean. Skew has become rightward for suborder Zygoptera, the damselflies, but not for Anisoptera, the dragonflies, or for the Odonata as a whole. We found weak support for voltinism as a predictor of broader flight season; instead, voltinism acted interactively with use of human-modified habitats, including decrease in shading (e.g., from timber extraction). Other potential mechanisms that link warming with broadening of flight season include protracted emergence and cohort splitting, both of which have been documented in the Odonata. It is likely that warming-induced broadening of flight seasons of these widespread insect predators will have wide-ranging consequences for freshwater ecosystems.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We are submitting the updated data and codes for replicating the analysis in the revised manuscript, "Does Average Skewness Matter? Evidence from the Taiwanese Stock Market".