66 datasets found

Sidewalk to Street "Walkability" Ratio
s.cnmilf.com
gimi9.com
Updated Jan 24, 2023
Cite
Western Pennsylvania Regional Data Center (2023). Sidewalk to Street "Walkability" Ratio [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/sidewalk-to-street-walkability-ratio
Dataset updated
Jan 24, 2023
Dataset provided by
Western Pennsylvania Regional Data Center
Description
We’ve been asked to create measures of communities that are “walkable” for several projects. While there is no standard definition of what makes a community “walkable”, and the definition of “walkability” can differ from person to person, we thought an indicator that explores the total length of available sidewalks relative to the total length of streets in a community could be a good place to start. In this blog post, we describe how we used open data from SPC and Allegheny County to create a new measure for how “walkable” a community is.
We wanted to create a ratio of the length of a community’s sidewalks to the length of a community’s streets as a measure of pedestrian infrastructure. A ratio of 1 would mean that a community has an equal number of linear feet of sidewalks and streets. A ratio of about 2 would mean that a community has two linear feet of sidewalk for every linear foot of street. In other words, every street has a sidewalk on either side of it.
In creating a measure of the ratio of sidewalks to streets, we had to do a little bit of data cleanup. Much of this was by trial and error, ground-truthing the data based on our personal experiences walking in different neighborhoods. Since street data was not shared as open data by many counties in our region either on PASDA or through the SPC open data portal, we limited our analysis of “walkability” to Allegheny County.
In looking at the sidewalk data table and map, we noticed that trails were included. While nice to have in the data, we wanted to exclude trails from the ratio. We did this to avoid a situation where a community that had few sidewalks but was in the same blockgroup as a park with trails would get “credit” for being more “walkable” than it actually is according to our definition. We did this by removing all segments where “Trail” was in the “Type_Name” field. We also used a similar tabular selection method to remove crosswalks from the sidewalk data (“Type_Name” = “Crosswalk”). We kept the steps in the dataset along with the sidewalks.
In the street data obtained from Allegheny County’s GIS department, we felt we should try to exclude limited-access highway segments from the analysis, since pedestrians are prohibited from using them, and their presence would have reduced the sidewalk/street ratio in communities where they are located. We did this by excluding street segments whose values in the “FCC” field (designating type of street) equaled “A11” or “A63.” We also removed trails from this dataset by excluding those classified as “H10.” Since documentation was sparse, we looked to see how these features were classified in the data to determine which codes to exclude.
After running the data initially, we also realized that excluding alleyways from the calculations could improve the accuracy of our results. Some of the communities with substantial pedestrian infrastructure have alleyways, and including them would make those communities appear less “walkable” in our indicator. We removed these from the dataset by removing records with a value of “Aly” or “Way” in the “St_Type” field. We also excluded streets where the word “Alley” appeared in the street name, or “St_Name” field.
The full methodology used for this dataset is captured in our blog post, and we have also included the sidewalk and street data used to create the ratio here.
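The filtering and ratio steps described above can be sketched as follows. This is a minimal illustration, not the publisher's actual workflow: the field names Type_Name, FCC, St_Type, and St_Name come from the description, while the "community" and "length_ft" keys, the "A41" code on the kept street, and all sample values are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample segments. "community" and "length_ft" are assumed keys;
# the other field names are the ones mentioned in the description.
sidewalks = [
    {"community": "A", "Type_Name": "Sidewalk", "length_ft": 900.0},
    {"community": "A", "Type_Name": "Trail", "length_ft": 400.0},
    {"community": "A", "Type_Name": "Crosswalk", "length_ft": 50.0},
    {"community": "A", "Type_Name": "Steps", "length_ft": 100.0},
]
streets = [
    {"community": "A", "FCC": "A41", "St_Type": "St", "St_Name": "Main", "length_ft": 500.0},
    {"community": "A", "FCC": "A11", "St_Type": "Hwy", "St_Name": "Parkway", "length_ft": 2000.0},
    {"community": "A", "FCC": "A41", "St_Type": "Way", "St_Name": "Elm", "length_ft": 300.0},
    {"community": "A", "FCC": "A41", "St_Type": "St", "St_Name": "Coal Alley", "length_ft": 200.0},
]

EXCLUDED_FCC = {"A11", "A63", "H10"}   # limited-access highways and trails
EXCLUDED_ST_TYPE = {"Aly", "Way"}      # alleyways


def keep_sidewalk(segment):
    """Drop trails and crosswalks; keep sidewalks and steps."""
    type_name = segment["Type_Name"]
    return "Trail" not in type_name and type_name != "Crosswalk"


def keep_street(segment):
    """Drop highways, trails, and alleys as described in the text."""
    return (
        segment["FCC"] not in EXCLUDED_FCC
        and segment["St_Type"] not in EXCLUDED_ST_TYPE
        and "Alley" not in segment["St_Name"]
    )


def total_length(segments, keep):
    """Sum segment lengths per community for segments passing the filter."""
    totals = defaultdict(float)
    for segment in segments:
        if keep(segment):
            totals[segment["community"]] += segment["length_ft"]
    return totals


def walkability_ratio(sidewalk_segments, street_segments):
    """Sidewalk length divided by street length, per community."""
    sidewalk_totals = total_length(sidewalk_segments, keep_sidewalk)
    street_totals = total_length(street_segments, keep_street)
    return {
        community: sidewalk_totals.get(community, 0.0) / street_length
        for community, street_length in street_totals.items()
        if street_length > 0
    }


ratios = walkability_ratio(sidewalks, streets)
```

In this toy input, community "A" keeps 1,000 ft of sidewalk and steps against 500 ft of street, which corresponds to the "sidewalk on either side" case in the description.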
NetVotes iKnow Dataset
zenodo.org
data.niaid.nih.gov
zip
Updated Oct 1, 2024
Cite
Nejat Arınık; Vincent Labatut; Rosa Figueiredo (2024). NetVotes iKnow Dataset [Dataset]. http://doi.org/10.5281/zenodo.6816076
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.6816076
Dataset updated
Oct 1, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nejat Arınık; Vincent Labatut; Rosa Figueiredo
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description. This is the data used in the experiment of the following conference paper:

N. Arınık, R. Figueiredo, and V. Labatut, “Signed Graph Analysis for the Interpretation of Voting Behavior,” in International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities, Graz, AT, 2017, vol. 2025. ⟨hal-01583133⟩

Source code. The source code is available on GitHub: https://github.com/CompNet/NetVotes

Citation. If you use the data or source code, please cite the above paper.

@InProceedings{Arinik2017,
author = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent},
title = {Signed Graph Analysis for the Interpretation of Voting Behavior},
booktitle = {International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities},
year = {2017},
volume = {2025},
series = {CEUR Workshop Proceedings},
address = {Graz, AT},
url = {http://ceur-ws.org/Vol-2025/paper_rssna_1.pdf},
}

----------------------

Details.

# RAW INPUT FILES
The 'itsyourparliament' folder contains all raw input files for further data processing (such as network extraction).
The folder structure is as follows:
* itsyourparliament/
** domains: There are 28 domain files. Each file corresponds to a domain (such as Agriculture, Economy, etc.) and contains corresponding vote identifiers and their "itsyourparliament.eu" links.
** meps: There are 870 Member of Parliament (MEP) files. Each file contains the MEP information (such as name, country, address, etc.)
** votes: There are 7513 vote files. Each file contains the votes expressed by MEPs
# NETWORKS AND CORRESPONDING PARTITIONS
This work studies the voting behavior of French and Italian MEPs on "Agriculture and Rural Development" (AGRI) and "Economic and Monetary Affairs" (ECON) for each separate year of the 7th EP term (2009-10, 2010-11, 2011-12, 2012-13, 2013-14). Note that the interpretation part (section 4) of the published paper is limited to only a few of these instances (2009-10 in ECON and 2012-13 in AGRI).
The extracted networks are located in the "networks" folder and the corresponding partitions are in the "partitions" folder. Both folders have the same structure, which is as follows:
COUNTRY-NAME
|_DOMAIN-NAME
    |_2009-10
    |_2010-11
    |_2011-12
    |_2012-13
    |_2013-14
## NETWORKS
The networks in this folder are used in the article. All those networks are the ones obtained after the filtering step (as explained in the article). The networks are in 'Graphml' format. These networks are enriched with some MEPs' properties (such as name, political party, etc.) associated with each node.
## ALL NETWORKS
For those who are interested in other countries or domains, we make available all possible networks that we can extract from the raw data, with and without the filtering step.
COUNTRY-NAME
|_m3
    |_negtr=NA_postr=NA: This folder contains all filtered networks. Note that the filtering step is explained in Section 2.1.2 of the article.
        |_bygroup
        |_bycountry
    |_negtr=0_postr=0: This folder contains all original networks (i.e. no filtering step).
        |_bygroup
        |_bycountry
## PARTITIONS
The partitions are obtained as follows. First, the Ex-CC (exact) method is run, and we denote by 'k' the number of clusters detected in its output. This 'k' value serves as the reference point for running the ILS-RCC (heuristic) method, which requires the desired number of clusters to be specified. ILS-RCC is then run with several values ('k', 'k+1', 'k+2'). All these results are integrated into the initial network graphml files and then converted into Gephi format, so that the results can be explored interactively.
Note that absent MEPs need special handling in the clustering results, because those MEPs correspond to isolated nodes in the networks. Each isolated node is considered a single-node cluster in the Ex-CC results. We simply omit those nodes when determining 'k' (the number of detected clusters) before running ILS-RCC. Note also that ILS-RCC does not handle isolated nodes separately, so an isolated node can be part of a cluster.
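The way 'k' is derived from an Ex-CC result can be sketched as below. This is only an illustration under assumptions: the function name, the dict-based partition, and the edge list are hypothetical and are not taken from the NetVotes source code; the sketch merely counts the clusters that contain at least one non-isolated node and builds the three cluster counts passed to ILS-RCC.

```python
def count_clusters_without_isolated(membership, edges):
    """Count clusters after omitting isolated nodes.

    membership: dict mapping node -> cluster id (hypothetical Ex-CC output).
    edges: iterable of (u, v) pairs; a node with no edge is isolated.
    """
    connected = set()
    for u, v in edges:
        connected.add(u)
        connected.add(v)
    # Isolated nodes are singleton clusters in Ex-CC, so skipping them
    # removes exactly those clusters from the count.
    return len({cluster for node, cluster in membership.items() if node in connected})


# Toy example: "d" and "e" are absent MEPs, hence isolated nodes.
membership = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 4}
edges = [("a", "b"), ("b", "c")]

k = count_clusters_without_isolated(membership, edges)
ils_rcc_cluster_counts = [k, k + 1, k + 2]
```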

----------------------
# COMPARISON RESULTS
The 'material-stats' folder contains all the comparison results obtained for Ex-CC and ILS-CC. The csv files associated with plots are also provided.
The folder structure is as follows:
* material-stats/
** execTimePerf: The plot shows the execution time of Ex-CC and ILS-CC based on randomly generated complete networks of different size.
** graphStructureAnalysis: The plots show the weights and links statistics for all instances.
** ILS-CC-vs-Ex-CC: The folder contains 4 different comparisons between Ex-CC and ILS-CC: Imbalance difference, number of detected clusters, difference of the number of detected clusters, NMI (Normalized Mutual Information)

----------------------
Funding: Agorantic FR 3621, FMJH Program Gaspard Monge in optimization and operation research (Project 2015-2842H)
Data from: Valid Inference Corrected for Outlier Removal
figshare.com
pdf
Updated May 30, 2023
Cite
Shuxiao Chen; Jacob Bien (2023). Valid Inference Corrected for Outlier Removal [Dataset]. http://doi.org/10.6084/m9.figshare.9762731.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.9762731.v1
Dataset updated
May 30, 2023
Dataset provided by
Taylor & Francis
Authors
Shuxiao Chen; Jacob Bien
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ordinary least squares (OLS) estimation of a linear regression model is well-known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic, and in this paper we highlight the fact that it can lead to invalid inference and show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real data sets to illustrate how our inferential results can differ from the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R.
All scripts required to replicate our analyses are provided in S1 Data.
plos.figshare.com
bz2
Updated Jun 1, 2023
Cite
Jeet Sukumaran; Mark T. Holder; L. Lacey Knowles (2023). All scripts required to replicate our analyses are provided in S1 Data. [Dataset]. http://doi.org/10.1371/journal.pcbi.1008924.s002
Available download formats: bz2
Unique identifier
https://doi.org/10.1371/journal.pcbi.1008924.s002
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS Computational Biology
Authors
Jeet Sukumaran; Mark T. Holder; L. Lacey Knowles
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is a compressed archive that includes: scripts to simulate data, construct analysis pipelines, and cluster execution job files (delineate-performance-setup/bin); some notes on the parameter space we used (delineate-performance-setup/docs); scripts to collate, compile, and analyze results, as well as generate plots/figures from results data (delineate-performance-results/bin/); CSV/TSV files summarizing each replicate, including (true) parameters as well as inferred parameter values and probabilities, as well as metadata such as analysis execution date/time, cluster location, etc. (delineate-performance-results/data/extracts); TSV files containing data simulation/generation logs, including random seeds etc. (delineate-performance-results/data/logs). Note that we omit the full (simulated) data sets due to size (> 5TB). However, these can be easily regenerated in identical detail using the same random seeds for the data generation (given in the “logs”) with the scripts found in the “setup” section above. Note also that the automatically generated logs provided above span a broad variety of studies and analyses, including not only the production runs reported here but also pilot runs, experimental studies, etc. Production run details relevant to this paper can be identified by correlating date/time/cluster with the information found in the “extracts” subdirectory above. (TBZ)
Data from: A Search for Technosignatures Around 31 Sun-like Stars with the...
zenodo.org
datadryad.org
csv
Updated Jun 3, 2022
Cite
Jean-Luc Margot; Pavlo Pinchuk; Robert Geil; Ryan Lynch (2022). Data from: A Search for Technosignatures Around 31 Sun-like Stars with the Green Bank Telescope at 1.15–1.73 GHz [Dataset]. http://doi.org/10.5068/d1937j
Available download formats: csv
Unique identifier
https://doi.org/10.5068/d1937j
Dataset updated
Jun 3, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jean-Luc Margot; Pavlo Pinchuk; Robert Geil; Ryan Lynch
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This dataset describes candidate signal detections obtained at the Green Bank Telescope in 2018 and 2019 and reprocessed with the 2020 UCLA SETI Group data processing pipeline.

We conducted a search for technosignatures in April of 2018 and 2019 with the L-band receiver (1.15–1.73 GHz) of the 100 m diameter Green Bank Telescope. These observations focused on regions surrounding 31 Sun-like stars near the plane of the Galaxy. We present the results of our search for narrowband signals in this data set as well as improvements to our data processing pipeline. Specifically, we applied an improved candidate signal detection procedure that relies on the topographic prominence of the signal power, which nearly doubles the signal detection count of some previously analyzed data sets. We also improved the direction-of-origin filters that remove most radio frequency interference (RFI) to ensure that they uniquely link signals observed in separate scans. We performed a preliminary signal injection and recovery analysis to test the performance of our pipeline. We found that our pipeline recovers 93% of the injected signals over the usable frequency range of the receiver and 98% if we exclude regions with dense RFI. In this analysis, 99.73% of the recovered signals were correctly classified as technosignature candidates. Our improved data processing pipeline classified over 99.84% of the ~26 million signals detected in our data as RFI. Of the remaining candidates, 4539 were detected outside of known RFI frequency regions. The remaining candidates were visually inspected and verified to be of anthropogenic nature. Our search compares favorably to other recent searches in terms of end-to-end sensitivity, frequency drift rate coverage, and signal detection count per unit bandwidth per unit integration time.
U.S. Community Water Systems Service Boundaries, v1.0.0
search.dataone.org
hydroshare.org
Updated Dec 30, 2023
Cite
SimpleLab; EPIC (2023). U.S. Community Water Systems Service Boundaries, v1.0.0 [Dataset]. https://search.dataone.org/view/sha256%3A59229305d23a6ab6336be773a3ed2c75ac3586a69c775bba3a8e8101834dcc98
Dataset updated
Dec 30, 2023
Dataset provided by
Hydroshare
Authors
SimpleLab; EPIC
Description
This is a layer of water service boundaries for 44,919 community water systems that deliver tap water to 306.88 million people in the US. This amounts to 97.22% of the population reportedly served by active community water systems and 90.85% of active community water systems. The layer is based on multiple data sources and a methodology developed by SimpleLab and collaborators called a Tiered, Explicit, Match, and Model approach–or TEMM, for short. The name of the approach reflects exactly how the nationwide data layer was developed. The TEMM is composed of three hierarchical tiers, arranged by data and model fidelity. First, we use explicit water service boundaries provided by states. These are spatial polygon data, typically provided at the state-level. We call systems with explicit boundaries Tier 1. In the absence of explicit water service boundary data, we use a matching algorithm to match water systems to the boundary of a town or city (Census Place TIGER polygons). When a water system and TIGER place match one-to-one, we label this Tier 2a. When multiple water systems match to the same TIGER place, we label this Tier 2b. Tier 2b reflects overlapping boundaries for multiple systems. Finally, in the absence of an explicit water service boundary (Tier 1) or a TIGER place polygon match (Tier 2a or Tier 2b), a statistical model trained on explicit water service boundary data (Tier 1) is used to estimate a reasonable radius at provided water system centroids, and model a spherical water system boundary (Tier 3).
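The tier logic described above can be sketched as a small classifier. This is an illustration under stated assumptions rather than the SimpleLab implementation: the record keys ("id", "has_state_boundary", "matched_place") are hypothetical, and the sketch assumes that only systems without an explicit boundary count toward the one-to-one versus many-to-one place match.

```python
from collections import Counter


def assign_tiers(systems):
    """Assign a TEMM tier label to each water system record.

    systems: list of dicts with hypothetical keys
      "id"                 - system identifier
      "has_state_boundary" - True if an explicit state-provided polygon exists
      "matched_place"      - matched Census Place name, or None if no match
    """
    # Assumption: only systems lacking an explicit boundary compete for a place.
    place_counts = Counter(
        s["matched_place"]
        for s in systems
        if not s["has_state_boundary"] and s["matched_place"] is not None
    )
    tiers = {}
    for s in systems:
        if s["has_state_boundary"]:
            tiers[s["id"]] = "Tier 1"        # explicit boundary
        elif s["matched_place"] is not None:
            one_to_one = place_counts[s["matched_place"]] == 1
            tiers[s["id"]] = "Tier 2a" if one_to_one else "Tier 2b"
        else:
            tiers[s["id"]] = "Tier 3"        # modeled radius around a centroid
    return tiers


# Made-up records for illustration only.
systems = [
    {"id": "S1", "has_state_boundary": True, "matched_place": "Springfield"},
    {"id": "S2", "has_state_boundary": False, "matched_place": "Rivertown"},
    {"id": "S3", "has_state_boundary": False, "matched_place": "Lakeside"},
    {"id": "S4", "has_state_boundary": False, "matched_place": "Lakeside"},
    {"id": "S5", "has_state_boundary": False, "matched_place": None},
]
tiers = assign_tiers(systems)
```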

Several limitations to this data exist–and the layer should be used with these in mind. First, the case of assigning a Census Place TIGER polygon to multiple systems results in an inaccurate assignment of the same exact area to multiple systems; we hope to resolve Tier 2b systems into Tier 2a or Tier 3 in a future iteration. Second, matching algorithms to assign Census Place boundaries require additional validation and iteration. Third, Tier 3 boundaries have modeled radii stemming from a lat/long centroid of a water system facility; but the underlying lat/long centroids for water system facilities are of variable quality. It is critical to evaluate the "geometry quality" column (included from the EPA ECHO data source) when looking at Tier 3 boundaries; fidelity is very low when geometry quality is a county or state centroid– but we did not exclude the data from the layer. Fourth, missing water systems are typically those without a centroid, in a U.S. territory, or missing population and connection data. Finally, Tier 1 systems are assumed to be high fidelity, but rely on the accuracy of state data collection and maintenance.

All data, methods, documentation, and contributions are open-source and available here: https://github.com/SimpleLab-Inc/wsb.
Data from: Drivers of interspecific spatial segregation in two closely...
data.niaid.nih.gov
search.dataone.org
zip
Updated Nov 13, 2024
Cite
Anne-Sophie Bonnet-Lebrun; Jason Matthiopoulos; Rémi Lemaire-Patin; Tanguy Deville; Robert Barrett; Maria I. Bogdanova; Mark Bolton; Signe Christensen-Dalsgaard; Francis Daunt; Nina Dehnhard; Sébastien Descamps; Kyle Elliott; Kjell Einar Erikstad; Morten Frederiksen; Grant Gilchrist; Mike Harris; Yann Kolbeinsson; Jannie Fries Linnebjerg; Svein-Håkon Lorentsen; Mark L. Mallory; Flemming Merkel; Anders Mosbech; Ellie Owen; Allison Patterson; Isabeau Pratte; Hallvard Strøm; Þorkell Þórarinsson; Sarah Wanless; Norman Ratcliffe (2024). Drivers of interspecific spatial segregation in two closely related seabird species at a pan-Atlantic scale [Dataset]. http://doi.org/10.5061/dryad.5dv41nsf9
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.5dv41nsf9
Dataset updated
Nov 13, 2024
Dataset provided by
McGill University
Environment and Climate Change Canada
Acadia University
British Antarctic Survey
UiT The Arctic University of Norway
Norwegian Polar Institute
Royal Society for the Protection of Birds
University of Glasgow
Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement
UK Centre for Ecology & Hydrology
Aarhus University
Northeast Iceland Nature Research Centre
Norwegian Institute for Nature Research
National Trust for Scotland
Authors
Anne-Sophie Bonnet-Lebrun; Jason Matthiopoulos; Rémi Lemaire-Patin; Tanguy Deville; Robert Barrett; Maria I. Bogdanova; Mark Bolton; Signe Christensen-Dalsgaard; Francis Daunt; Nina Dehnhard; Sébastien Descamps; Kyle Elliott; Kjell Einar Erikstad; Morten Frederiksen; Grant Gilchrist; Mike Harris; Yann Kolbeinsson; Jannie Fries Linnebjerg; Svein-Håkon Lorentsen; Mark L. Mallory; Flemming Merkel; Anders Mosbech; Ellie Owen; Allison Patterson; Isabeau Pratte; Hallvard Strøm; Þorkell Þórarinsson; Sarah Wanless; Norman Ratcliffe
License
https://spdx.org/licenses/CC0-1.0.html
Description
Aim. Ecologically similar species living in sympatry are expected to segregate to reduce the effects of competition where such resources are limiting. Segregation from heterospecifics commonly occurs in space, but it is often unknown whether segregation has underlying environmental causes. Indeed, species could segregate because of different fundamental environmental requirements (i.e., ‘niche divergence’), because competitive exclusion at sympatric sites can force species to either change the habitat use they would have at allopatric sites (i.e., ‘niche displacement’) or to avoid certain areas, independently of habitat (i.e., ‘spatial avoidance’). Testing these hypotheses requires the comparison between sympatric and allopatric sites. Understanding the competitive mechanisms that underlie patterns of spatial segregation could improve predictions of species responses to environmental change, as competition might exacerbate the effects of environmental change. Location. North Atlantic and Arctic. Taxa. Common guillemots Uria aalge and Brünnich’s guillemots Uria lomvia. Methods. Here, we examine support for these explanations for spatial segregation in two closely-related seabird species, common guillemots (Uria aalge) and Brünnich’s guillemots (U. lomvia). For this, we collated a pan-Atlantic data set of breeding season foraging tracks from 1046 individuals, collected from 20 colonies (8 sympatric and 12 allopatric). These were analysed with habitat models in a spatially transferable framework to compare habitat preferences between species at sympatric and allopatric sites. Results. We found no effect of the distribution of heterospecifics on local habitat preferences of the focal species. We found differences in habitat preferences between species, but these were not sufficient to explain the observed levels of spatial segregation at sympatric sites. Main conclusions. 
Assuming we did not omit any relevant environmental variables, these results suggest a mix of niche divergence and spatial avoidance produces the observed patterns of spatial segregation.
Methods. The data consist of GPS tracking data for common guillemots (Uria aalge) and Brünnich’s guillemots (U. lomvia) from multiple colonies. Details on data collection methodologies are available in the associated publication in Journal of Biogeography (https://doi.org/10.1111/jbi.15042).
Data for: Connectedness and spillover effect between cryptocurrency and...
search.dataone.org
Updated Nov 29, 2023
Cite
Mu-Yun Mao (2023). Data for: Connectedness and spillover effect between cryptocurrency and financial assets [Dataset]. http://doi.org/10.5061/dryad.r7sqv9shb
Unique identifier
https://doi.org/10.5061/dryad.r7sqv9shb
Dataset updated
Nov 29, 2023
Dataset provided by
Dryad Digital Repository
Authors
Mu-Yun Mao
Time period covered
Jan 1, 2023
Description
Cryptocurrencies have quickly become one type of important financial asset. Accordingly, it is important to understand the interaction between cryptocurrency and other financial asset markets. However, previous literature paid less attention to the correlation between the price trend of cryptocurrencies and other financial assets. Using the vector autoregression model, we analyzed price correlation and spillover effect between cryptocurrencies and financial assets between November 2017 and February 2022. The study concludes that stock price has a spillover effect on cryptocurrencies, government bonds, and precious metals. The research results are useful when allocating portfolios or designing hedging strategies that include cryptocurrencies and financial assets such as stocks, government bonds, and precious metals.
The study downloads cryptocurrency and other financial asset data from the price information website investing.com. We wrote a letter to the source website to inquire whether it is possible to use the data for analysis and writing a paper, and received a positive response. We collect daily transaction price data. The data period is from November 2017 to February 2022. Cryptocurrencies are traded 24 hours a day. However, stocks, bonds, and other financial assets are traded only on business days. We exclude transaction data for days on which the stock market was closed. As a result, we obtain 1087 price observations for cryptocurrencies and other assets. The return rate is calculated as the first-order difference of the price. Volatility is calculated in the same way as in Diebold and Yilmaz (2012), estimated as the difference between the maximum and minimum prices of the day.
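The two derived series mentioned in the description can be sketched as follows. This is a minimal illustration, not the author's code: the description does not say whether the difference is taken on raw or log prices, so the sketch assumes raw prices, and the function names and sample values are made up.

```python
def first_differences(prices):
    """Return rate as the first-order difference of consecutive prices."""
    return [later - earlier for earlier, later in zip(prices, prices[1:])]


def daily_range(highs, lows):
    """Range-based volatility: the day's maximum price minus its minimum."""
    if len(highs) != len(lows):
        raise ValueError("highs and lows must have the same length")
    return [high - low for high, low in zip(highs, lows)]


# Made-up daily prices for one asset.
closes = [100.0, 102.0, 101.0, 105.0]
highs = [101.0, 103.0, 102.5, 106.0]
lows = [99.0, 100.0, 100.5, 101.0]

returns = first_differences(closes)
volatility = daily_range(highs, lows)
```

Differencing shortens the series by one observation, so the returns would need to be aligned with the volatility series (for example by dropping the first day) before fitting a vector autoregression.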
City of Tempe 2022 Community Survey Data
catalog.data.gov
data-academy.tempe.gov
Updated Sep 20, 2024
Cite
City of Tempe (2024). City of Tempe 2022 Community Survey Data [Dataset]. https://catalog.data.gov/dataset/city-of-tempe-2022-community-survey-data
Dataset updated
Sep 20, 2024
Dataset provided by
City of Tempe
Area covered
Tempe
Description
Description and Purpose
These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2022):
1. Safe and Secure Communities
1.04 Fire Services Satisfaction
1.06 Crime Reporting
1.07 Police Services Satisfaction
1.09 Victim of Crime
1.10 Worry About Being a Victim
1.11 Feeling Safe in City Facilities
1.23 Feeling of Safety in Parks
2. Strong Community Connections
2.02 Customer Service Satisfaction
2.04 City Website Satisfaction
2.05 Online Services Satisfaction Rate
2.15 Feeling Invited to Participate in City Decisions
2.21 Satisfaction with Availability of City Information
3. Quality of Life
3.16 City Recreation, Arts, and Cultural Centers
3.17 Community Services Programs
3.19 Value of Special Events
3.23 Right of Way Landscape Maintenance
3.36 Quality of City Services
4. Sustainable Growth & Development
No Performance Measures in this category presently relate directly to the Community Survey
5. Financial Stability & Vitality
No Performance Measures in this category presently relate directly to the Community Survey
Methods
The survey is mailed to a random sample of households in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of city population.
Processing and Limitations
The location data in this dataset is generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block level address points are provided. This results in points that overlap. In order to better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address, but are still close enough to allow insights about service delivery in different areas of the city. This data is the weighted data provided by the ETC Institute, which is used in the final published PDF report. The 2022 Annual Community Survey report is available on data.tempe.gov. The individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.
Additional Information
Source: Community Attitude Survey
Contact (author): Wydale Holmes
Contact E-Mail (author): wydale_holmes@tempe.gov
Contact (maintainer): Wydale Holmes
Contact E-Mail (maintainer): wydale_holmes@tempe.gov
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Thai Shopping List OCR Image Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Thai Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/thai-shopping-list-ocr-image-dataset
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Thai Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Thai language.
Dataset Content & Diversity:
Containing more than 2000 images, this Thai OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, comments, etc. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of space contains visible Thai text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All these shopping lists were written and images were captured by native Thai people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata:
In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Thai text recognition models.
Update & Custom Collection:
We are committed to continually expanding this dataset by adding more images with the help of our native Thai crowd community.
If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License:
This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Thai language. Your journey to improved language understanding and processing begins here.
City of Tempe 2023 Business Survey Data
financial-stability-and-vitality-tempegov.hub.arcgis.com
performance.tempe.gov
+9more
Updated Feb 22, 2024
Cite
City of Tempe (2024). City of Tempe 2023 Business Survey Data [Dataset]. https://financial-stability-and-vitality-tempegov.hub.arcgis.com/datasets/city-of-tempe-2023-business-survey-data
Explore at:
Dataset updated
Feb 22, 2024
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data include the individual responses for the City of Tempe Annual Business Survey conducted by ETC Institute. These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Business Survey results are used as indicators for city performance measures. The performance measures with indicators from the Business Survey include the following (as of 2023):
1. Financial Stability and Vitality
5.01 Quality of Business Services
Additional Information
Source: Business Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
Methods: The survey is mailed to a random sample of businesses in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city.
Processing and Limitations: The location data in this dataset are generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. To better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address but are still close enough to allow insights about service delivery in different areas of the city. The data are used by the ETC Institute in the final published PDF report.
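The two location adjustments described above (generalizing an address to the block level, then randomly dispersing overlapping points) can be pictured with a minimal sketch. This is not the City of Tempe's or ETC Institute's actual procedure: the function names, the zero-padding of the house number, and the `max_offset` jitter size in decimal degrees are all assumptions made for illustration.

```python
import random
from collections import Counter


def block_number(house_number: int) -> int:
    """Keep the first two digits of a house number and zero the rest (1234 -> 1200).

    Zero-padding is an assumption; the source only says the first two digits are used.
    """
    digits = str(house_number)
    if len(digits) <= 2:
        return house_number
    return int(digits[:2] + "0" * (len(digits) - 2))


def disperse_overlaps(points, max_offset=0.0005, seed=0):
    """Randomly offset every point whose (lat, lon) pair occurs more than once.

    max_offset is a hypothetical jitter size in decimal degrees.
    """
    rng = random.Random(seed)
    counts = Counter(points)
    dispersed = []
    for lat, lon in points:
        if counts[(lat, lon)] > 1:
            lat += rng.uniform(-max_offset, max_offset)
            lon += rng.uniform(-max_offset, max_offset)
        dispersed.append((lat, lon))
    return dispersed


# Three responses on the same block plus one isolated response.
block_points = [(33.4255, -111.94)] * 3 + [(33.43, -111.93)]
spread_points = disperse_overlaps(block_points)
```

Points that were already unique are left untouched, so only the overlapping block-level points move, and each moves by at most `max_offset` in either direction.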
Aerial Semantic Drone Dataset
kaggle.com
Updated May 25, 2021
Cite
Lalu Erfandi Maula Yusnu (2021). Aerial Semantic Drone Dataset [Dataset]. https://www.kaggle.com/nunenuh/semantic-drone/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 25, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Lalu Erfandi Maula Yusnu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Aerial Semantic Drone Dataset

The Semantic Drone Dataset focuses on semantic understanding of urban scenes for increasing the safety of autonomous drone flight and landing procedures. The imagery depicts more than 20 houses from nadir (bird's eye) view acquired at an altitude of 5 to 30 meters above the ground. A high-resolution camera was used to acquire images at a size of 6000x4000px (24Mpx). The training set contains 400 publicly available images and the test set is made up of 200 private images.

This dataset is taken from https://www.kaggle.com/awsaf49/semantic-drone-dataset. We removed and added files and information as needed for our research. We created TIFF files with a resolution of 1200x800 pixels and 24 channels, where each channel represents one class preprocessed from the PNG label files. We reduced the resolution and compressed the TIFF files with the tifffile Python library.

If you have any problems with the TIFF dataset that we modified, you can contact nunenuh@gmail.com and gaungalif@gmail.com.

This dataset is a copy of the original dataset (link below), to which we added some improvements in the semantic data and classes. Semantic data are available in PNG and TIFF formats at a smaller size.
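The 24-channel label layout described above can be pictured with a small sketch. It assumes the label has already been decoded into a 2-D array of class indices (0 to 23) and that channels are stored last; neither detail is confirmed by the dataset description, and `mask_to_channels` and `channels_to_mask` are hypothetical helper names rather than part of the dataset's tooling.

```python
import numpy as np

NUM_CLASSES = 24  # number of semantic classes listed for this dataset


def mask_to_channels(index_mask: np.ndarray, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Turn an (H, W) integer class map into an (H, W, num_classes) binary stack."""
    if index_mask.min() < 0 or index_mask.max() >= num_classes:
        raise ValueError("class index out of range")
    # Broadcasting compares every pixel against every class index at once.
    return (index_mask[..., None] == np.arange(num_classes)).astype(np.uint8)


def channels_to_mask(channels: np.ndarray) -> np.ndarray:
    """Recover the (H, W) class map from the one-channel-per-class stack."""
    return channels.argmax(axis=-1)


# Tiny 2x2 example: class indices only, no real image data.
toy_mask = np.array([[0, 1], [23, 1]])
toy_channels = mask_to_channels(toy_mask)
```

Each pixel has exactly one channel set to 1, so the original class map can be recovered with `channels_to_mask`.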

Semantic Annotation

The images are labelled densely using polygons and contain the following 24 classes:

unlabeled, paved-area, dirt, grass, gravel, water, rocks, pool, vegetation, roof, wall, window, door, fence, fence-pole, person, dog, car, bicycle, tree, bald-tree, ar-marker, obstacle, conflicting

Directory Structure and Files

> images
> labels/png
> labels/tiff
- class_to_idx.json
- classes.csv
- classes.json
- idx_to_class.json

Included Data

400 training images in jpg format can be found in "aerial_semantic_drone/images"

Dense semantic annotations in png format can be found in "aerial_semantic_drone/labels/png"

Dense semantic annotations in tiff format can be found in "aerial_semantic_drone/labels/tiff"

Semantic class definition in csv format can be found in "aerial_semantic_drone/classes.csv"

Semantic class definition in json can be found in "aerial_semantic_drone/classes.json"

Index to class name file can be found in "aerial_semantic_drone/idx_to_class.json"

Class name to index file can be found in "aerial_semantic_drone/class_to_idx.json"

Contact

aerial@icg.tugraz.at

Citation

If you use this dataset in your research, please cite the following URL: www.dronedataset.icg.tugraz.at

License

The Drone Dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:

That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, we (Graz University of Technology) do not accept any responsibility for errors or omissions.

That you include a reference to the Semantic Drone Dataset in any work that makes use of the dataset. For research papers or other media link to the Semantic Drone Dataset webpage.

That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works in as far as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data) and do not allow to recover the dataset or something similar in character.

That you may not use the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.

That all rights not expressly granted to you are reserved by us (Graz University of Technology).
Japanese Shopping List OCR Image Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Japanese Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/japanese-shopping-list-ocr-image-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Japanese Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Japanese language.
Dataset Content & Diversity:
Containing more than 2000 images, this Japanese OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Japanese text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All these shopping lists were written and images were captured by native Japanese people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata:
In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Japanese text recognition models.
Update & Custom Collection:
We are committed to continually expanding this dataset by adding more images with the help of our native Japanese crowd community.
If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License:
This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Japanese language. Your journey to improved language understanding and processing begins here.
My view
data.wu.ac.at
csv, json, xml
Updated Apr 27, 2017
Cite
(2017). My view [Dataset]. https://data.wu.ac.at/schema/data_consumerfinance_gov/NGZ5eC1hYTI2
Explore at:
Available download formats: xml, json, csv
Dataset updated
Apr 27, 2017
Description
Each week we send thousands of consumers' complaints about financial products and services to companies for response. Complaints are listed in the database after the company responds or after they’ve had the complaint for 15 calendar days, whichever comes first.

We publish the consumer’s description of what happened if the consumer opts to share it and after taking steps to remove personal information. See our Scrubbing Standard for more details.

We don’t verify all the facts alleged in these complaints, but we take steps to confirm a commercial relationship. We may remove complaints if they don’t meet all of the publication criteria. Data is generally refreshed nightly. Company level information should be considered in context of company size and/or market share.

More about the Consumer Complaint Database | How we use complaint data | Technical documentation
City of Tempe 2023 Community Survey Data
data.tempe.gov
data-academy.tempe.gov
+10more
Updated Jan 2, 2024
Cite
City of Tempe (2024). City of Tempe 2023 Community Survey Data [Dataset]. https://data.tempe.gov/maps/cacfb4bb56244552a6587fd2aa3fb06d
Explore at:
Dataset updated
Jan 2, 2024
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These data include the individual responses for the City of Tempe Annual Community Survey conducted by ETC Institute. This dataset has two layers and includes both the weighted data and unweighted data. Weighting data is a statistical method in which datasets are adjusted through calculations in order to more accurately represent the population being studied. The weighted data are used in the final published PDF report.
These data help determine priorities for the community as part of the City's on-going strategic planning process. Averaged Community Survey results are used as indicators for several city performance measures. The summary data for each performance measure is provided as an open dataset for that measure (separate from this dataset). The performance measures with indicators from the survey include the following (as of 2023):
1. Safe and Secure Communities
1.04 Fire Services Satisfaction
1.06 Crime Reporting
1.07 Police Services Satisfaction
1.09 Victim of Crime
1.10 Worry About Being a Victim
1.11 Feeling Safe in City Facilities
1.23 Feeling of Safety in Parks
2. Strong Community Connections
2.02 Customer Service Satisfaction
2.04 City Website Satisfaction
2.05 Online Services Satisfaction Rate
2.15 Feeling Invited to Participate in City Decisions
2.21 Satisfaction with Availability of City Information
3. Quality of Life
3.16 City Recreation, Arts, and Cultural Centers
3.17 Community Services Programs
3.19 Value of Special Events
3.23 Right of Way Landscape Maintenance
3.36 Quality of City Services
4. Sustainable Growth & Development
No Performance Measures in this category presently relate directly to the Community Survey
5. Financial Stability & Vitality
No Performance Measures in this category presently relate directly to the Community Survey
Methods: The survey is mailed to a random sample of households in the City of Tempe. Follow up emails and texts are also sent to encourage participation. A link to the survey is provided with each communication. To prevent people who do not live in Tempe or who were not selected as part of the random sample from completing the survey, everyone who completed the survey was required to provide their address. These addresses were then matched to those used for the random representative sample. If the respondent’s address did not match, the response was not used. To better understand how services are being delivered across the city, individual results were mapped to determine overall distribution across the city. Additionally, demographic data were used to monitor the distribution of responses to ensure the responding population of each survey is representative of the city population.
Processing and Limitations: The location data in this dataset are generalized to the block level to protect privacy. This means that only the first two digits of an address are used to map the location. When the data are shared with the city, only the latitude/longitude of the block-level address points are provided. This results in points that overlap. To better visualize the data, overlapping points were randomly dispersed to remove overlap. Together, these two adjustments ensure that the points are not related to a specific address but are still close enough to allow insights about service delivery in different areas of the city.
The weighted data are used by the ETC Institute in the final published PDF report. The 2023 Annual Community Survey report is available on data.tempe.gov or by visiting https://www.tempe.gov/government/strategic-management-and-innovation/signature-surveys-research-and-data
The individual survey questions as well as the definition of the response scale (for example, 1 means “very dissatisfied” and 5 means “very satisfied”) are provided in the data dictionary.
Additional Information
Source: Community Attitude Survey
Contact (author): Adam Samuels
Contact E-Mail (author): Adam_Samuels@tempe.gov
Contact (maintainer):
Contact E-Mail (maintainer):
Data Source Type: Excel table
Preparation Method: Data received from vendor after report is completed
Publish Frequency: Annual
Publish Method: Manual
Data Dictionary
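To illustrate how weighting can change an averaged survey result, here is a minimal sketch of a weighted mean on the 1 to 5 response scale. The ratings and weights are invented for illustration, and `weighted_mean` is a hypothetical helper; the actual weighting scheme used by ETC Institute is not described in this listing.

```python
def weighted_mean(values, weights):
    """Average of values where each response counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Invented ratings on the 1-5 scale and invented weights; a weight above 1
# stands for a group that is under-represented among respondents.
ratings = [5, 4, 2]
weights = [0.5, 1.0, 2.0]

unweighted = sum(ratings) / len(ratings)
weighted = weighted_mean(ratings, weights)
```

In this made-up case the weighted average is lower than the unweighted one because the lowest rating carries the largest weight.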
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...
openicpsr.org
Updated May 18, 2018
Cite
Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2019 [Dataset]. http://doi.org/10.3886/E103500V7
Explore at:
Unique identifier
https://doi.org/10.3886/E103500V7
Dataset updated
May 18, 2018
Dataset provided by
University of Pennsylvania
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1991 - 2019
Area covered
United States
Description
!!!WARNING~~~ This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html) as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships. ~~~WARNING!!!
Version 8 release notes:
Adds 2019 data.
Version 7 release notes:
Changes release notes description, does not change data.
Version 6 release notes:
Adds 2018 data.
Version 5 release notes:
Adds data in the following formats: SPSS, SAS, and Excel.
Changes project name to avoid confusing this data for the ones done by NACJD.
Adds data for 1991.
Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013 causing there to be two columns and zero values for years with the wrong label.
All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.
Version 4 release notes:
Adds data for 2017.
Adds rows that submitted a zero-report (i.e. that agency reported no hate crimes in the year). This is for all years 1992-2017.
Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time. Different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which I made consistent.
Made the 'population' column which is the total population in that agency.
Version 3 release notes:
Adds data for 2016.
Order rows by year (descending) and ORI.
Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open.
Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.).
The only changes I made to the data are the following. Minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), made all character values lower case, reordered columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
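The derived columns mentioned above (a `unique_id` built from year, ORI9, and incident number, plus month, weekday, and month-day values taken from the incident date) could be produced along these lines. This is only a sketch in Python, whereas the author's cleaning was done in R: the plain concatenation with no separator, the output key names, the month-day format, and the sample ORI value are all assumptions.

```python
from datetime import date

MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday")


def make_unique_id(year: int, ori9: str, incident_number: str) -> str:
    """Combine year, 9-character ORI, and incident number into one key.

    Joining with no separator is an assumption about the format.
    """
    return f"{year}{ori9}{incident_number}"


def date_parts(incident_date: date) -> dict:
    """Derive lower-case month and weekday names plus an MM-DD string."""
    return {
        "month": MONTHS[incident_date.month - 1],
        "weekday": WEEKDAYS[incident_date.weekday()],  # Monday is 0
        "month_day": incident_date.strftime("%m-%d"),
    }


# Hypothetical agency code and incident number, for illustration only.
example_id = make_unique_id(2019, "xx0000000", "123")
example_parts = date_parts(date(2019, 7, 4))
```

The name tables are written out explicitly so the result does not depend on the machine's locale, and the lower-case values follow the note that all character values were made lower case.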
Ad-hoc statistical analysis: 2020/21 Quarter 2
gov.uk
Updated Sep 11, 2020
Ad-hoc statistical analysis: 2020/21 Quarter 2 [Dataset]. https://www.gov.uk/government/statistical-data-sets/ad-hoc-statistical-analysis-202021-quarter-2
Explore at:
Dataset updated
Sep 11, 2020
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Department for Digital, Culture, Media & Sport
Description
This page lists ad-hoc statistics released during the period July - September 2020. These are additional analyses not included in any of the Department for Digital, Culture, Media and Sport’s standard publications.

If you would like any further information please contact evidence@dcms.gov.uk.

July 2020 - DCMS Economic Estimates: Number of businesses and Gross Value Added (GVA) by turnover band (2018)

This analysis considers businesses in the DCMS Sectors split by whether they had reported annual turnover above or below £500 million, at one time the threshold for the Coronavirus Business Interruption Loan Scheme (CBILS). Please note the DCMS Sectors totals here exclude the Tourism and Civil Society sectors, for which data is not available or has been excluded for ease of comparability.

The analysis looked at the number of businesses and the total GVA generated for both turnover bands. In 2018, an estimated 112 DCMS Sector businesses had an annual turnover of £500m or more (0.03% of the total DCMS Sector businesses). These businesses generated 35.3% (£73.9bn) of all GVA by the DCMS Sectors.

These trends are broadly similar for the wider non-financial UK business economy, where an estimated 823 businesses had an annual turnover of £500m or more (0.03% of the total) and generated 24.3% (£409.9bn) of all GVA.

The Digital Sector had an estimated 89 businesses (0.04% of all Digital Sector businesses) – the largest number – with turnover of £500m or more; and these businesses generated 41.5% (£61.9bn) of all GVA for the Digital Sector. By comparison, the Creative Industries had an estimated 44 businesses with turnover of £500m or more (0.01% of all Creative Industries businesses), and these businesses generated 23.9% (£26.7bn) of GVA for the Creative Industries sector.
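As a rough cross-check of the shares quoted above, the total GVA implied by each pair of figures can be backed out by dividing the amount by its percentage share. The sketch below is only arithmetic on the rounded numbers in the text, so the results are approximate and are not official DCMS totals; `implied_total` is a hypothetical helper name.

```python
def implied_total(part: float, share_percent: float) -> float:
    """Back out a total from a part and the percentage share it represents."""
    if share_percent <= 0:
        raise ValueError("share must be positive")
    return part / (share_percent / 100.0)


# GVA in £bn generated by businesses with turnover of £500m or more, and the
# share of sector GVA that amount represents, as quoted in the text above.
quoted = {
    "DCMS Sectors": (73.9, 35.3),
    "UK non-financial business economy": (409.9, 24.3),
    "Digital Sector": (61.9, 41.5),
    "Creative Industries": (26.7, 23.9),
}

implied_gva = {name: implied_total(gva, share) for name, (gva, share) in quoted.items()}
```

For example, £73.9bn at 35.3% implies total DCMS Sector GVA of roughly £209bn, subject to the rounding in the published percentages.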

Number and Gross Value Added by businesses in DCMS sectors, split by annual turnover, 2018: https://assets.publishing.service.gov.uk/media/5f05e78ce90e0712cc90b6f7/dcms-businesses-turnover-split-by-number-and-gva-2018.xlsx

MS Excel Spreadsheet, 42.5 KB

July 2020 - ONS Opinions and Lifestyle Omnibus Survey, February 2020 Data Module

This analysis shows estimates from the ONS Opinion and Lifestyle Omnibus Survey Data Module, commissioned by DCMS in February 2020. The Opinions and Lifestyles Survey (OPN) is run by the Office for National Statistics. For more information on the survey, please see the ONS website (https://www.ons.gov.uk/aboutus/whatwedo/paidservices/opinions).

DCMS commissioned 19 questions to be included in the February 2020 survey relating to the public’s views on a range of data related issues, such as trust in different types of organisations when handling personal data, confidence using data skills at work, understanding of how data is managed by companies and the use of data skills at work.

The high level results are included in the accompanying tables. The survey samples adults (16+) across the whole of Great Britain (excluding the Isles of Scilly).

Bahasa Shopping List OCR Image Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
Bahasa Shopping List OCR Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/ocr-dataset/bahasa-shopping-list-ocr-image-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
Introducing the Bahasa Shopping List Image Dataset - a diverse and comprehensive collection of handwritten text images carefully curated to propel the advancement of text recognition and optical character recognition (OCR) models designed specifically for the Bahasa language.
Dataset Content & Diversity:
Containing more than 2000 images, this Bahasa OCR dataset offers a wide distribution of different types of shopping list images. Within this dataset, you'll discover a variety of handwritten text on shopping lists, including sentences, individual item names, quantities, and comments. The images in this dataset showcase distinct handwriting styles, fonts, font sizes, and writing variations.
To ensure diversity and robustness in training your OCR model, we allow only a limited number (fewer than three) of unique images per handwriting. This ensures we have diverse types of handwriting to train your OCR model on. Stringent measures have been taken to exclude any personally identifiable information (PII) and to ensure that in each image a minimum of 80% of the space contains visible Bahasa text.
The images have been captured under varying lighting conditions, including day and night, as well as different capture angles and backgrounds. This diversity helps build a balanced OCR dataset, featuring images in both portrait and landscape modes.
All these shopping lists were written and images were captured by native Bahasa people to ensure text quality, prevent toxic content, and exclude PII text. We utilized the latest iOS and Android mobile devices with cameras above 5MP to maintain image quality. Images in this training dataset are available in both JPEG and HEIC formats.
Metadata:
In addition to the image data, you will receive structured metadata in CSV format. For each image, this metadata includes information on image orientation, country, language, and device details. Each image is correctly named to correspond with the metadata.
This metadata serves as a valuable resource for understanding and characterizing the data, aiding informed decision-making in the development of Bahasa text recognition models.
Update & Custom Collection:
We are committed to continually expanding this dataset by adding more images with the help of our native Bahasa crowd community.
If you require a customized OCR dataset containing shopping list images tailored to your specific guidelines or device distribution, please don't hesitate to contact us. We have the capability to curate specialized data to meet your unique requirements.
Additionally, we can annotate or label the images with bounding boxes or transcribe the text in the images to align with your project's specific needs using our crowd community.
License:
This image dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion:
Leverage this shopping list image OCR dataset to enhance the training and performance of text recognition, text detection, and optical character recognition models for the Bahasa language. Your journey to improved language understanding and processing begins here.
Afrobarometer Survey 1999-2000, Merged Round 1 Data (12 Countries) -...
microdata.worldbank.org
catalog.ihsn.org
Updated Apr 27, 2021
Institute for Democracy in South Africa (IDASA) (2021). Afrobarometer Survey 1999-2000, Merged Round 1 Data (12 Countries) - Botswana, Ghana, Lesotho, Mali, Malawi, Namibia, Nigeria, Tanzania, Uganda, South Africa, Zambia, Zimbabwe [Dataset]. https://microdata.worldbank.org/index.php/catalog/885
Explore at:
Dataset updated
Apr 27, 2021
Dataset provided by
Ghana Centre for Democratic Development (CDD-Ghana)
Institute for Democracy in South Africa (IDASA)
Michigan State University (MSU)
Time period covered
1999 - 2001
Area covered
Nigeria, Botswana, Lesotho, Zimbabwe, Tanzania, Malawi, Namibia, Ghana, Uganda, Mali
Description
Abstract

The Afrobarometer is a comparative series of public attitude surveys that assess African citizens' attitudes to democracy and governance, markets, and civil society, among other topics.

The 12-country dataset is a combined dataset for the 12 African countries surveyed during Round 1 of the survey, conducted between 1999 and 2000 (Botswana, Ghana, Lesotho, Mali, Malawi, Namibia, Nigeria, South Africa, Tanzania, Uganda, Zambia and Zimbabwe), plus data from the old Southern African Democracy Barometer and similar surveys done in West and East Africa.

Geographic coverage

The Round 1 Afrobarometer surveys have national coverage for the following countries: Botswana, Ghana, Lesotho, Malawi, Mali, Namibia, Nigeria, South Africa, Tanzania, Uganda, Zambia, Zimbabwe.

Analysis unit

Individuals

Universe

The sample universe for Afrobarometer surveys includes all citizens of voting age within the country. In other words, we exclude anyone who is not a citizen and anyone who has not attained this age (usually 18 years) on the day of the survey. Also excluded are areas determined to be either inaccessible or not relevant to the study, such as those experiencing armed conflict or natural disasters, as well as national parks and game reserves. As a matter of practice, we have also excluded people living in institutionalized settings, such as students in dormitories and persons in prisons or nursing homes.

What to do about areas experiencing political unrest? On the one hand we want to include them because they are politically important. On the other hand, we want to avoid stretching out the fieldwork over many months while we wait for the situation to settle down. It was agreed at the 2002 Cape Town Planning Workshop that it is difficult to come up with a general rule that will fit all imaginable circumstances. We will therefore make judgments on a case-by-case basis on whether or not to proceed with fieldwork or to exclude or substitute areas of conflict. National Partners are requested to consult Core Partners on any major delays, exclusions or substitutions of this sort.

Kind of data

Sample survey data [ssd]

Sampling procedure

Afrobarometer uses national probability samples designed to yield a representative cross-section of all citizens of voting age in a given country. The goal is to give every adult citizen an equal and known chance of being selected for an interview. We achieve this by:

• using random selection methods at every stage of sampling;
• sampling at all stages with probability proportionate to population size wherever possible, so that larger (i.e., more populated) geographic units have a proportionally greater probability of being chosen into the sample.

The sampling universe normally includes all citizens age 18 and older. As a standard practice, we exclude people living in institutionalized settings, such as students in dormitories, patients in hospitals, and persons in prisons or nursing homes. Occasionally, we must also exclude people living in areas determined to be inaccessible due to conflict or insecurity. Any such exclusion is noted in the technical information report (TIR) that accompanies each data set.

Sample size and design

Samples usually include either 1,200 or 2,400 cases. A randomly selected sample of n=1200 cases allows inferences to national adult populations with a margin of sampling error of no more than +/-2.8% with a confidence level of 95 percent. With a sample size of n=2400, the margin of error decreases to +/-2.0% at 95 percent confidence level.
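The quoted margins of error are consistent with the standard formula for a proportion under simple random sampling, z * sqrt(p(1 - p) / n), evaluated at the worst case p = 0.5 and z = 1.96. The sketch below is only an illustration of that arithmetic: the function name is hypothetical, and it ignores any design effect from the clustering and stratification that Afrobarometer samples actually use.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion.

    Assumes simple random sampling; p=0.5 gives the widest interval.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe_1200 = margin_of_error(1200)
moe_2400 = margin_of_error(2400)

print(f"n=1200: +/-{moe_1200 * 100:.1f}%")
print(f"n=2400: +/-{moe_2400 * 100:.1f}%")
```

Doubling the sample size shrinks the margin by a factor of sqrt(2), not 2, which is why the gain from 1,200 to 2,400 cases is under one percentage point.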

The sample design is a clustered, stratified, multi-stage, area probability sample. Specifically, we first stratify the sample according to the main sub-national unit of government (state, province, region, etc.) and by urban or rural location.

Area stratification reduces the likelihood that distinctive ethnic or language groups are left out of the sample. Afrobarometer occasionally purposely oversamples certain populations that are politically significant within a country to ensure that the size of the sub-sample is large enough to be analysed. Any oversample is noted in the TIR.

Sample stages

Samples are drawn in either four or five stages:

Stage 1: In rural areas only, the first stage is to draw secondary sampling units (SSUs). SSUs are not used in urban areas, and in some countries they are not used in rural areas. See the TIR that accompanies each data set for specific details on the sample in any given country.
Stage 2: We randomly select primary sampling units (PSU).
Stage 3: We then randomly select sampling start points.
Stage 4: Interviewers then randomly select households.
Stage 5: Within the household, the interviewer randomly selects an individual respondent. Each interviewer alternates in each household between interviewing a man and interviewing a woman to ensure gender balance in the sample.

To keep the costs and logistics of fieldwork within manageable limits, eight interviews are clustered within each selected PSU.

Data weights

For some national surveys, data are weighted to correct for over- or under-sampling or for household size. "Withinwt" should be turned on for all national-level descriptive statistics in countries that contain this weighting variable. It is included as the last variable in the data set, with details described in the codebook. For merged data sets, "Combinwt" should be turned on for cross-national comparisons of descriptive statistics. Note: this weighting variable standardizes each national sample as if it were equal in size.
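To make the idea of standardizing national samples "as if equal in size" concrete, here is a minimal sketch of one way such a weight could be computed. It is not the documented construction of "Combinwt" (see the codebook for that); the function name, country labels, and sample sizes are all hypothetical.

```python
def equalizing_weights(sample_sizes):
    """Return one weight per country so every weighted sample has the same size.

    sample_sizes: dict mapping a country label to its number of respondents.
    """
    target = sum(sample_sizes.values()) / len(sample_sizes)
    return {country: target / n for country, n in sample_sizes.items()}


# Hypothetical sample sizes, for illustration only.
sizes = {"Country A": 1200, "Country B": 2400, "Country C": 3600}
weights = equalizing_weights(sizes)

# After weighting, each country contributes the same effective number of cases.
weighted_sizes = {c: sizes[c] * weights[c] for c in sizes}
print(weights)
print(weighted_sizes)
```

Under this scheme a respondent from a country with a small sample counts for more than one from a country with a large sample, so pooled cross-national statistics are not dominated by the largest national samples.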

Further information on sampling protocols, including full details of the methodologies used for each stage of sample selection, can be found at https://afrobarometer.org/surveys-and-methods/sampling-principles

Mode of data collection

Face-to-face [f2f]

Research instrument

Because Afrobarometer Round 1 emerged out of several different survey research efforts, survey instruments were not standardized across all countries. The following features of the questionnaires should be noted:

• In most cases, the data set only includes those questions/variables that were asked in nine or more countries. Complete Round 1 data sets for each individual country have already been released, and are available from ICPSR or from the Afrobarometer website at www.afrobarometer.org.
• In the seven countries that originally formed the Southern Africa Barometer (SAB) - Botswana, Lesotho, Malawi, Namibia, South Africa, Zambia and Zimbabwe - a standardized questionnaire was used, so question wording and response categories are generally the same for all of these countries. The questionnaires in Mali and Tanzania were also essentially identical (in the original English version). Ghana, Uganda and Nigeria each had distinct questionnaires.
• This merged dataset combines, into a single variable, responses from across these different countries where either identical or very similar questions were used, or where conceptually equivalent questions can be found in at least nine of the different countries. For each variable, the exact question text from each of the countries or groups of countries ("SAB" refers to the Southern Africa Barometer countries) is listed.
• Response options also varied on some questions, and where applicable, these differences are also noted.
Redesigning Modern Portfolio Theory to Improve Spatial Recovery Planning for...
data.niaid.nih.gov
search.dataone.org
Updated May 28, 2024
Cite
Alicia Canales; Olivia Somhegyi; Jaden Husser; Stephanie Luu (2024). Redesigning Modern Portfolio Theory to Improve Spatial Recovery Planning for Oregon Coast (OC) Coho Salmon [Dataset]. http://doi.org/10.5061/dryad.pvmcvdntm
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.pvmcvdntm
Dataset updated
May 28, 2024
Dataset provided by
University of California, Santa Barbara
Authors
Alicia Canales; Olivia Somhegyi; Jaden Husser; Stephanie Luu
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
Oregon Coast, Oregon
Description
Oregon Coast (OC) coho salmon (Oncorhynchus kisutch) are listed as a threatened species under the federal Endangered Species Act. Conserving this species is essential because of its ecological importance in nutrient cycling and its cultural significance to Indigenous peoples. The combination of its threatened status and significance creates a sense of urgency for conservation organizations, like the Wild Salmon Center, to allocate their budgets efficiently. In this project we redesigned Modern Portfolio Theory (MPT) to optimize habitat restoration spending. MPT is traditionally used in finance to inform portfolio managers of the risks and returns of investing in different portfolios of assets. In our redesigned application, the 21 populations of OC coho salmon are treated as assets, with the increase in salmon abundance and its variance directly related to the amount of money allocated to conserve each population. More specifically, we applied our new approach to mitigating barriers that prevent salmon from traveling back to their natal streams. To do this, we gathered data on fish passage barriers and on average project costs to remove barriers, and estimated how removing barriers would affect the abundance of coho salmon. We analyzed portfolios under multiple budgets and under scenarios that prioritize conservation spending in watersheds important to Indigenous peoples. This endogenous application is the first of its kind in the conservation field and can be applied to a multitude of species or restoration actions beyond OC coho salmon and barrier mitigation.

Methods

Adult Coho Salmon Spawner data: This dataset was given to us by our client, the Wild Salmon Center, but was sourced through the Oregon Adult Salmonid Inventory & Sampling Project run by the Oregon Department of Fish & Wildlife (ODFW).

National Hydrography Reaches: All files were given to us by our client, the Wild Salmon Center. The files were provided and downloaded as .shp files, one for each individual population in the OC coho salmon Evolutionarily Significant Unit (ESU). These files were created specifically for the Wild Salmon Center and NOAA by Terrainworks using USGS' National Hydrography Dataset. Many variables are outputs of different climate models run by Terrainworks using the variables within USGS' original hydrography dataset. These .shp files were used to better understand stream habitat within the ESU.

Population Boundaries: All files were given to us by our client, the Wild Salmon Center, as a .shp file with all populations included. Each individual population was then exported to its own .shp file. All population boundaries were used to clip the reach data and barrier data in ArcGIS Pro to only look at ESU-specific attributes.

Oregon Fish Passage Barriers: This dataset was sourced from ODFW's Fish Barrier Data basin. The barrier data for the state of Oregon was downloaded as a layer package. For our project, the layer package was clipped by each individual population boundary .shp file. This allowed us to isolate the barriers that are within the ESU.

Cost Data: The cost data was collected by downloading a .xlsx file from the Oregon Watershed Restoration Inventory, which is managed by the Oregon Watershed Enhancement Board. The inventory contains information on self-reported restoration projects that have taken place in Oregon from 1995 to 2022. The file was then filtered by project type. All project types that involved bridges, culverts, dams, waterfalls, tide gates, fords, and cascades were used in our analysis. Each project type was given its own tab in the .xlsx file to calculate an average cost for each barrier type. The average cost was then applied to our barrier data so that each barrier received the estimated mitigation cost for its type.

This data informed our R model to estimate the impacts of barrier mitigation on adult salmon population returns.
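The cost step described above (average the reported project costs per barrier type, then attach that average to each barrier) can be sketched as follows. This is an illustration in Python, not the authors' spreadsheet or R workflow; the function name, field names, and all cost figures are made up for the example.

```python
from collections import defaultdict


def average_cost_by_type(projects):
    """Average the reported cost per barrier type.

    projects: iterable of (barrier_type, cost) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])  # type -> [sum of costs, count]
    for barrier_type, cost in projects:
        totals[barrier_type][0] += cost
        totals[barrier_type][1] += 1
    return {t: total / count for t, (total, count) in totals.items()}


# Hypothetical project records; real values would come from the inventory file.
projects = [("culvert", 100000.0), ("culvert", 140000.0), ("dam", 500000.0)]
avg = average_cost_by_type(projects)

# Hypothetical barrier records; each gets the average cost for its type.
barriers = [
    {"id": 1, "type": "culvert"},
    {"id": 2, "type": "dam"},
    {"id": 3, "type": "ford"},
]
for barrier in barriers:
    # None marks barrier types with no cost records in this toy input.
    barrier["est_cost"] = avg.get(barrier["type"])

print(barriers)
```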

Cite

Western Pennsylvania Regional Data Center (2023). Sidewalk to Street "Walkability" Ratio [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/sidewalk-to-street-walkability-ratio

Sidewalk to Street "Walkability" Ratio

Dataset updated

Jan 24, 2023

Dataset provided by

Western Pennsylvania Regional Data Center

Description

We’ve been asked to create measures of communities that are “walkable” for several projects. While there is no standard definition of what makes a community “walkable”, and the definition of “walkability” can differ from person to person, we thought an indicator that explores the total length of available sidewalks relative to the total length of streets in a community could be a good place to start. In this blog post, we describe how we used open data from SPC and Allegheny County to create a new measure for how “walkable” a community is. We wanted to create a ratio of the length of a community’s sidewalks to the length of a community’s streets as a measure of pedestrian infrastructure. A ratio of 1 would mean that a community has an equal number of linear feet of sidewalks and streets. A ratio of about 2 would mean that a community has two linear feet of sidewalk for every linear foot of street. In other words, every street has a sidewalk on both sides of it. In creating a measure of the ratio of sidewalks to streets, we had to do a little bit of data cleanup. Much of this was by trial and error, ground-truthing the data based on our personal experiences walking in different neighborhoods. Since street data was not shared as open data by many counties in our region either on PASDA or through the SPC open data portal, we limited our analysis of “walkability” to Allegheny County. In looking at the sidewalk data table and map, we noticed that trails were included. While nice to have in the data, we wanted to exclude them from the ratio. We did this to avoid a situation where a community that had few sidewalks but was in the same blockgroup as a park with trails would get “credit” for being more “walkable” than it actually is according to our definition. We did this by removing all segments where “Trail” was in the “Type_Name” field.
We also used a similar tabular selection method to remove crosswalks from the sidewalk data (“Type_Name” = “Crosswalk”). We kept the steps in the dataset along with the sidewalks. In the street data obtained from Allegheny County’s GIS department, we decided to exclude limited-access highway segments from the analysis, since pedestrians are prohibited from using them, and their presence would have reduced the sidewalk/street ratio in communities where they are located. We did this by excluding street segments whose values in the “FCC” field (designating type of street) equaled “A11” or “A63.” We also removed trails from this dataset by excluding those classified as “H10.” Since documentation was sparse, we looked to see how these features were classified in the data to determine which codes to exclude. After running the data initially, we also realized that excluding alleyways from the calculations could improve the accuracy of our results. Some of the communities with substantial pedestrian infrastructure have alleyways, and including them would make these communities appear less “walkable” in our indicator. We removed these from the dataset by removing records with a value of “Aly” or “Way” in the “St_Type” field. We also excluded streets where the word “Alley” appeared in the street name, or “St_Name” field. The full methodology used for this dataset is captured in our blog post, and we have also included the sidewalk and street data used to create the ratio here as well.
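The filtering rules and the ratio described above can be sketched as below. The field names “Type_Name”, “FCC”, “St_Type”, and “St_Name” and the excluded codes come from the description; everything else is an assumption made for illustration, including the `length_ft` field, the sample segments, the case-insensitive match on “Alley”, and the omission of the per-community aggregation step.

```python
EXCLUDED_FCC = {"A11", "A63", "H10"}  # limited-access highways and trails
EXCLUDED_ST_TYPES = {"Aly", "Way"}    # alleyways


def keep_sidewalk(seg):
    """Drop trails and crosswalks; sidewalks and steps are kept."""
    type_name = seg["Type_Name"]
    return "Trail" not in type_name and type_name != "Crosswalk"


def keep_street(seg):
    """Drop highways, trails, and alleys from the street data."""
    return (
        seg["FCC"] not in EXCLUDED_FCC
        and seg["St_Type"] not in EXCLUDED_ST_TYPES
        and "alley" not in seg["St_Name"].lower()
    )


def walkability_ratio(sidewalks, streets):
    """Total kept sidewalk length divided by total kept street length."""
    sidewalk_len = sum(s["length_ft"] for s in sidewalks if keep_sidewalk(s))
    street_len = sum(s["length_ft"] for s in streets if keep_street(s))
    return sidewalk_len / street_len if street_len else None


# Hypothetical segments for a single community.
sidewalks = [
    {"Type_Name": "Sidewalk", "length_ft": 1500.0},
    {"Type_Name": "Steps", "length_ft": 500.0},
    {"Type_Name": "Trail", "length_ft": 800.0},
    {"Type_Name": "Crosswalk", "length_ft": 60.0},
]
streets = [
    {"FCC": "A41", "St_Type": "St", "St_Name": "Main", "length_ft": 1000.0},
    {"FCC": "A11", "St_Type": "Hwy", "St_Name": "Parkway", "length_ft": 3000.0},
    {"FCC": "A41", "St_Type": "Aly", "St_Name": "Oak", "length_ft": 200.0},
    {"FCC": "A41", "St_Type": "St", "St_Name": "Berry Alley", "length_ft": 150.0},
    {"FCC": "H10", "St_Type": "", "St_Name": "River Trail", "length_ft": 400.0},
]

ratio = walkability_ratio(sidewalks, streets)
print(ratio)
```

In this toy input, 2,000 feet of sidewalk and steps remain against 1,000 feet of street, which corresponds to the "sidewalk on both sides of every street" case from the description.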
