ColTrend is a workflow (.OWS file) for Orange Data Mining (open-source machine learning and data visualization software: https://orangedatamining.com/) that allows the user to observe temporal collocation trends in corpora. The workflow consists of a series of Python scripts, data filters, and visualizers. As input, the workflow takes a .CSV file with data on collocations and their relative frequencies by year of publication extracted from a corpus. As output, it provides a .TSV file containing the same data (or a filtered selection thereof) enriched with four measures that indicate the collocation’s temporal trend in the corpus: (1) the slope (k) of a linear regression model fitted to the frequency data, which indicates whether the frequency of use of the collocation is increasing or declining; (2) the coefficient of determination (R²) of the linear regression model, which indicates how linear the change in the collocation’s use is; (3) the ratio (m) of the maximum relative frequency to the average relative frequency, which indicates peaks in collocation usage; and (4) the coefficient of recent growth (t), which indicates increased usage of the collocation in the last three years of the observed corpus data. The entry also contains three .CSV files that can be used to test the workflow. The files contain collocation candidates (along with their relative frequencies per year of publication) extracted from the Gigafida 2.0 Corpus of Written Slovene (https://viri.cjvt.si/gigafida/) with three different syntactic structures (as defined in http://hdl.handle.net/11356/1415): 1) p0-s0 (adjective + noun, e.g. rezervni sklad), 2) s0-s2 (noun + noun in the genitive case, e.g. ukinitev lastnine), and 3) gg-s4 (verb + noun in the accusative case, e.g. pripraviti besedilo). It should be noted that only collocation candidates with an absolute frequency of 15 or above were extracted. Please note that the ColTrend workflow requires the installation of the Text Mining add-on for Orange.
For installation instructions as well as a more detailed description of the different phases of the workflow and the measures used to observe the collocation trends, please consult the README file.
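The four trend measures described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the workflow's actual code: the exact formula for the coefficient of recent growth (t) is an assumption (mean of the last three years over the overall mean), and the function name is hypothetical.

```python
# Hypothetical sketch of the four ColTrend trend measures for one collocation.
# The formula for t is an assumption, not taken from the workflow itself.
import numpy as np

def trend_measures(years, rel_freqs):
    years = np.asarray(years, dtype=float)
    freqs = np.asarray(rel_freqs, dtype=float)
    # (1) slope k of a linear regression fitted to frequency over time
    k, intercept = np.polyfit(years, freqs, 1)
    # (2) coefficient of determination R^2 of the same linear fit
    predicted = k * years + intercept
    ss_res = np.sum((freqs - predicted) ** 2)
    ss_tot = np.sum((freqs - freqs.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    # (3) m: ratio of maximum to average relative frequency (peak indicator)
    m = freqs.max() / freqs.mean()
    # (4) t: assumed here as the mean of the last three years over the overall mean
    t = freqs[-3:].mean() / freqs.mean()
    return k, r2, m, t
```

For a collocation whose relative frequency rises steadily year on year, k is positive, R² is close to 1, and t is above 1, which is the profile of a growing collocation.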
ColEmbed is a workflow (.OWS file) for Orange Data Mining (open-source machine learning and data visualization software: https://orangedatamining.com/) that allows the user to observe clusters of collocation candidates extracted from corpora. The workflow consists of a series of data filters, embedding processors, and visualizers. As input, the workflow takes a tab-separated file (.TSV/.TAB) with data on collocations extracted from a corpus, along with their relative frequencies by year of publication and other optional values (such as information on temporal trends). The workflow allows the user to select the features that are then used to cluster collocation candidates, along with the embeddings generated based on the selected lemmas (either one lemma or both lemmas can be selected, depending on the clustering criteria; for instance, to cluster adjective+noun candidates based on the similarities of their noun components, only the second lemma is selected for embedding generation). The obtained embedding clusters can be visualized and further processed (e.g. by finding the closest neighbors of a reference collocation). The workflow is described in more detail in the accompanying README file. The entry also contains three .TAB files that can be used to test the workflow. The files contain collocation candidates (along with their relative frequencies per year of publication and four measures describing their temporal trends; see http://hdl.handle.net/11356/1424 for more details) extracted from the Gigafida 2.0 Corpus of Written Slovene (https://viri.cjvt.si/gigafida/) with three different syntactic structures (as defined in http://hdl.handle.net/11356/1415): 1) p0-s0 (adjective + noun, e.g. rezervni sklad), 2) s0-s2 (noun + noun in the genitive case, e.g. ukinitev lastnine), and 3) gg-s4 (verb + noun in the accusative case, e.g. pripraviti besedilo).
It should be noted that only collocation candidates with an absolute frequency of 15 or above were extracted. Please note that the ColEmbed workflow requires the installation of the Text Mining add-on for Orange. For installation instructions as well as a more detailed description of the different phases of the workflow and the measures used to observe the collocation trends, please consult the README file.
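The nearest-neighbor step of the workflow can be illustrated with a small sketch: embed the chosen lemma of each collocation candidate, then rank candidates by cosine similarity to a reference. The embeddings below are toy vectors and the function name is hypothetical; the actual workflow uses Orange's own embedding processors.

```python
# Illustrative sketch (not the workflow's code): find the nearest neighbours
# of a reference collocation by cosine similarity over toy embedding vectors.
import numpy as np

def nearest_neighbours(embeddings, labels, query_label, n=2):
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-normalise rows
    q = X[labels.index(query_label)]
    sims = X @ q                                       # cosine similarities
    order = np.argsort(-sims)                          # most similar first
    return [labels[i] for i in order if labels[i] != query_label][:n]
```

With adjective+noun candidates embedded only on their noun component, such a ranking would group candidates sharing semantically similar nouns, which is the clustering criterion described above.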
This spatial data represents the drainage infrastructure that exists at a depth between 1.5 m and 2.5 m below ground. The colour assigned to this data is orange.
This spatial data was created as a result of a 2016 study, using 2014 data, conducted for the Edmonton area to determine the drainage and sewage areas of Edmonton vulnerable to a 1-in-100-year rainfall event.
Because subsurface infrastructure constantly changes (additions, upgrades, etc.) and the definition of a 1-in-100-year rainfall event is continually revised (based on historic rainfall amounts), this raster file reflects the results of a study done in 2016 and should not be taken to represent either previous or future years' vulnerabilities.
For a more regional Edmonton area breakdown of the Study’s results:
https://www.edmonton.ca/city_government/documents/RoadsTraffic/City-wide_Flood_Mitigation_Study.pdf
Three different colours indicate the vulnerability of the roadways and the corresponding ponding depth that would occur in that area during a large rainstorm.
Those colours are:
Green (sanitary flows can surcharge from a depth greater than 2.5 m below surface)
Yellow (sanitary flows can surcharge from a depth of 1.5 m to 2.5 m below surface)
Red (sanitary flows can surcharge from a depth of less than 1.5 m below surface)
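The colour bands can be expressed as a simple classifier. This is an illustrative sketch under one reading of the bands (a shallower surcharge depth, i.e. closer to the surface, means higher vulnerability); the function name and exact boundary handling are assumptions, not part of the dataset.

```python
# Hypothetical helper mapping sanitary-surcharge depth (metres below surface)
# to a vulnerability colour. Assumes shallower surcharge = higher vulnerability;
# boundary handling at exactly 1.5 m and 2.5 m is a guess.
def vulnerability_colour(surcharge_depth_m):
    if surcharge_depth_m < 1.5:
        return "Red"     # surcharge from less than 1.5 m below surface
    if surcharge_depth_m <= 2.5:
        return "Yellow"  # surcharge from 1.5 m to 2.5 m below surface
    return "Green"       # surcharge from deeper than 2.5 m below surface
```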
This raster file is best viewed overlaid with the 2016 Flood Mitigation Study - Drainage and Sanitation Surcharge Map, as the various coloured areas follow the subsurface infrastructure (and the corresponding roadways, if you are also viewing the street map as a layer).
Disclaimer: No Warranty with Flood Risk Maps. Your use of the flood risk maps is solely at your own risk, and you are fully responsible for any consequences arising from your use of the flood risk maps. The flood risk maps are provided on an “as is” and “as available” basis, and you agree to use them solely at your own risk. There are no warranties, expressed or implied in respect to the flood risk maps or your use of them, including without limitation, implied warranties and conditions of merchantability and fitness for any particular purpose.
Please note that the flood risk maps have been modified from their original source, and that all data visualization on maps are approximate and include only records that can be mapped.
This dataset is based on 2014 information and will not be updated further. The model is based on a theoretical, worst-case scenario storm that has never occurred in the Edmonton area.
Model Accuracy:
The LiDAR used was a 5-metre grid system. The LiDAR has an accuracy of ? cm horizontally/vertically. Bare-earth LiDAR was used for this model surface.
This is a spline-fit interpolation model: a 1D-1D model with 2D interpolation. The accuracy of the information provided in these datasets is plus or minus 10 cm vertically and 10 cm horizontally.
The 100-year flood was based on the 2015 Edmonton 4-year Chicago storm event applied over 20-plus neighbourhoods. The data is a collection of the worst-case model runs.
This is a common practice for Edmonton drainage models. These models are high-level concepts, and projects determined from this dataset will undergo finer, more detailed modelling.
These maps are a visual representation and are intended to be used when prioritizing the best engineering solutions that are scheduled to be brought forward to Utility Council to mitigate future flooding in the City. The best engineering solutions are high-level concept designs and require further modelling and design. At the time of the PDF release (November 9, 2016), there was no funding for any projects to be completed or for further design. The strategy will be brought forward to the Utility Committee on June 7, 2017. Council will determine funding and the rate of project completion.
The storm sizes used in these models are larger than Edmonton has historically seen. Historically, as seen in 2004 and 2012, only four neighbourhoods at a time were hit by a 100-year rainstorm event. With the continuation of the City-Wide Flood Mitigation Strategy, these maps will become obsolete as smaller storms are applied to the area.
Geo Coordinate System: WGS84
Catalog of historic villages, divided between 'Orange Flag' villages and 'Most Beautiful Villages in Italy' villages (selected and certified by the TCI and by the Club of the Most Beautiful Villages in Italy, respectively), 'Historical Seaside Villages' (identified within the interregional project promoted by MiBACT under Law 296/2006), and 'Authentic Villages' (members of the Association of Authentic Villages of Italy). Coverage: entire regional territory. Origin: georeferenced on numerical and raster CTR. Year: 2023
The original citrus dataset contains 759 images of healthy and unhealthy citrus fruits and leaves. However, for now we only export 594 images of citrus leaves with the following labels: Black Spot, Canker, Greening, and Healthy. The exported images are in PNG format and are 256×256 pixels.
NOTE: Leaf images with Melanose label were dropped due to very small count and other non-leaf images being present in the same directory.
Dataset URL: https://data.mendeley.com/datasets/3f83gxmv57/2
License: http://creativecommons.org/licenses/by/4.0
To use this dataset:
import tensorflow_datasets as tfds

ds = tfds.load('citrus_leaves', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/citrus_leaves-0.1.2.png