This dataset was created by Jose Carbonell Capo.
This spreadsheet contains an explanation of what the columns mean in all of the datasets. The titles of the tabs correspond to the shortened filenames - each file has one tab.
The dataset has N=1000 rows and 5 columns. 1000 rows have no missing values on any column.
This table contains variable names, labels, and number of missing values. See the complete codebook for more.
| name | label | n_missing |
|---|---|---|
| lat | NA | 0 |
| long | NA | 0 |
| depth | NA | 0 |
| mag | NA | 0 |
| stations | NA | 0 |
This dataset was automatically described using the codebook R package (version 0.9.2).
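The codebook's missing-value counts can be checked directly against the data. The sketch below is a minimal example under the assumption that the data are available as a CSV with the five columns listed above; the file name quakes.csv is hypothetical.

```python
# Hedged sketch: verify the n_missing column of the codebook table above.
# "quakes.csv" is a hypothetical file name; adjust to the actual data file.
import pandas as pd

df = pd.read_csv("quakes.csv")
assert df.shape == (1000, 5)  # N=1000 rows, 5 columns
print(df[["lat", "long", "depth", "mag", "stations"]].isna().sum())  # expect all zeros
```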
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
By Reddit [source]
This dataset provides an in-depth look into what communities find important and engaging in the news. With this data, researchers can discover trends related to user engagement and popular topics within subreddits. By examining the “score” and “comms_num” columns, researchers can pinpoint which topics are most liked, discussed, or shared within the various subreddits. Researchers may also gain insight not only into how popular a topic is but into how it grows over time. Additionally, by exploring the body column of the dataset, researchers can understand which types of news stories drive conversation within particular subreddits, providing an opportunity for deeper analysis of each subreddit’s community dynamics.
The dataset includes eight columns: title, score, id, url, comms_num, created, body, and timestamp. These can help identify key insights into user engagement among popular subreddits. With this data we may also determine relationships between topics of discussion and their impact on user engagement, allowing us to build a better understanding of issue-based conversations online as well as uncover emerging trends in online news consumption habits.
This dataset is useful for those looking to gain insight into the popularity and user engagement of specific subreddits. The data includes eight columns: title, score, id, url, comms_num, created, body, and timestamp. This can provide valuable information about how users view and interact with particular topics across various subreddits.
In this guide we’ll look at how you can use this dataset to uncover trends in user engagement on topics within specific subreddits as well as measure the overall popularity of these topics within a subreddit.
1) Analyzing Score: By analyzing the “score” column you can determine which news stories are popular in a particular subreddit and which ones aren't by looking at how many upvotes each story has received. With this data you will be able to determine trends in what types of stories users preferred within a particular subreddit over time.
2) Analyzing Comms_Num: Similarly to the score column, you can analyze the “comms_num” column to see which news stories drew more engagement from users by tracking the number of comments each post received. This can provide insight into what types of stories tend to draw more comment activity from users in certain subreddits, whether over a single day or an extended period such as several weeks or months.
3) Analyzing Body: Additionally, by looking at the “body” column for each post, researchers can gain a better understanding of which kinds of topics and news stories draw attention among specific Reddit communities. With that fuller picture, researchers have access not only to data measuring Reddit buzz but also to the topic discussions and comments themselves, helping generate further insights into why certain posts are popular or receive more comments than others.
Overall, this dataset provides valuable insights about user engagement with trending topics across subreddits, giving anyone interested in researching such questions easy access to those insights in one place.
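The three analyses above can be sketched in a few lines of pandas. The example below assumes the file name news.csv (listed under Files) and the column names given in this description; it is an illustrative sketch, not part of the dataset.

```python
# Hedged sketch of the score, comms_num, and body analyses described above.
import pandas as pd

df = pd.read_csv("news.csv")

# 1) Score: the ten most upvoted stories.
print(df.sort_values("score", ascending=False).head(10)[["title", "score"]])

# 2) Comms_Num: the ten most-discussed stories, and how comments relate to upvotes.
print(df.sort_values("comms_num", ascending=False).head(10)[["title", "comms_num"]])
print("score vs. comms_num correlation:", df["score"].corr(df["comms_num"]))

# 3) Body: a very naive look at frequent words in the bodies of top-decile posts.
top = df[df["score"] > df["score"].quantile(0.9)]
print(top["body"].dropna().str.lower().str.split().explode().value_counts().head(20))
```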
- Grouping news topics within particular subreddits and assessing the overall popularity of those topics in terms of scores/user engagement.
- Correlating user engagement with certain news topics to understand how they influence discussion or reactions on a subreddit.
- Examining the potential correlation between score and the actual body content of a given post to assess what types of content are most successful in gaining interest from users and creating positive engagement for posts.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: news.csv | Column name | Description ...
Datasets generated for the Physical Review E article with the title "Traveling Bubbles and Vortex Pairs within Symmetric 2D Quantum Droplets" by Paredes, Guerra-Carmenate, Salgueiro, Tommasini and Michinel. In particular, we provide the data needed to generate the figures in the publication, which illustrate the numerical results found during this work.
We also include Python code in the file "plot_from_data_for_repository.py" that generates a version of the figures of the paper from the .pt data sets. The data can be read and plots produced with a simple modification of this Python code; a minimal reading sketch is also given after the figure list below.
Figure 1: Data are in fig1.csv
The csv file has four columns separated by commas. The four columns correspond to values of r (first column) and the function psi(r) for the three cases depicted in the figure (columns 2-4).
Figures 2 and 4: Data are in data_figs_2_and_4.pt
This is a data file generated with the torch module of Python. It includes eight torch tensors: the spatial grids "x" and "y" and the complex values of psi for the six eigenstates depicted in figures 2 and 4 ("psia", "psib", "psic", "psid", "psie", "psif"). Notice that figure 2 shows the square of the modulus and figure 4 the argument; both are obtained from the same data sets.
Figure 3: Data are in fig3.csv
The csv file has three columns separated by commas. The three columns correspond to values of momentum p (first column), energy E (second column) and velocity U (third column).
Figure 5: Data are in fig5.csv
The csv file has three columns separated by commas. The three columns correspond to values of momentum p (first column), the minimum value of |psi|^2 (second column) and the value of |psi|^2 at the center (third column).
Figure 6: Data are in data_fig_6.pt
This is a data file generated with the torch module of Python. It includes six torch tensors: the spatial grids "x" and "y" and the complex values of psi for the four instants of time depicted in figure 6 ("psia", "psib", "psic", "psid").
Figure 7: Data are in data_fig_7.pt
This is a data file generated with the torch module of Python. It includes six torch tensors: the spatial grids "x" and "y" and the complex values of psi for the four instants of time depicted in figure 7 ("psia", "psib", "psic", "psid").
Figures 8 and 10: Data are in data_figs_8_and_10.pt
This is a data file generated with the torch module of Python. It includes eight torch tensors: the spatial grids "x" and "y" and the complex values of psi for the six eigenstates depicted in figures 8 and 10 ("psia", "psib", "psic", "psid", "psie", "psif"). Notice that figure 8 shows the square of the modulus and figure 10 the argument; both are obtained from the same data sets.
Figure 9: Data are in fig9.csv
The csv file has two columns separated by commas. The two columns correspond to values of momentum p (first column) and energy (second column).
Figure 11: Data are in data_fig_11.pt
This is a data file generated with the torch module of Python. It includes ten torch tensors: the spatial grids "x" and "y" and the complex values of psi for the two cases (four instants of time for each case) depicted in figure 11 ("psia", "psib", "psic", "psid", "psie", "psif", "psig", "psih").
Figure 12: Data are in data_fig_12.pt
This is a data file generated with the torch module of Python. It includes eight torch tensors: the spatial grids "x" and "y" and the complex values of psi for the six instants of time depicted in figure 12 ("psia", "psib", "psic", "psid", "psie", "psif").
Figure 13: Data are in data_fig_13.pt
This is a data file generated with the torch module of Python. It includes ten torch tensors: the spatial grids "x" and "y" and the complex values of psi for the eight instants of time depicted in figure 13 ("psia", "psib", "psic", "psid", "psie", "psif", "psig", "psih").
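As referenced above, the following sketch shows one way to read a .pt file and reproduce the modulus-squared and argument plots. It assumes torch.load returns the tensors keyed by the names quoted above and that "x" and "y" are 2D grids; the script plot_from_data_for_repository.py in this record is the authoritative reference.

```python
# Hedged sketch: load one .pt data set and plot |psi|^2 and arg(psi).
import torch
import matplotlib.pyplot as plt

data = torch.load("data_figs_2_and_4.pt")
x, y = data["x"].numpy(), data["y"].numpy()
psi = data["psia"]  # first eigenstate; "psib"..."psif" follow the same pattern

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.pcolormesh(x, y, (psi.abs() ** 2).numpy(), shading="auto")  # as in figure 2
ax2.pcolormesh(x, y, psi.angle().numpy(), shading="auto")       # as in figure 4
ax1.set_title(r"$|\psi|^2$")
ax2.set_title(r"$\mathrm{arg}\,\psi$")
plt.show()
```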
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes ALL the abundance values, zero and non-zero. Taxonomic groups are displayed in the 'taxon' column, rather than in separate columns, with abundances in the 'abund_L' column. For the original presentation of the data, see VPR_ashjian_orig. For a version of the data with only non-zero values, see VPR_ashjian_nonzero. In the 'nonzero' dataset, values of 0 in the abund_L column (taxon abundance) have been removed.
Methodology
The following information was extracted from C.J. Ashjian et al., Deep-Sea Research II 48 (2001) 245-282. An in-depth discussion of the data and sampling methods can be found there.
The Video Plankton Recorder was towed at 2 m/s, collecting data from the surface to the bottom (towyo). The VPR was equipped with 2-4 cameras, temperature and conductivity probes, fluorometer and transmissometer. Environmental data was collected at 0.25 Hz (CI9407) or 0.5 Hz (EN259, EN262). Video images were recorded at 60 fields per second (fps).
Video tapes were analyzed for plankton abundances using a semi-automated method discussed in Davis, C.S. et al., Deep-Sea Research II 43 (1996) 1946-1970. In-focus images were extracted from the video tapes and identified by hand to particle type, taxon, or species. Plankton and particle observations were merged with environmental and navigational data by binning the observations for each category into the time intervals at which the environmental data were collected (again see above Davis citation). Concentrations were calculated utilizing the total volume (liters) imaged during that period. For less-abundant categories, usually only a single organism was observed during each time interval so that the resulting concentrations are close to presence or absence data rather than covering a range of values.
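The binning and concentration calculation described above can be sketched as follows. The file and column names used here (vpr_observations.csv, vpr_environment.csv, time_bin, volume_imaged_L) are hypothetical illustrations; the published data report the result in the 'abund_L' column.

```python
# Hedged sketch of binning plankton observations and computing concentrations.
import pandas as pd

obs = pd.read_csv("vpr_observations.csv")   # hypothetical per-image identifications
env = pd.read_csv("vpr_environment.csv")    # hypothetical binned environmental records

# Count identified organisms per environmental time bin and taxon, then divide
# by the total volume imaged (liters) in that bin to obtain a concentration.
counts = obs.groupby(["time_bin", "taxon"]).size().rename("count").reset_index()
conc = counts.merge(env[["time_bin", "volume_imaged_L"]], on="time_bin")
conc["abund_L"] = conc["count"] / conc["volume_imaged_L"]
```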
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List: breedingBirdData.txt, butterflyData.txt, ExampleSession.txt, MultiSpeciesSiteOcc.R, MultiSpeciesSiteOccModel.txt, CumNumSpeciesPresent.R
Description
“breedingBirdData.txt” is an example data set in ASCII comma-delimited format. Each row corresponds to data for a single species observed in the avian survey. The 50 columns correspond to 50 sample locations.
“butterflyData.txt” is an example data set in ASCII comma-delimited format. Each row corresponds to data for a single species observed in the butterfly survey. The 20 columns correspond to 20 sample locations.
“ExampleSession.txt” illustrates an example session in R where the butterfly data are read into memory and then analyzed using the R and WinBUGS code.
“MultiSpeciesSiteOcc.R” defines an R function for fitting the model of species occurrence and detection to data. This function specifies a Gibbs sampler wherein 55000 random draws are computed for each of 4 different Markov chains. These computations may require nontrivial execution times. For example, analysis of the avian data required about 4 hours using a computer equipped with a 3.20 GHz Pentium 4 processor. Analysis of the butterfly data required about 1.5 hours.
“MultiSpeciesSiteOccModel.txt” contains WinBUGS code for specifying the model of species occurrence and detection.
“CumNumSpeciesPresent.R” defines an R function for computing a sample of the posterior-predictive distribution of a species-accumulation curve whose abscissa ranges from 1 to nsites sites.
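A quick exploratory look at the species-by-site matrices (separate from the R/WinBUGS analysis above) might look like the following sketch. It assumes the comma-delimited files have no header row and that cell values are detection counts or 0/1 indicators, which should be verified against ExampleSession.txt.

```python
# Hedged sketch: naive summaries of the butterfly species-by-site matrix.
import pandas as pd

butterflies = pd.read_csv("butterflyData.txt", header=None)  # rows = species, columns = sites

n_species, n_sites = butterflies.shape
naive_occupancy = (butterflies > 0).mean(axis=1)  # fraction of sites where each species was detected

print(f"{n_species} species x {n_sites} sites")
print(naive_occupancy.describe())
```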
SCOAPE_Pandora_Data is the column NO2 and ozone data collected by Pandora spectrometers during the Satellite Coastal and Oceanic Atmospheric Pollution Experiment (SCOAPE). Pandora instruments were located on the University of Southern Mississippi’s Research Vessel (R/V) Point Sur and at the Louisiana Universities Marine Consortium (LUMCON; Cocodrie, LA). Data collection for this product is complete.
The Outer Continental Shelf Lands Act (OCSLA) requires the US Department of Interior Bureau of Ocean Energy Management (BOEM) to ensure compliance with the US National Ambient Air Quality Standard (NAAQS) so that Outer Continental Shelf (OCS) oil and natural gas (ONG) exploration, development, and production do not significantly impact the air quality of any US state. In 2017, BOEM and NASA entered into an interagency agreement to begin a study to scope out the feasibility of BOEM personnel using a suite of NASA and non-NASA resources to assess how pollutants from ONG exploration, development, and production activities affect air quality. An important activity of this interagency agreement was SCOAPE, a field deployment that took place in May 2019 and aimed to assess the capability of satellite observations for monitoring offshore air quality. The outcomes of the study are documented in two BOEM reports (Duncan, 2020; Thompson, 2020).
To address BOEM’s goals, the SCOAPE science team conducted surface-based remote sensing and in-situ measurements, which enabled a systematic assessment of the application of satellite observations, primarily NO2, for monitoring air quality. The SCOAPE field measurements consisted of onshore ground sites, including in the vicinity of LUMCON, as well as those from the University of Southern Mississippi’s R/V Point Sur, which cruised in the Gulf of America from 10-18 May 2019. Based on the 2014 and 2017 BOEM emissions inventories as well as daily air quality and meteorological forecasts, the cruise track was designed to sample both areas with large oil drilling platforms and areas with dense small natural gas facilities. The R/V Point Sur was instrumented to carry out both remote sensing and in-situ measurements of NO2 and O3 along with in-situ CH4, CO2, CO, and VOC tracers, which allowed detailed characterization of airmass type and emissions. In addition, there were also measurements of multi-wavelength AOD and black carbon as well as planetary boundary layer structure and meteorological variables, including surface temperature, humidity, and winds. A ship-based spectrometer instrument provided remotely-sensed total column amounts of NO2 and O3 for direct comparison with satellite measurements. Ozonesondes and radiosondes were also launched 1-3 times daily from the R/V Point Sur to provide O3 and meteorological vertical profile observations. The ground-based observations, primarily at LUMCON, included spectrometer-measured column NO2 and O3, in-situ NO2, VOCs, and planetary boundary layer structure. A NO2sonde was also mounted on a vehicle with the goal of detecting pollution onshore from offshore ONG activities during onshore flow; data were collected along coastal Louisiana from Burns Point Park to Grand Isle to the tip of the Mississippi River delta. The in-situ measurements were reported in ICARTT files or Excel files. The remote sensing data are in either HDF or netCDF files.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Problem description
Pizza
The pizza is represented as a rectangular, 2-dimensional grid of R rows and C columns. The cells within the grid are referenced using a pair of 0-based coordinates [r, c], denoting respectively the row and the column of the cell.
Each cell of the pizza contains either:
mushroom, represented in the input file as M
tomato, represented in the input file as T
Slice
A slice of pizza is a rectangular section of the pizza delimited by two rows and two columns, without holes. The slices we want to cut out must contain at least L cells of each ingredient (that is, at least L cells of mushroom and at least L cells of tomato) and at most H cells of any kind in total - surprising as it is, there is such a thing as too much pizza in one slice. The slices being cut out cannot overlap. The slices being cut do not need to cover the entire pizza.
Goal
The goal is to cut correct slices out of the pizza, maximizing the total number of cells in all slices.
Input data set
The input data is provided as a data set file - a plain text file containing exclusively ASCII characters with lines terminated with a single '\n' character at the end of each line (UNIX-style line endings).
File format
The file consists of:
one line containing the following natural numbers separated by single spaces:
R (1 ≤ R ≤ 1000) is the number of rows
C (1 ≤ C ≤ 1000) is the number of columns
L (1 ≤ L ≤ 1000) is the minimum number of cells of each ingredient in a slice
H (1 ≤ H ≤ 1000) is the maximum total number of cells of a slice
R lines describing the rows of the pizza (one after another). Each of these lines contains C characters describing the ingredients in the cells of the row (one cell after another). Each character is either ‘M’ (for mushroom) or ‘T’ (for tomato).
Example
3 5 1 6
TTTTT
TMMMT
TTTTT
3 rows, 5 columns, min 1 of each ingredient per slice, max 6 cells per slice
Example input file.
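A small parser for this input format (a hypothetical helper, not part of the official statement) could look like this:

```python
# Hedged sketch: read R, C, L, H and the grid from an input data set file.
def read_pizza(path):
    with open(path) as f:
        r, c, l, h = map(int, f.readline().split())
        grid = [f.readline().rstrip("\n") for _ in range(r)]
    return r, c, l, h, grid

# For the example above: r, c, l, h == 3, 5, 1, 6
# and grid == ["TTTTT", "TMMMT", "TTTTT"].
```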
Submissions
File format
The file must consist of:
one line containing a single natural number S (0 ≤ S ≤ R × C), representing the total number of slices to be cut,
S lines describing the slices. Each of these lines must contain the following natural numbers separated by single spaces:
r1, c1, r2, c2 (0 ≤ r1, r2 < R; 0 ≤ c1, c2 < C) describe a slice of pizza delimited by the rows r1 and r2 and the columns c1 and c2, including the cells of the delimiting rows and columns. The rows (r1 and r2) can be given in any order. The columns (c1 and c2) can be given in any order too.
Example
3
0 0 2 1
0 2 2 2
0 3 2 4
3 slices.
First slice between rows (0,2) and columns (0,1).
Second slice between rows (0,2) and columns (2,2).
Third slice between rows (0,2) and columns (3,4).
Example submission file.
© Google 2017, All rights reserved.
Slices described in the example submission file, marked in green, orange and purple.
Validation
For the solution to be accepted:
the format of the file must match the description above,
each cell of the pizza must be included in at most one slice,
each slice must contain at least L cells of mushroom,
each slice must contain at least L cells of tomato,
total area of each slice must be at most H.
Scoring
The submission gets a score equal to the total number of cells in all slices. Note that there are multiple data sets representing separate instances of the problem. The final score for your team is the sum of your best scores on the individual data sets.
Scoring example
The example submission file given above cuts the slices of 6, 3 and 6 cells, earning 6 + 3 + 6 = 15 points.
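A checker for the validation rules and scoring above could be sketched as follows (hypothetical helper names; the official grader is authoritative):

```python
# Hedged sketch: validate a list of slices against the rules and compute the score.
def score_submission(grid, slices, L, H):
    covered = set()
    total = 0
    for r1, c1, r2, c2 in slices:
        r1, r2 = sorted((r1, r2))
        c1, c2 = sorted((c1, c2))
        cells = [(r, c) for r in range(r1, r2 + 1) for c in range(c1, c2 + 1)]
        mushrooms = sum(grid[r][c] == "M" for r, c in cells)
        tomatoes = len(cells) - mushrooms
        assert mushrooms >= L and tomatoes >= L, "too few cells of one ingredient"
        assert len(cells) <= H, "slice larger than H cells"
        assert covered.isdisjoint(cells), "overlapping slices"
        covered.update(cells)
        total += len(cells)
    return total

grid = ["TTTTT", "TMMMT", "TTTTT"]
slices = [(0, 0, 2, 1), (0, 2, 2, 2), (0, 3, 2, 4)]
print(score_submission(grid, slices, L=1, H=6))  # 6 + 3 + 6 = 15
```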
Measured Tm values found using R experiments on systems without any abasic sites (columns 2–5) and the system with 7 abasic sites dividing the system into 6 groups of 4 bases (columns 6–9).
There are a total of 17 datasets needed to produce all the figures in the article. There are mainly two different types of data files: GUP White Dwarf Mass-Radius (GUPWD_M-R) data and GUP White Dwarf Profile (GUPWD_Profile) data.
The file GUPWD_M-R gives only the Mass-Radius relation with Radius (km) in the first column and Mass (solar mass) in the second.
On the other hand, GUPWD_Profile provides the complete profile with the following columns:
column 1: Dimensionless central Fermi Momentum $\xi_c$
column 2: Central Density $\rho_c$ (Log10[$\rho_c$ g cm$^{-3}$])
column 3: Radius $R$ (km)
column 4: Mass $M$ (solar mass)
column 5: Square of fundamental frequency $\omega_0^2$ (sec$^{-2}$)
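A profile file with these five columns can be read as follows; the file name used here is hypothetical, and the delimiter should be checked against the actual .dat files.

```python
# Hedged sketch: load one GUPWD_Profile data set (hypothetical file name).
import numpy as np

xi_c, log_rho_c, radius_km, mass_msun, omega0_sq = np.loadtxt(
    "GUPWD_Profile[Beta0=E42].dat", unpack=True
)
```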
=====================================================================================
Figure 1 (a) gives Mass-Radius (M-R) curves for $\beta_0=10^{42}$, $10^{41}$ and $10^{40}$. The filenames of the corresponding datasets are
GUPWD_M-R[Beta0=E42].dat GUPWD_M-R[Beta0=E41].dat GUPWD_M-R[Beta0...
Version 5 release notes:
Removes support for SPSS and Excel data. Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year. Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime. Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Adds data for 2016. Orders rows by year (descending) and ORI.
Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.
To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
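The conventions above (the "None/not reported" rule, the implausible arrest totals set to NA, and the FIPS leading zeros) can be illustrated with a short pandas sketch; the file and column names here are hypothetical.

```python
# Hedged sketch of the cleaning conventions described above.
import pandas as pd

# Read FIPS codes as strings so leading zeros are preserved.
df = pd.read_csv("ucr_arrests_index_crimes.csv", dtype={"fips_state_county_code": str})

bad_totals = [10000, 20000, 30000, 40000, 50000, 60000, 70000,
              80000, 90000, 100000, 99999, 99998]
arrest_cols = [c for c in df.columns if c.startswith(("poss_", "sale_"))]  # illustrative subset

for col in arrest_cols:
    df[col] = pd.to_numeric(df[col].replace("None/not reported", 0), errors="coerce")
    df[col] = df[col].mask(df[col].isin(bad_totals))  # implausible totals -> missing
```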
I created 9 arrest categories myself. The categories are:
Total Male Juvenile, Total Female Juvenile, Total Male Adult, Total Female Adult, Total Male, Total Female, Total Juvenile, Total Adult, Total Arrests.
All of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than using the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set needs fewer columns, I include all offenses.
As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files: eight that contain different crimes, plus the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Due to Stata limiting column names to 32 characters maximum, I have abbreviated the crime names in the data. The files and their included crimes are:
Index Crimes: Murder, Rape, Robbery, Aggravated Assault, Burglary, Theft, Motor Vehicle Theft, Arson
Alcohol Crimes: DUI, Drunkenness, Liquor
Drug Crimes: Total Drug, Total Drug Sales, Total Drug Possession, Cannabis Possession, Cannabis Sales, Heroin or Cocaine Possession, Heroin or Cocaine Sales, Other Drug Possession, Other Drug Sales, Synthetic Narcotic Possession, Synthetic Narcotic Sales
Grey Collar and Property Crimes: Forgery, Fraud, Stolen Property, Financial Crimes, Embezzlement, Total Gambling, Other Gambling, Bookmaking, Numbers Lottery
Sex or Family Crimes: Offenses Against the Family and Children, Other Sex Offenses, Prostitution, Rape
Violent Crimes: Aggravated Assault, Murder, Negligent Manslaughter, Robbery, Weapon Offenses
Other Crimes: Curfew, Disorderly Conduct, Other Non-traffic, Suspicion, Vandalism, Vagrancy
Simple
This data set has every crime and only the arrest categories that I created (see above).
Sea Scout Hydrographic Survey, H13177 (EM2040). Mainline coverage within the survey area consisted of Complete Coverage (100% side scan sonar with concurrent multibeam data) acquisition. The assigned Fish Haven area and associated debris area were surveyed with Object Detection MBES coverage. Bathymetric and water column data were acquired with a Kongsberg EM2040C multibeam echo sounder aboard the R/V Sea Scout and bathymetry data was acquired with a Kongsberg EM3002 multibeam echo sounder aboard the R/V C-Wolf. Side scan sonar acoustic imagery was collected with a Klein 5000 V2 system aboard the R/V Sea Scout and an EdgeTech 4200 aboard the R/V C-Wolf.
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at PSITAdministration@ChicagoPolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data are updated daily. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data and results from the Imageomics Workflow. These include data files from the Fish-AIR repository (https://fishair.org/) for purposes of reproducibility and outputs from the application-specific imageomics workflow contained in the Minnow_Segmented_Traits repository (https://github.com/hdr-bgnn/Minnow_Segmented_Traits).
Fish-AIR: This is the dataset downloaded from Fish-AIR, filtering for Cyprinidae and the Great Lakes Invasive Network (GLIN) from the Illinois Natural History Survey (INHS) dataset. These files contain information about fish images, fish image quality, and path for downloading the images. The data download ARK ID is dtspz368c00q. (2023-04-05). The following files are unaltered from the Fish-AIR download. We use the following files:
extendedImageMetadata.csv: A CSV file containing information about each image file. It has the following columns: ARKID, fileNameAsDelivered, format, createDate, metadataDate, size, width, height, license, publisher, ownerInstitutionCode. Column definitions are given at https://fishair.org/vocabulary.html and the persistent column identifiers are in the meta.xml file.
imageQualityMetadata.csv: A CSV file containing information about the quality of each image. It has the following columns: ARKID, license, publisher, ownerInstitutionCode, createDate, metadataDate, specimenQuantity, containsScaleBar, containsLabel, accessionNumberValidity, containsBarcode, containsColorBar, nonSpecimenObjects, partsOverlapping, specimenAngle, specimenView, specimenCurved, partsMissing, allPartsVisible, partsFolded, brightness, uniformBackground, onFocus, colorIssue, quality, resourceCreationTechnique. Column definitions are given at https://fishair.org/vocabulary.html and the persistent column identifiers are in the meta.xml file.
multimedia.csv: A CSV file containing information about image downloads. It has the following columns: ARKID, parentARKID, accessURI, createDate, modifyDate, fileNameAsDelivered, format, scientificName, genus, family, batchARKID, batchName, license, source, ownerInstitutionCode. Column definitions are given at https://fishair.org/vocabulary.html and the persistent column identifiers are in the meta.xml file.
meta.xml: An XML file with metadata about the column indices and URIs for each file contained in the original downloaded zip file. This file is used in the fish-air.R script to extract the indices for column headers.
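The three CSV tables share the ARKID key, so they can be combined for analysis. The following is a hedged sketch; the quality filter shown is only an illustration, and the workflow's fish-air.R script is the actual implementation.

```python
# Hedged sketch: join the Fish-AIR tables on ARKID.
import pandas as pd

multimedia = pd.read_csv("multimedia.csv")
image_meta = pd.read_csv("extendedImageMetadata.csv")
quality = pd.read_csv("imageQualityMetadata.csv")

images = (
    multimedia
    .merge(image_meta, on="ARKID", suffixes=("", "_img"))
    .merge(quality, on="ARKID", suffixes=("", "_qual"))
)

# Example filter: keep single-specimen images (threshold is an assumption).
single = images[images["specimenQuantity"] == 1]
```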
The outputs from the Minnow_Segmented_Traits workflow are:
sampling.df.seg.csv: Table with tallies of the sampling of image data per species during the data cleaning and data analysis. This is used in Table S1 in Balk et al.
presence.absence.matrix.csv: The Presence-Absence matrix from segmentation, not cleaned. This is the result of the combined outputs from the presence.json files created by the rule “create_morphological_analysis”. The cleaned version of this matrix is shown as Table S3 in Balk et al.
heatmap.avg.blob.png and heatmap.sd.blob.png: Heatmaps of average area of biggest blob per trait (heatmap.avg.blob.png) and standard deviation of area of biggest blob per trait (heatmap.sd.blob.png). These images are also in Figure S3 of Balk et al.
minnow.filtered.from.iqm.csv: Fish image data set after filtering (see methods in Balk et al. for the filter categories).
burress.minnow.sp.filtered.from.iqm.csv: Fish image data set after filtering and selecting species from Burress et al. 2017.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Data acquisition was performed using the multibeam echosounder Kongsberg EM122. Raw data are delivered in Kongsberg .wcd format. The data acquisition was part of the international project JPI Oceans - MiningImpact Environmental Impacts and Risks of Deep-Sea Mining.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article introduces a graphical goodness-of-fit test for copulas in more than two dimensions. The test is based on pairs of variables and can thus be interpreted as a first-order approximation of the underlying dependence structure. The idea is to first transform pairs of data columns with the Rosenblatt transform to bivariate standard uniform distributions under the null hypothesis. This hypothesis can be graphically tested with a matrix of bivariate scatterplots, Q-Q plots, or other transformations. Furthermore, additional information can be encoded as background color, such as measures of association or (approximate) p-values of tests of independence. The proposed goodness-of-fit test is designed as a basic graphical tool for detecting deviations from a postulated, possibly high-dimensional, dependence model. Various examples are given and the methodology is applied to a financial dataset. An implementation is provided by the R package copula. Supplementary material for this article is available online, which provides the R package copula and reproduces all the graphical results of this article.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Important Note: The dataset contains some important information regarding the columns 'title' and 'comments'. It's important to understand their values in order to interpret the data correctly.
In the 'title' column, there may be a significant number of null values. A null value in this column indicates that the corresponding row pertains to a comment rather than a post. To identify the relationship between comment rows and their associated posts, you can examine the 'post_id' column. Rows with the same 'post_id' value refer to comments that are associated with the post identified by that 'post_id'.
Similarly, in the 'comments' column, the presence or absence of null values is crucial for determining whether a row represents a comment or a post. If the 'comments' column is null, it signifies a comment row. Conversely, if the 'comments' column is populated (including cases where the value is 0), it indicates a post row.
Understanding these conventions will enable accurate analysis and interpretation of the dataset.
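These conventions translate directly into a filter on null values; the sketch below assumes a hypothetical file name reddit.csv and the columns named above.

```python
# Hedged sketch: split rows into posts and comments using the null conventions above.
import pandas as pd

df = pd.read_csv("reddit.csv")

posts = df[df["comments"].notna()]    # 'comments' populated (even 0) -> post row
comments = df[df["comments"].isna()]  # 'comments' null -> comment row

# Attach each comment to its parent post via the shared post_id.
threads = comments.merge(
    posts[["post_id", "title"]], on="post_id", how="left", suffixes=("", "_post")
)
```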
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The United States Geological Survey (USGS) is conducting a study on the effects of climate change on ocean acidification within the Gulf of Mexico, dealing specifically with the effect of ocean acidification on marine organisms and habitats. To investigate this, the USGS participated in two cruises in the West Florida Shelf and northern Gulf of Mexico regions aboard the R/V Weatherbird II, a ship of opportunity led by Dr. Kendra Daly of the University of South Florida (USF). The cruises occurred September 20 - 28 and November 2 - 4, 2011. Both left from and returned to Saint Petersburg, Florida, but followed different routes (see Trackline). On both cruises the USGS collected data pertaining to pH, dissolved inorganic carbon (DIC), and total alkalinity in discrete samples. Discrete surface samples were taken during transit approximately hourly on both cruises: 95 were collected over a span of 2127 km in September, and 7 over a 732 km trackline on the November cruise. Along with ...