In 2023, Morningstar Advisor Workstation was by far the most popular data analytics software worldwide. According to a survey carried out between December 2022 and March 2023, the market share of Morningstar Advisor Workstation was 23.81 percent. It was followed by Riskalyze Elite, with 12.21 percent, and YCharts, with 10.82 percent.
This statistic shows the top intelligence applications being used by companies worldwide as of 2018. Around 59 percent of respondents stated that their company was using big data analytics as an intelligence application in 2018.
This GIS layer provides the locations of the samples from Pellegrino and Hubbard that were summarized to provide a detailed analysis of 35 common species found in Long Island Sound benthic communities.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, the most popular imputation methods generally require scripting skills and are implemented across various packages with differing syntax. Thus, implementing a full suite of methods is generally out of reach for all but experienced data scientists. Moreover, imputation is often treated as a separate exercise from exploratory data analysis, but it should be considered part of the data exploration process. We have created a new graphical tool, ImputEHR, which is built on Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
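For readers who want a feel for the kinds of approaches such a tool wraps, here is a minimal, illustrative sketch (not ImputEHR itself) comparing simple mean imputation with an iterative, gradient-boosted-tree-based imputer in scikit-learn; the toy columns and missingness pattern are hypothetical.

```python
# Illustrative only: compare mean imputation vs. an iterative,
# gradient-boosted-tree-based imputer on a toy table with missing values.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.normal(60, 10, 200),
    "sbp": rng.normal(130, 15, 200),
    "hba1c": rng.normal(6.5, 1.0, 200),
})
mask = rng.random(X.shape) < 0.2          # ~20% of cells set to missing
X_missing = X.mask(mask)

simple = SimpleImputer(strategy="mean").fit_transform(X_missing)
boosted = IterativeImputer(
    estimator=HistGradientBoostingRegressor(), random_state=0
).fit_transform(X_missing)

# Compare imputation error on the masked cells only.
for name, imputed in [("mean", simple), ("gradient-boosted", boosted)]:
    rmse = np.sqrt(np.mean((imputed[mask] - X.values[mask]) ** 2))
    print(f"{name} RMSE on masked cells: {rmse:.2f}")
```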
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
The H1B is an employment-based visa category for temporary foreign workers in the United States. Every year, the US immigration department receives over 200,000 petitions and selects 85,000 applications through a random process; the U.S. employer must submit the H1B petition to the immigration department on the worker's behalf. It is the most common visa status for international students once they complete college or higher education and begin working in a full-time position. The project provides essential information on job titles, preferred regions of settlement, and trends among foreign applicants and employers for H1B visa applications. Because locations, employers, job titles, and salary ranges make up most of the variation in H1B petitions, several visualization tools are used to analyze and interpret H1B visa trends and to provide recommendations to applicants. This report is the basis of a project for the Visualization of Complex Data class at the George Washington University; it analyzes the relevant variables (Case Status, Employer Name, SOC Name, Job Title, Prevailing Wage, Worksite, and Latitude and Longitude information) from Kaggle and the Office of Foreign Labor Certification (OFLC) in order to examine how the H1B visa has changed over time.
Keywords: H1B visa, Data Analysis, Visualization of Complex Data, HTML, JavaScript, CSS, Tableau, D3.js
Dataset
The dataset contains 10 columns and covers a total of 3 million records spanning 2011-2016. The relevant columns include case status, employer name, SOC name, job title, full-time position, prevailing wage, year, worksite, and latitude and longitude information.
Link to dataset: https://www.kaggle.com/nsharan/h-1b-visa
Link to dataset (FY2017): https://www.foreignlaborcert.doleta.gov/performancedata.cfm
Running the code
Open Index.html
Data Processing
- Perform data preprocessing to transform the raw data into an understandable format.
- Find and combine other external datasets, such as the FY2017 data, to enrich the analysis.
- Develop and compile the relevant variables into the visualization programs to produce appropriate visualizations.
- Draw a geo map and a scatter plot to compare the fastest growth in absolute values and in percentages.
- Extract key aspects and analyze changes in employers' preferences, as well as forecasts of future trends.
(A preprocessing sketch in pandas follows at the end of this section.)
Visualizations
- Combo chart: overall volume of receipts and approval rates.
- Scatter plot: beneficiary country of birth.
- Geo map: H1B petitions filed across all states.
- Line chart: top 10 states for H1B petitions filed.
- Pie chart: comparison of education levels and occupations for petitions, FY2011 vs FY2017.
- Tree map: top employers submitting the greatest number of applications.
- Side-by-side bar chart: overall comparison of Data Scientist and Data Analyst.
- Highlight table: mean wage of Data Scientists and Data Analysts with case status certified.
- Bubble chart: top 10 companies for Data Scientist and Data Analyst.
Related Research
The H-1B Visa Debate, Explained - Harvard Business Review: https://hbr.org/2017/05/the-h-1b-visa-debate-explained
Foreign Labor Certification Data Center: https://www.foreignlaborcert.doleta.gov
Key facts about the U.S. H-1B visa program: http://www.pewresearch.org/fact-tank/2017/04/27/key-facts-about-the-u-s-h-1b-visa-program/
H1B visa News and Updates from The Economic Times: https://economictimes.indiatimes.com/topic/H1B-visa/news
H-1B visa - Wikipedia: https://en.wikipedia.org/wiki/H-1B_visa
Key Findings
- From the analysis, the government cut down the number of H1B approvals in 2017.
- In the past decade, owing to the demand for high-skilled workers, visa holders have clustered in STEM fields and come mostly from Asian countries such as China and India.
- Technical jobs, such as Computer Systems Analyst and Software Developer, make up the majority of the top 10 jobs among foreign workers.
- Employers located in metro areas strive to find foreign workers who can fill the technical positions in their organizations.
- States like California, New York, Washington, New Jersey, Massachusetts, Illinois, and Texas are prime locations for foreign workers and provide many job opportunities.
- Top companies such as Infosys, Tata, and IBM India, which submit the most H1B visa applications, are India-based companies associated with software and IT services.
- The Data Scientist position has experienced exponential growth in H1B visa applications, with jobs clustered most heavily in the West region.
Visualization programs used
HTML, JavaScript, CSS, D3.js, Google API, Python, R, and Tableau
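As a rough illustration of the preprocessing step referenced above, the sketch below loads the Kaggle file and tabulates a few of the quantities behind the charts; the local file name and the exact column labels (CASE_STATUS, YEAR, WORKSITE, JOB_TITLE, PREVAILING_WAGE) are assumptions that may need adjusting to the actual download.

```python
# Hypothetical preprocessing sketch for the Kaggle H-1B dataset described above.
import pandas as pd

df = pd.read_csv("h1b_kaggle.csv")  # assumed local filename

certified = df[df["CASE_STATUS"] == "CERTIFIED"]

# Petitions certified per year (combo-chart input).
per_year = certified.groupby("YEAR").size()

# Top 10 worksite states (line-chart input); the state is the text after the last comma.
states = certified["WORKSITE"].str.rsplit(",", n=1).str[-1].str.strip()
top_states = states.value_counts().head(10)

# Mean prevailing wage for Data Scientist vs Data Analyst (highlight-table input).
ds_da = certified[certified["JOB_TITLE"].isin(["DATA SCIENTIST", "DATA ANALYST"])]
mean_wage = ds_da.groupby("JOB_TITLE")["PREVAILING_WAGE"].mean()

print(per_year, top_states, mean_wage, sep="\n\n")
```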
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘NYC Most Popular Baby Names Over the Years’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/most-popular-baby-names-in-nyce on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Popular Baby Name Data In NYC from 2011-2014
Rows: 13962; Columns: 6
The data include items, such as:
- BRTH_YR: birth year of the baby
- GNDR: gender
- ETHCTY: mother's ethnicity
- NM: baby's name
- CNT: count of the name
- RNK: ranking of the name
Source: NYC Open Data
https://data.cityofnewyork.us/Health/Most-Popular-Baby-Names-by-Sex-and-Mother-s-Ethnic/25th-nujf
This dataset was created by Data Society and contains around 10,000 samples along with Nm, Rnk, technical information, and other features such as Gndr, Ethcty, and more.
- Analyze Brth Yr in relation to Cnt
- Study the influence of Nm on Rnk
- More datasets
If you use this dataset in your research, please credit Data Society
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This section presents a discussion of the research data. The data were received as secondary data; however, they were originally collected using time study techniques. Data validation is a crucial step in the data analysis process to ensure that the data are accurate, complete, and reliable. Descriptive statistics were used to validate the data. The mean, mode, standard deviation, variance, and range provide a summary of the data distribution and assist in identifying outliers or unusual patterns. The dataset presents the measures of central tendency, which include the mean, median, and mode. The mean signifies the average value of each of the factors presented in the tables; it is the balance point of the dataset and describes its typical value and behaviour. The median is the middle value of the dataset for each of the factors presented: half of the values lie below it and half lie above it, which is important for skewed distributions. The mode is the most common value in the dataset and was used to describe the most typical observation. These values are important as they describe the central value around which the data are distributed. The mean, mode, and median indicate a skewed distribution, as they are neither similar nor close to one another. The dataset also presents the results and a discussion of the results. This section focuses on the customisation of the DMAIC (Define, Measure, Analyse, Improve, Control) framework to address the specific concerns outlined in the problem statement. To gain a comprehensive understanding of the current process, value stream mapping was employed, further enhanced by measuring the factors that contribute to inefficiencies. These factors were then analysed and ranked based on their impact, utilising factor analysis. To mitigate the impact of the most influential factor on project inefficiencies, a solution is proposed using the EOQ (Economic Order Quantity) model. The implementation of the 'CiteOps' software facilitates improved scheduling, monitoring, and task delegation in the construction project through digitalisation. Furthermore, project progress and efficiency are monitored remotely and in real time. In summary, the DMAIC framework was tailored to the requirements of the specific project, incorporating techniques from inventory management, project management, and statistics to effectively minimise inefficiencies within the construction project.
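Since the proposed mitigation relies on the EOQ (Economic Order Quantity) model, a minimal sketch of the standard EOQ formula is shown below; the demand and cost figures are hypothetical placeholders, not values from the study.

```python
# Classic EOQ formula: Q* = sqrt(2 * D * S / H), where D is annual demand,
# S is the cost per order, and H is the annual holding cost per unit.
from math import sqrt

def eoq(annual_demand: float, order_cost: float, holding_cost: float) -> float:
    return sqrt(2 * annual_demand * order_cost / holding_cost)

# Hypothetical figures for illustration only.
print(eoq(annual_demand=12_000, order_cost=150.0, holding_cost=2.5))  # -> 1200.0
```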
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Most common main languages’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from http://data.europa.eu/88u/dataset/14f0e561-78a5-4f47-8ccf-9ae93d37e990-stadt-zurich on 16 January 2022.
--- Dataset description provided by original source is as follows ---
The 50 most common main languages among the permanent resident population aged 15 and over in the city of Zurich. The analysis is based on the pooled target-person dataset of the structural survey. Period: 2017 to 2019.
--- Original source retains full ownership of the source dataset ---
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Note: This dataset is the combination of four related datasets that were originally hosted on OpenfMRI.org: ds000113, ds000113b, ds000113c and ds000113d. The combined dataset is now in BIDS format and is simply referred to as ds000113 on OpenNeuro.org.
For more information about the project visit: http://studyforrest.org
This dataset contains high-resolution functional magnetic resonance imaging (fMRI) data from 20 participants recorded at high field strength (7 Tesla) during prolonged stimulation with an auditory feature film ("Forrest Gump"). In addition, a comprehensive set of auxiliary data (T1w, T2w, DTI, susceptibility-weighted image, angiography) as well as measurements to assess technical and physiological noise components have been acquired. An initial analysis confirms that these data can be used to study common and idiosyncratic brain response patterns to complex auditory stimulation. Among the potential uses of this dataset is the study of auditory attention and cognition, language and music perception, as well as social perception. The auxiliary measurements enable a large variety of additional analysis strategies that relate functional response patterns to structural properties of the brain. Alongside the acquired data, we provide source code and detailed information on all employed procedures, from stimulus creation to data analysis. (https://www.nature.com/articles/sdata20143)
The dataset also contains data from the same twenty participants while being repeatedly stimulated with a total of 25 music clips, with and without speech content, from five different genres using a slow event-related paradigm. It also includes raw fMRI data, as well as pre-computed structural alignments for within-subject and group analysis.
Additionally, seven of the twenty subjects participated in another study: empirical ultra high-field fMRI data recorded at four spatial resolutions (0.8 mm, 1.4 mm, 2 mm, and 3 mm isotropic voxel size) for orientation decoding in visual cortex — in order to test hypotheses on the strength and spatial scale of orientation discriminating signals. (https://www.sciencedirect.com/science/article/pii/S2352340917302056)
Finally, there are additional acquisitions for fifteen of the twenty participants: retinotopic mapping, a localizer paradigm for higher visual areas (FFA, EBA, PPA), and another 2-hour movie recording with 3T full-brain BOLD fMRI and simultaneous 1000 Hz eyetracking.
For more information about the project visit: http://studyforrest.org
./sourcedata/acquisition_protocols/04-sT1W_3D_TFE_TR2300_TI900_0_7iso_FS.txt ./sourcedata/acquisition_protocols/05-sT2W_3D_TSE_32chSHC_0_7iso.txt ./sourcedata/acquisition_protocols/06-VEN_BOLD_HR_32chSHC.txt ./sourcedata/acquisition_protocols/07-DTI_high_2iso.txt ./sourcedata/acquisition_protocols/08-field_map.txt Philips-specific MRI acquisition parameters dumps (plain text) for structural MRI (T1w, T2w, SWI, DTI, fieldmap -- in this order)
./sourcedata/acquisition_protocols/task01_fmri_session1.pdf ./sourcedata/acquisition_protocols/task01_fmri_session2.pdf ./sourcedata/acquisition_protocols/angio_session.pdf Siemens-specific MRI acquisition parameters dumps (PDF format) for functional MRI and angiography.
./stimuli/annotations/german_audio_description.csv
Audio-description transcript
This transcript contains all information on the audio-movie content that cannot be inferred from the DVD release — in a plain text, comma-separated-value table. Start and end time stamp, as well as the spoken text are provided for each continuous audio description segment.
./stimuli/annotations/scenes.csv
Movie scenes
A plain text, comma-separated-value table with start and end times for all 198 scenes in the presented movie cut. In addition, each table row indicates whether a scene takes place indoors or outdoors.
./stimuli/generate/generate_melt_cmds.py Python script to generate commands for stimuli generation
./stimuli/psychopy/buttons.csv ./stimuli/psychopy/forrest_gump.psyexp ./stimuli/psychopy/segment_cfg.csv Source code of the stimuli presentation in PsychoPy
Prolonged quasi-natural auditory stimulation (Forrest Gump audio movie)
Eight approximately 15 min long recording runs, together comprising the entire duration of a two-hour presentation of an audio-only version of the Hollywood feature film "Forrest Gump" made for a visually impaired audience (German dubbing).
For each run, there are 4D volumetric images (160x160x36) in NIfTI format, one volume recorded every 2 s, obtained from a Siemens MR scanner at 7 Tesla using a T2*-weighted gradient-echo EPI sequence (1.4 mm isotropic voxel size). These images have partial brain coverage, centered on the auditory cortices in both brain hemispheres, and include frontal and posterior portions of the brain. There is no coverage of the upper portion of the brain (e.g., large parts of the motor and somato-sensory cortices).
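A minimal sketch of how one might load a single run and confirm the geometry described above, assuming the dataset has been downloaded locally and the nibabel Python package is installed; the file path follows the naming shown further below for the distortion-corrected data.

```python
# Illustrative check of image dimensions, voxel size, and TR for one BOLD run.
import nibabel as nib

bold = nib.load(
    "sub-01/ses-forrestgump/func/"
    "sub-01_ses-forrestgump_task-forrestgump_acq-dico_run-01_bold.nii.gz"
)
print(bold.shape)               # expected (160, 160, 36, n_volumes)
print(bold.header.get_zooms())  # voxel size in mm plus TR (s) for the 4th axis
```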
Several flavors of raw and preprocessed data are available:
Raw BOLD functional MRI
These raw data suffer from severe geometric distortions.
Filename examples for subject 01 and run 01
./sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_acq-raw_run-01_bold.nii.gz BOLD data
./sourcedata/dicominfo/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_acq-raw_run-01_bold_dicominfo.txt Image property dump from DICOM conversion
Raw BOLD functional MRI (with applied distortion correction)
Identical to the raw BOLD data, but with a scanner-side correction for geometric distortions applied (which also includes correction for participant motion). These data are most suitable for analysis of individual brains.
Filename examples for subject 01 and run 01
./sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_acq-dico_run-01_bold.nii.gz BOLD data
./derivatives/motion/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_acq-dico_run-01_moco_ref.nii.gz Reference volume used for motion correction. Only runs 1 and 5 (first runs in each session)
./sourcedata/dicominfo/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_acq-dico_run-01_bold_dicominfo.txt Image property dump from DICOM conversion
Raw BOLD functional MRI (linear anatomical alignment)
These images are motion and distortion corrected and have been anatomically aligned to a BOLD group template image that was generated from the entire group of participants.
Alignment procedure was linear (image projection using an affine transformation). These data are most suitable for group-analyses and inter-individual comparisons.
Filename examples for subject 01 and run 01
./derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-dico7Tad2grpbold7Tad_run-01_bold.nii.gz BOLD data
./derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-dico7Tad2grpbold7TadBrainMask_run-01_bold.nii.gz Matching brain mask volume
./derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-XFMdico7Tad2grpbold7Tad_run-01_bold.mat 4x4 affine transformation matrix (plain text format)
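For illustration, the sketch below loads the plain-text 4x4 affine named above and applies it to a point in homogeneous coordinates; whether the matrix maps voxel indices or scaled-mm coordinates depends on the tool that produced it, so treat this purely as a demonstration of the matrix algebra, not a prescribed workflow.

```python
# Apply a 4x4 affine (plain-text matrix) to a point in homogeneous coordinates.
import numpy as np

xfm = np.loadtxt(
    "derivatives/linear_anatomical_alignment/sub-01/ses-forrestgump/func/"
    "sub-01_ses-forrestgump_task-forrestgump_rec-XFMdico7Tad2grpbold7Tad_run-01_bold.mat"
)
point = np.array([80.0, 80.0, 18.0, 1.0])  # arbitrary example coordinate
print(xfm @ point)                          # mapped coordinate in template space
```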
Raw BOLD functional MRI (non-linear anatomical alignment)
These images are motion and distortion corrected and have been anatomically aligned to a BOLD group template image that was generated from the entire group of participants.
Alignment procedure was non-linear (image projection using an affine transformation with additional transformation by non-linear warpfields). These data are most suitable for group-analyses and inter-individual comparisons.
Filename examples for subject 01 and run 01
./derivatives/non-linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-dico7Tad2grpbold7TadNL_run-01_bold.nii.gz BOLD data
./derivatives/non-linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-dico7Tad2grpbold7TadBrainMaskNLBrainMask_run-01_bold.nii.gz Matching brain mask volume
./derivatives/non-linear_anatomical_alignment/sub-01/ses-forrestgump/func/sub-01_ses-forrestgump_task-forrestgump_rec-dico7Tad2grpbold7TadNLWarp_run-01_bold.nii.gz Warpfield (the associated affine transformation is identical to the "linear" alignment)
Participants were repeatedly stimulated with a total of 25 music clips, with and without speech content, from five different genres using a slow event-related paradigm.
Filename examples for subject 01 and run 01
./sub-01/ses-auditoryperception/func/sub-01_ses-auditoryperception_task-auditoryperception_run-01_bold.nii.gz ./sub-01/ses-auditoryperception/func/sub-01_ses-auditoryperception_task-auditoryperception_run-01_events.tsv
Filename examples for subject 01 and run
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘DSS Township Counts - by Race - CY 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/c079102f-6400-4cb6-8460-6230ca51ee72 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
In order to facilitate public review and access, enrollment data published on the Open Data Portal is provided as promptly as possible after the end of each month or year, as applicable to the data set. Due to eligibility policies and operational processes, enrollment can vary slightly after publication. Please be aware of the point-in-time nature of the published data when comparing to other data published or shared by the Department of Social Services, as this data may vary slightly.
As a general practice, for monthly data sets published on the Open Data Portal, DSS will continue to refresh the monthly enrollment data for three months, after which time it will remain static. For example, when March data is published, the data in January and February will be refreshed. When April data is published, February and March data will be refreshed, but January will not change. This allows the Department to account for the most common enrollment variations in published data while also ensuring that data remains as stable as possible over time. In the event of a significant change in enrollment data, the Department may republish reports and will notate such republication dates and reasons accordingly. In March 2020, Connecticut opted to add a new Medicaid coverage group: the COVID-19 Testing Coverage for the Uninsured. Enrollment data on this limited-benefit Medicaid coverage group is being incorporated into Medicaid data effective January 1, 2021. Enrollment data for this coverage group prior to January 1, 2021, was listed under State Funded Medical. A historical accounting of enrollment of the specific coverage group starting in calendar year 2020 will also be published separately.
DSS CY 2020 Town counts - Number of people enrolled in DSS services in the calendar year 2020, by township and race.
NOTE: On April 22, 2019 the methodology for determining HUSKY A Newborn recipients changed, which caused an increase of recipients for that benefit starting in October 2016. We now count recipients recorded in the ImpaCT system as well as in the HIX system for that assistance type, instead of using HIX exclusively. Also, the methodology for determining the address of the recipients changed: 1. The address of a recipient in the ImpaCT system is now correctly determined specific to that month instead of using the address of the most recent month. This resulted in some shuffling of the recipients among townships starting in October 2016. 2. If, in a given month, a recipient has benefit records in both the HIX system and the ImpaCT system, the address of the recipient is now calculated as follows to resolve conflicts: use the residential address in ImpaCT if it exists, else use the mailing address in ImpaCT if it exists, else use the address in HIX (a small sketch of this rule follows below). This resulted in a reduction in counts for most townships starting in March 2017 because a single address is now used instead of two when the systems do not agree.
NOTE: On February 14, 2019, the enrollment counts for 2012-2015 across all programs were updated to account for an error in the data integration process. As a result, the count of the number of people served increased by 13% for 2012, 10% for 2013, 8% for 2014 and 4% for 2015. Counts for 2016, 2017 and 2018 remain unchanged.
NOTE: On 1/16/2019 these counts were revised to count a recipient in all locations that recipient resided in that year.
NOTE: On 1/1/2019 the counts were revised to count a recipient in only one town per year even when the recipient moved within the year. The most recent address is used. (But this was reversed later, see above.)
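The conflict-resolution rule in the note above amounts to a simple precedence check; a small sketch with hypothetical field names (not the actual DSS schema) is shown below.

```python
# Hypothetical sketch of the address-precedence rule described in the note:
# residential address in ImpaCT > mailing address in ImpaCT > address in HIX.
def resolve_address(impact_residential, impact_mailing, hix_address):
    if impact_residential:
        return impact_residential
    if impact_mailing:
        return impact_mailing
    return hix_address

print(resolve_address(None, "PO Box 1, Hartford CT", "12 Main St, Hartford CT"))
# -> "PO Box 1, Hartford CT"
```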
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘DSS Township Counts - by Age - CY 2020’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/db8bcc6e-3da2-40d1-9443-70876ff1af30 on 26 January 2022.
--- Dataset description provided by original source is as follows ---
In order to facilitate public review and access, enrollment data published on the Open Data Portal is provided as promptly as possible after the end of each month or year, as applicable to the data set. Due to eligibility policies and operational processes, enrollment can vary slightly after publication. Please be aware of the point-in-time nature of the published data when comparing to other data published or shared by the Department of Social Services, as this data may vary slightly.
As a general practice, for monthly data sets published on the Open Data Portal, DSS will continue to refresh the monthly enrollment data for three months, after which time it will remain static. For example, when March data is published the data in January and February will be refreshed. When April data is published, February and March data will be refreshed, but January will not change. This allows the Department to account for the most common enrollment variations in published data while also ensuring that data remains as stable as possible over time. In the event of a significant change in enrollment data, the Department may republish reports and will notate such republication dates and reasons accordingly. In March 2020, Connecticut opted to add a new Medicaid coverage group: the COVID-19 Testing Coverage for the Uninsured. Enrollment data on this limited-benefit Medicaid coverage group is being incorporated into Medicaid data effective January 1, 2021. Enrollment data for this coverage group prior to January 1, 2021, was listed under State Funded Medical. An historical accounting of enrollment of the specific coverage group starting in calendar year 2020 will also be published separately. DSS CY 2020 Town counts - Number of people enrolled in DSS services in the calendar year 2020, by township and age. For privacy considerations, a count of zero is used for counts less than five. A recipient is counted in all townships where that recipient resided in that year.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for content analysis published in "Hornikx, J., Meurs, F. van, Janssen, A., & Heuvel, J. van den (2020). How brands highlight country of origin in magazine advertising: A content analysis. Journal of Global Marketing, 33 (1), 34-45."
Abstract (taken from publication)
Aichner (2014) proposes a classification of ways in which brands communicate their country of origin (COO). The current, exploratory study is the first to empirically investigate the frequency with which brands employ such COO markers in magazine advertisements. An analysis of about 750 ads from the British, Dutch, and Spanish editions of Cosmopolitan showed that the prototypical ‘made in’ marker was rarely used, and that ‘COO embedded in company name’ and ‘use of COO language’ were most frequently employed. In all, 36% of the total number of ads contained at least one COO marker, underlining the importance of the COO construct.
Methodology (taken from publication)
Sample
The use of COO markers in advertising was examined in print advertisements from three different countries to increase the robustness of the findings. Given the exploratory nature of this study, two practical selection criteria guided our country choice: the three countries included both smaller and larger countries in Europe, and they represented languages that the team was familiar with in order to reliably code the advertisements on the relevant variables. The three European countries selected were the Netherlands, Spain, and the United Kingdom. The dataset for the UK was discarded for testing H1 about the use of English as a foreign language, as will be explained in more detail in the coding procedure.
The magazine Cosmopolitan was chosen as the source of advertisements. The choice for one specific magazine title reduces the generalizability of the findings (i.e., limited to the corresponding products and target consumers), but this magazine was chosen intentionally because an informal analysis suggested that it carried advertising for a large number of product categories that are considered ethnic products, such as cosmetics, watches, and shoes (Usunier & Cestre, 2007). This suggestion was corroborated in the main analysis: the majority of the ads in the corpus referred to a product that Usunier and Cestre (2007) classify as ethnic products. Table 2 provides a description of the product categories and brands referred to in the advertisements. Ethnic products have a prototypical COO in the minds of consumers (e.g., cosmetics – France), which makes it likely that the COOs are highlighted through the use of COO markers.
Cosmopolitan is an international magazine that has different local editions in the three countries. The magazine, which is targeted at younger women (18–35 years old), reaches more than three million young women per month through its online, social and print platforms in the Netherlands (Hearst Netherlands, 2016), has about 517,000 readers per month in Spain (PrNoticias, 2016) and about 1.18 million readers per month in the UK (Hearst Magazine U.K., 2016).
The sample consisted of all advertisements from all monthly issues that appeared in 2016 in the three countries. This whole-year cluster was selected so as to prevent potential seasonal influences (Neuendorf, 2002). In total, the corpus consisted of 745 advertisements, of which 111 were from the Dutch, 367 from the British and 267 from the Spanish Cosmopolitan.
Two categories of ads were excluded in the selection process: (1) advertisements for subscription to Cosmopolitan itself, and (2) advertisements that were identical to ads that had appeared in another issue in one of the three countries. As a result, each advertisement was unique.
Coding procedure
For all advertisements, four variables were coded: product type, presence of types of COO markers, COO referred to, and the use of English as a COO marker. In the first place, product type was assessed by the two coders. Coders classified each product to one of the 32 product types. In order to assess the reliability of the codings, ten per cent of the ads were independently coded by a second coder. The interrater reliability of the variable product category was good (κ = .97, p < .000, 97.33% agreement between both coders). Table 2 lists the most frequent product types; the label ‘other’ covers 17 types of product, including charity, education, and furniture.
In the second place, it was recorded whether one or more of the COO markers occurred in a given ad. In the third place, if a marker was identified, it was assessed to which COO the markers referred. Table 1 lists the nine possible COO markers defined by Aichner (2014) and the COOs referred to, with examples taken from the current content analysis. The interrater reliability for the type of COO marker was very good (κ = .80, p < .000, 96.30% agreement between the coders), and the interrater reliability for COO referred to was excellent (κ = 1.00, p < .000).
After the independent assessments of the two coders, the coders decided on the best coding for all cases for which they made a different initial choice. On the basis of these resulting codings, the fourth and final variable was assessed: the English language as a COO marker. Only if an ad contained the English language and at least one other type of COO marker referring to an English-speaking country, was the English language coded as a true COO marker. An example is a Dutch ad using the English language and featuring a British model. If, as in most cases, an ad contained the English language but no other marker was found that referred to an English-speaking country, the English language was not considered to be a COO marker but a marker of globalness (e.g., ‘Because sometimes, a girl’s gotta walk’ in an ad for Skechers in the Spanish corpus). This procedure to disentangle the English language as a true COO marker and a marker of globalness was only followed in the Dutch and Spanish sample. In the UK sample, the English language was not considered to be either a COO marker or a marker of globalness since English is the first language of the UK. Similarly, neither the Dutch language in the Dutch sample nor the Spanish language in the Spanish sample were considered COO markers since these languages are both countries’ first language.
Statistical treatment
For all research questions and the hypothesis, descriptive statistics were generated presenting frequencies and percentages of the categories that were compared. The first analysis (RQ1) concerned the frequency with which the different types of COO marker were used in the sample from the three different countries. For each COO marker, it was determined whether or not it occurred in each of the ads in the sample.
In order to statistically test whether some types of COO marker occur more frequently than others (RQ2a), a within-subject ANOVA was conducted with Type of COO marker as the independent variable, with nine levels representing the nine different COO markers classified by Aichner (2014). For RQ2b, RQ2c, and H1, frequencies were compared for the occurrence of the different categories within one variable under investigation. For RQ2c, for instance, the variable was the number of COO markers referred to in an ad; the different categories were ‘no marker’, ‘two markers’, ‘three markers’, and ‘four markers’. Non-parametric χ² tests were conducted for the research questions and the hypothesis to test for potentially significant differences between the occurrence of the categories.
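As an illustration of the interrater-reliability checks described above, the sketch below computes Cohen's kappa and raw percent agreement for two hypothetical coders; the codings shown are invented for demonstration, not the study's data.

```python
# Hypothetical interrater-reliability check with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

coder_1 = ["made in", "company name", "language", "none", "language", "none"]
coder_2 = ["made in", "company name", "language", "none", "company name", "none"]

kappa = cohen_kappa_score(coder_1, coder_2)
agreement = sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)
print(f"kappa = {kappa:.2f}, raw agreement = {agreement:.0%}")
```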
The scientific community has entered an era of big data. However, with big data comes big responsibilities, and best practices for how data are contributed to databases have not kept pace with the collection, aggregation, and analysis of big data. Here, we rigorously assess the quantity of data for specific leaf area (SLA) available within the largest and most frequently used global plant trait database, the TRY Plant Trait Database, exploring how much of the data were applicable (i.e., original, representative, logical, and comparable) and traceable (i.e., published, cited, and consistent). Over three-quarters of the SLA data in TRY either lacked applicability or traceability, leaving only 22.9% of the original data usable compared to the 64.9% typically deemed usable by standard data cleaning protocols. The remaining usable data differed markedly from the original for many species, which led to altered interpretation of ecological analyses. Though the data we consider here make up onl..., SLA data were downloaded from TRY (traits 3115, 3116, and 3117) for all conifer (Araucariaceae, Cupressaceae, Pinaceae, Podocarpaceae, Sciadopityaceae, and Taxaceae), Plantago, Poa, and Quercus species. The data have not been processed in any way, but additional columns have been added to the dataset that provide the viewer with information about where each data point came from, how it was cited, how it was measured, whether it was uploaded correctly, whether it had already been uploaded to TRY, and whether it was uploaded by the individual who collected the data. There are two additional documents associated with this publication. One is a Word document that includes a description of each of the 120 datasets that contained SLA data for the four plant groups within the study (conifers, Plantago, Poa, and Quercus). The second is an Excel document that contains the SLA data that was downloaded from TRY and all associated metadata.
Missing data codes: NA and N/A
https://www.enterpriseappstoday.com/privacy-policy
Cloud Security Statistics: Cloud computing can bring many benefits to companies. However, companies are also exposed to losses when they cannot ensure proper information security and privacy protections in the cloud, which in turn results in higher costs and potential losses to businesses. We explore Cloud Security Statistics in more detail in this report. Cloud adoption has risen dramatically over the last few years. Although many organizations were already in the cloud, the COVID-19 outbreak helped accelerate this transition; with the widespread use of remote work, organizations were required to provide support and essential services to their remote workforce. As a result, more than 90% of companies employ some form of cloud-based infrastructure, and more than three-quarters (76 percent) use multi-cloud deployments made up of at least two cloud service providers. These cloud environments host crucial business applications and also hold sensitive customer and company information. With the shift to cloud computing comes an increased need to collect Cloud Security Statistics: cloud-hosted applications need to be secured against attacks, and cloud-hosted information must be secured against unauthorized access in accordance with applicable laws. Cloud environments differ in significant ways from on-prem infrastructure, which means that traditional security tools and methods don't always work in the cloud. As a result, many companies confront major issues when securing their cloud-based infrastructure.
Editor’s Choice
- 60% of global corporate data are stored in the cloud.
- 94% of businesses globally use one or more cloud computing services.
- The global cloud market is projected to expand from $480 billion in 2022 to $2.297 trillion by 2032.
- With 32 percent, Amazon AWS holds the largest market share in cloud computing.
- 39% of businesses said they have been the victim of data breaches in their cloud environments.
- Worldwide public spending on cloud computing services is forecast to reach $597.3 billion in 2023, an increase of 21.7 percent.
- 92% of companies have embraced a multi-cloud strategy.
- The market for cloud-based technology is predicted to reach $864 billion in 2025, growing at an annual rate of 12.8 percent.
- Global data storage will exceed 200 zettabytes by 2025, with more than 100 zettabytes expected to reside in cloud storage. (Cloudwards)
- 89% of businesses have a multi-cloud strategy. (Flexera)
- 71 percent of Americans use cloud storage such as Dropbox or iCloud. (Statista)
- 48% of company data is stored in the cloud. (Panda Security)
- The cloud computing market reached $371.4 billion in 2020. (Globe Newswire)
- Worldwide end-user spending on public cloud services was expected to increase by 23.1 percent in 2021. (Gartner)
- Security is the most frequent concern in cloud adoption, cited by 83% of cloud users. (Cloudwards)
- 52% of businesses want cloud-based solutions that include security tools. (Cloudwards)
https://spdx.org/licenses/CC0-1.0.html
Ixodid (hard) ticks play important ecosystem roles and have significant impacts on animal and human health via tick-borne diseases and physiological stress from parasitism. Tick occurrence, abundance, behavior, and key life-history traits are highly influenced by host availability, weather, microclimate, and landscape features. As such, changes in the environment can have profound impacts on ticks, their hosts, and the spread of diseases. Researchers interested in enumerating questing ticks attempt to integrate this heterogeneity by conducting replicate sampling bouts spread over the tick questing period, as common field methods notoriously underestimate ticks. However, it is unclear how (or if) tick studies account for this heterogeneity in the modeling process. This step is critical as unaccounted variance in detection can lead to biased estimates of occurrence and abundance. We performed a descriptive review to evaluate the extent to which studies account for the detection process while modeling tick data. We also categorized the types of analyses that are commonly used to model tick data. We used hierarchical models (HMs) that account for imperfect detection to analyze simulated and empirical tick data, demonstrating that inference is muddled when detection probability is not accounted for in the modeling process. Our review indicates that only 5 of 412 (1%) papers explicitly accounted for imperfect detection while modeling ticks. By comparing HMs with the most common approaches used for modeling tick data (e.g., ANOVA), we show that population estimates are biased low for simulated and empirical data when using non-HMs, and that confounding occurs due to not explicitly modeling factors that influenced both detection and abundance. Our review and analysis of simulated and empirical data show that it is important to account for our ability to detect ticks using field methods with imperfect detection. Not doing so leads to biased estimates of occurrence and abundance, which could complicate our understanding of parasite-host relationships and the spread of tick-borne diseases. We highlight the resources available for learning HM approaches and applying them to analyzing tick data.
Methods
To illustrate the problems that arise from not accounting for the detection process while estimating tick abundance, we performed two simulations that mirrored tick dragging studies and used common statistical frameworks for modeling tick data. For both simulations, we chose 5 temporal replicate surveys of 100 plots and specified a positive relationship of temperature on abundance and detection probability; average abundance (λ) was arbitrarily set to 20 ticks. Our choice of replicate surveys is a common field design for studying ticks (Dobson, 2013), and environmental factors such as temperature influence tick abundance and activity (Gilbert, 2021; Klarenberg and Wisely, 2019) and are often used to model tick abundance. For our first simulation, we specified low detection probability (ρ = 0.2) as tick dragging surveys often only collect ~10–20% of questing ticks (Drew and Samuel, 1985; Nyrhilä et al., 2020). We assumed perfect detection (ρ = 1) for our second simulation, meaning that all ticks would be captured by dragging or flagging surveys if they were present.
We simulated count data arising from a negative binomial distribution using the 'simNmix' function from AHMbook R package (Kéry et al., 2022) as tick abundance data often have a high variance-to-mean ratio due to aggregated and high counts (Elston et al., 2001). Following simulations, we estimated abundance and evaluated relationships between average seasonal temperature and tick abundance using 3 common approaches for modeling tick count data (linear models [LM], generalized linear models [GLM], and generalized linear mixed-effects models [GLMM]). For the LM analysis, we added 1 to the tick counts and log-transformed counts to meet assumptions of normality. This approach, although problematic, is a standard method to force count data into a linear modeling framework (O’Hara and Kotze, 2010) and common in tick studies (e.g., Allen et al., 2019). For the GLM and GLMM analyses, we used the raw counts and specified models with negative binomial errors. We used the ‘lm’ and ‘glm’ functions in the base R package for the LM and GLM analyses, respectively, and the ‘glmmTMB’ function in the glmmTMB R package (Brooks et al., 2017) for the GLMM analysis. To highlight the shortcomings of the preceding analytical approaches, we compared inference with an N-mixture or binomial mixture model – a type of HM that is often used for estimating abundance when count data are imperfectly detected (Kéry and Royle, 2015). We fit the N-mixture model using the ‘pcount’ function in the unmarked R package (Fiske and Chandler, 2011) and specified average temperature across all sampling occasions as a covariate on abundance (λ) and temperature during each sampling occasion as a covariate on detection probability (ρ). We then evaluated how well each statistical approach recovered average abundance (λ = 20) and relationships between average seasonal temperature and abundance when detection is assumed to be imperfect (ρ = 0.2) and perfect (ρ = 1). All simulations and statistical analyses were performed using R software (R Core Team, 2022), and predictive plots were created using the ggplot2 R package (Wickham, 2016). References Allen, D., Borgmann-Winter, B., Bashor, L., Ward, J., 2019. The density of the Lyme disease vector Ixodes scapularis (blacklegged tick) differs between the Champlain Valley and Green Mountains, Vermont. Northeast. Nat. 26, 545–560. https://doi.org/10.1656/045.026.0307 Brooks, M.E., Kristensen, K., van Benthem, K.J., Magnusson, A., Berg, C.W., Nielsen, A., Skaug, H.J., Mächler, M., Bolker, B.M., 2017. glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. R J. 9, 378–400. https://doi.org/10.32614/rj-2017-066 Dobson, A.D.M., 2013. Ticks in the wrong boxes: Assessing error in blanket-drag studies due to occasional sampling. Parasites and Vectors 6, 1–6. https://doi.org/10.1186/1756-3305-6-344 Drew, M.L., Samuel, W.M., 1985. Factors affecting transmission of larval winter ticks, Dermacentor albipictus (Packard), to moose, Alces alces L., in Alberta, Canada. J. Wildl. Dis. 21, 274–282. https://doi.org/10.7589/0090-3558-21.3.274 Elston, D.A., Moss, R., Boulinier, T., Arrowsmith, C., Lambin, X., 2001. Analysis of aggregation, a worked example: Numbers of ticks on red grouse chicks. Parasitology 122, 563–569. https://doi.org/10.1017/S0031182001007740 Gilbert, L., 2021. The impacts of climate change on ticks and tick-borne disease risk. Annu. Rev. Entomol. 66, 273–288. 
https://doi.org/10.1146/annurev-ento-052720-094533 Kéry, M., Royle, J.A., Meredith, M., 2022. AHMbook: Functions and Data for the Book “Applied Hierarchical Modeling in Ecology” Vols 1 and 2. Kéry, M., Royle, J.A., 2015. Applied hierarchical modeling in ecology: Analysis of distribution, abundance, and species richness in R and BUGS. Academic Press. https://doi.org/10.1016/C2015-0-04070-9 Klarenberg, G., Wisely, S.M., 2019. Evaluation of NEON data to model spatio-temporal tick dynamics in Florida. Insects 10. https://doi.org/10.3390/insects10100321 Nyrhilä, S., Sormunen, J.J., Mäkelä, S., Sippola, E., Vesterinen, E.J., Klemola, T., 2020. One out of ten: low sampling efficiency of cloth dragging challenges abundance estimates of questing ticks. Exp. Appl. Acarol. 82, 571–585. https://doi.org/10.1007/s10493-020-00564-5 O’Hara, R.B., Kotze, D.J., 2010. Do not log-transform count data. Methods Ecol. Evol. 1, 118–122. https://doi.org/10.1111/j.2041-210x.2010.00021.x R Core Team, 2022. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ Wickham, H., 2016. ggplot2: Elegant Graphics for Data Analysis, first ed. Springer, New York, NY. https://doi.org/10.1007/978-0-387-98141-3
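To convey the core point of the simulations in Python rather than the R workflow used in the paper, the conceptual sketch below generates negative-binomial abundances, thins them with an imperfect detection probability, and shows that the naive mean of raw drag counts is biased low. The numbers mirror the stated design (λ = 20, ρ = 0.2, 5 replicate surveys of 100 plots), but the code is illustrative and is not the authors' analysis.

```python
# Conceptual simulation of imperfect detection in tick drag surveys.
import numpy as np

rng = np.random.default_rng(42)
n_plots, n_surveys, lam, p_detect = 100, 5, 20, 0.2

# True abundance per plot ~ negative binomial with mean lam and dispersion "size".
size = 2.0
true_N = rng.negative_binomial(size, size / (size + lam), n_plots)

# Replicate counts: binomial thinning of true abundance at detection prob. 0.2.
counts = rng.binomial(true_N[:, None], p_detect, size=(n_plots, n_surveys))

print("true mean abundance:          ", true_N.mean())
print("naive estimate (raw counts):  ", counts.mean())           # biased low
print("detection-corrected estimate: ", counts.mean() / p_detect)
```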
https://www.promarketreports.com/privacy-policy
The global data center rack power distribution unit (PDU) market size was valued at USD 674.52 million in 2025 and is projected to expand at a compound annual growth rate (CAGR) of 9.80% from 2025 to 2033, reaching USD 1,457.28 million by 2033. The market is driven by the growing demand for data centers due to the increasing volume of data generated and processed worldwide. Additionally, the adoption of cloud computing, big data, and artificial intelligence (AI) is also contributing to the growth of the market. The in-row deployment mode segment accounted for the largest share of the market in 2025 and is projected to continue to dominate the market during the forecast period, because in-row PDUs provide better cooling and power distribution efficiency than other deployment modes. The 1-phase form factor segment is also expected to witness significant growth during the forecast period, owing to the increasing adoption of single-phase power supplies in data centers. The 120V voltage level segment is projected to hold a substantial share of the market during the forecast period, as 120V is the most common voltage level used in data centers. The 0-5kW power capacity segment is also expected to witness significant growth during the forecast period, owing to the increasing adoption of small and medium-sized data centers. The data center rack PDU market has also been estimated to grow from $2.3 billion in 2021 to $3.6 billion by 2026, at a 9.4% CAGR. The growth of the market is attributed to the increasing demand for data centers and the need for efficient power distribution systems. Key drivers for this market are: growing demand for cloud computing; edge computing; increasing need for reliable and efficient power distribution; rise in adoption of high-density servers; expansion of data centers. Potential restraints include: increased cloud computing and AI adoption; growing data center infrastructure; rising demand for energy efficiency; technological advancements.
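The projections above are simple compound-growth arithmetic; the sketch below applies the stated 2025 base value and CAGR, noting that small differences from the quoted 2033 figure can arise from rounding in the source.

```python
# Compound annual growth: value_t = base * (1 + CAGR) ** years.
base_2025_musd = 674.52        # stated 2025 market size, USD million
cagr = 0.098                   # stated CAGR
years = 2033 - 2025

projected_2033 = base_2025_musd * (1 + cagr) ** years
print(f"Projected 2033 size: USD {projected_2033:,.2f} million")
# ~1,425 million at exactly 9.80%, vs. the quoted 1,457.28 million
```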
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The aim of this paper is to investigate the re-use of research data deposited in a digital data archive in the social sciences. The study examines the quantity, type, and purpose of data downloads by analyzing enriched user log data collected from a Swiss data archive. The findings show that quantitative datasets are increasingly downloaded from the digital archive and that downloads focus heavily on a small share of the datasets. The most frequently downloaded datasets are survey datasets collected by research organizations that offer possibilities for longitudinal studies. Users typically download only one dataset, but a group of heavy downloaders accounts for a remarkable share of all downloads. The main user group downloading data from the archive is students, who use the data in their studies. Furthermore, datasets downloaded for research purposes often, but not always, end up being used in scholarly publications. Enriched log data from data archives offer an interesting macro-level perspective on the use and users of these services and help in understanding the increasing role of repositories in the social sciences. The study provides insights into the potential of collecting and using log data for studying and evaluating data archive use.
https://www.statsndata.org/how-to-order
The Colorectal Cancer Molecular Diagnostics Kit market represents a significant sector within the healthcare industry, playing a crucial role in the early detection and management of one of the most common cancers worldwide. These diagnostic kits utilize advanced molecular techniques to identify genetic mutations an
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on time series analysis.
Introduction
Time series are a special class of dataset, where a response variable is tracked over time. The frequency of measurement and the timespan of the dataset can vary widely. At its most simple, a time series model includes an explanatory time component and a response variable. Mixed models can include additional explanatory variables (check out the nlme and lme4 R packages). We will be covering a few simple applications of time series analysis in these lessons.
Opportunities
Analysis of time series presents several opportunities. In aquatic sciences, some of the most common questions we can answer with time series modeling are:
Can we forecast conditions in the future?
Challenges
Time series datasets come with several caveats, which need to be addressed in order to effectively model the system. A few common challenges that arise (and can occur together within a single dataset) are:
Autocorrelation: Data points are not independent from one another (i.e., the measurement at a given time point is dependent on previous time point(s)).
Data gaps: Data are not collected at regular intervals, necessitating interpolation between measurements. There are often gaps between monitoring periods. For many time series analyses, we need equally spaced points.
Seasonality: Cyclic patterns in variables occur at regular intervals, impeding clear interpretation of a monotonic (unidirectional) trend. For example, we can expect summer temperatures to be consistently higher than winter temperatures (a short decomposition sketch follows this list).
Heteroscedasticity: The variance of the time series is not constant over time.
Covariance: The covariance of the time series is not constant over time. Many time series models assume that both the variance and the covariance are constant over time, so changing covariance poses a problem analogous to heteroscedasticity.
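The course materials work in R (nlme, lme4), but as a quick illustration of separating seasonality from a monotonic trend, here is a small Python/statsmodels sketch on a synthetic monthly temperature series; the data are invented for demonstration and are not course data.

```python
# Decompose a synthetic monthly series into trend, seasonal, and residual parts.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2010-01-01", periods=120, freq="MS")   # 10 years, monthly
seasonal = 10 * np.sin(2 * np.pi * idx.month / 12)           # seasonal cycle
trend = 0.02 * np.arange(len(idx))                           # slow upward trend
noise = np.random.default_rng(1).normal(0, 1, len(idx))
temps = pd.Series(20 + seasonal + trend + noise, index=idx)

result = seasonal_decompose(temps, model="additive", period=12)
print(result.trend.dropna().head())       # trend component, seasonality removed
print(result.seasonal.head(12).round(2))  # estimated monthly seasonal pattern
```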
Learning Objectives
After successfully completing this notebook, you will be able to:
Choose appropriate time series analyses for trend detection and forecasting
Discuss the influence of seasonality on time series analysis
Interpret and communicate results of time series analyses
https://www.statsndata.org/how-to-order
The Neisseria Gonorrhoeae Medium market plays a crucial role in the fight against one of the most common sexually transmitted infections worldwide, gonorrhea. This specialized medium is designed for the isolation and identification of Neisseria gonorrhoeae, a bacterium responsible for this infection. As healthcare s