*** TYPE OF SURVEY AND METHODS ***
The data set includes responses to a survey conducted by professionally trained interviewers of a social and market research company in the form of computer-aided telephone interviews (CATI) from 2017-02 to 2017-04. The target population was inhabitants of Germany aged 18 years and older, who were randomly selected using the sampling approaches ADM eASYSAMPLe (based on the Gabler-Häder method) for landline connections and eASYMOBILe for mobile connections. The 1,331 completed questionnaires comprise 44.2 percent mobile and 55.8 percent landline phone respondents. Most questions were answered on a 5-point rating scale (Likert-type) anchored, for instance, from ‘Fully agree’ to ‘Do not agree at all’, or from ‘Very uncomfortable’ to ‘Very comfortable’. Responses were weighted to obtain a representation of the entire German population (variable ‘gewicht’ in the data sets). To this end, standard weighting procedures were applied to reduce differences between the sample and the entire population with regard to known rates of response and non-response depending on household size, age, gender, educational level, and place of residence.
*** RELATED PUBLICATION AND FURTHER DETAILS ***
The questionnaire, analysis and results will be published in the corresponding report (main text in English; questionnaire in Appendix B in the German of the interviews and in English translation). The report will be available as an open access publication at KIT Scientific Publishing (https://www.ksp.kit.edu/). Reference: Orwat, Carsten; Schankin, Andrea (2018): Attitudes towards big data practices and the institutional framework of privacy and data protection - A population survey, KIT Scientific Report 7753, Karlsruhe: KIT Scientific Publishing.
*** FILE FORMATS ***
The data set of responses was deposited in the repository KITopen in 2018-11 in the following file formats: comma-separated values (.csv), tabulator-separated values (.dat), Excel (.xls), Excel 2007 or newer (.xlsx), and SPSS Statistics (.sav). The questionnaire is saved in the following file formats: comma-separated values (.csv), Excel (.xls), Excel 2007 or newer (.xlsx), and Portable Document Format (.pdf).
*** PROJECT AND FUNDING ***
The survey is part of the project Assessing Big Data (ABIDA) (from 2015-03 to 2019-02), which receives funding from the Federal Ministry of Education and Research (BMBF), Germany (grant no. 01IS15016A-F). http://www.abida.de
*** CONTACT ***
Carsten Orwat, Karlsruhe Institute of Technology, Institute for Technology Assessment and Systems Analysis, orwat@kit.edu
Andrea Schankin, Karlsruhe Institute of Technology, Institute of Telematics, andrea.schankin@kit.edu
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This training dataset was calculated using the mechanistic modeling approach. See "Big data training data for artificial intelligence-based Li-ion diagnosis and prognosis" (Journal of Power Sources, Volume 479, 15 December 2020, 228806) and "Analysis of Synthetic Voltage vs. Capacity Datasets for Big Data Diagnosis and Prognosis" (Energies, under review) for more details.
The V vs. Q dataset was compiled with a resolution of 0.01 for the triplets and C/25 charges. This accounts for more than 5,000 different paths. Each path was simulated with increases of at most 0.85% per step for each triplet parameter. The training dataset therefore contains more than 700,000 unique voltage vs. capacity curves.
Four variables are included (see the read-me file for details and an example of how to use them; a minimal loading sketch also follows below):
Cell info: information on the setup of the mechanistic model.
Qnorm: normalized capacity scale for all voltage curves.
pathinfo: index of the simulated conditions for all voltage curves.
volt: voltage data. Each column corresponds to the voltage simulated under the conditions of the corresponding row in pathinfo.
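A minimal loading sketch follows. This is a sketch under stated assumptions: the file names are hypothetical and assume the variables have been exported as CSV tables; the read-me documents the actual formats.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical file names; the read-me documents the actual formats.
# Column i of `volt` pairs with row i of `pathinfo`, and all curves
# share the normalized capacity axis in `Qnorm`.
qnorm = pd.read_csv("Qnorm.csv", header=None).squeeze()
pathinfo = pd.read_csv("pathinfo.csv")
volt = pd.read_csv("volt.csv", header=None)

i = 0                                   # first simulated degradation path
plt.plot(qnorm, volt.iloc[:, i])
plt.xlabel("Normalized capacity")
plt.ylabel("Voltage (V)")
plt.title(f"Simulated path {i}")
plt.show()
```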
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Background: For some time, the VIVO site for Weill Cornell Medical College (WCMC) had struggled with both unacceptable page load times and unreliable uptime. With some individual profiles containing upwards of 800 publications, WCMC VIVO has relatively large profiles, but no profile was so large that it could account for this performance. The WCMC VIVO Implementation Team explored a number of options for improving performance, including caching, better hardware, query optimization, limiting user access to large pages, using another instance of Tomcat, throttling bots, and blocking IPs issuing too many requests. None of these avenues were fruitful.
Analysis of triple stores: With the 1.7 version, VIVO ships with the Jena SDB triple store, but the SDB version of Jena is no longer supported by its developers. In April, we reviewed various published analyses and benchmarks suggesting there were alternatives to Jena, such as Virtuoso, that perform better than even Jena's successor, TDB. In particular, the Berlin SPARQL Benchmark v. 3.1 [1] showed that Virtuoso had the strongest performance compared to the other data stores measured, including BigData, BigOwlim, and Jena TDB. In addition, Virtuoso is used on dbpedia.org, which serves up 3 billion triples compared to only 12 million in WCMC's VIVO site. Whereas Jena SDB stores its triples in a MySQL database, Virtuoso manages its triples in a binary file. The software is available in open source and commercial editions.
Configuration: In late 2014, we installed Virtuoso on a local machine and loaded data from our production VIVO. Some queries completed in about 10% of the time taken by our production VIVO. However, we noticed that the listview queries invoked whenever profile pages were loaded were still slow. After soliciting feedback from members of both the Virtuoso and VIVO communities, we modified these queries to rely on the OPTIONAL instead of the UNION construct. This modification, which wasn't possible in a Jena SDB environment, reduced eight-fold the number of queries that the application makes of the triple store. About four or five additional steps were required for VIVO and Virtuoso to work optimally with one another; these are documented in the VIVO Duraspace wiki.
Results: On March 31, WCMC launched Virtuoso in its production environment. According to our instance of New Relic, VIVO has an average page load of about four seconds and 99% uptime, both of which are dramatic improvements. There are opportunities for further tuning: the four-second average includes pages such as the visualizations as well as pages served to logged-in users, which are slower than other types of pages.
[1] http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/#comparison
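As a rough illustration of the kind of rewrite described above, the sketch below issues an OPTIONAL-based query to a Virtuoso SPARQL endpoint using the Python SPARQLWrapper library. The endpoint URL uses Virtuoso's default port, and the query is a toy example, not VIVO's actual listview query.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Toy OPTIONAL-based query (illustrative; VIVO's listview queries are
# more complex). Replacing a UNION of alternative patterns with
# OPTIONAL blocks lets the store answer in a single pass.
query = """
SELECT ?person ?label ?email WHERE {
  ?person a <http://xmlns.com/foaf/0.1/Person> .
  OPTIONAL { ?person <http://www.w3.org/2000/01/rdf-schema#label> ?label }
  OPTIONAL { ?person <http://xmlns.com/foaf/0.1/mbox> ?email }
} LIMIT 10
"""

endpoint = SPARQLWrapper("http://localhost:8890/sparql")  # Virtuoso default port
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row.get("label", {}).get("value", "(no label)"))
```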
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Dataset created to monitor Brazilian vegetation by combining four different systems: (i) an inventory of Brazilian seed plants created to map the country's biodiversity; (ii) the Fraction of Absorbed Photosynthetically Active Radiation; (iii) the NASA POWER database, for meteorological data; and (iv) the DATASUS system, which makes available geographical information from Brazil. The final dataset comprises a large number of attributes, including meteorological and vegetational features (8 and 8,471 attributes, respectively). Moreover, the dataset contains 20 labels and 865 geographical positions (latitude and longitude) used during the vegetation monitoring. This project makes available raw and preprocessed data, as well as Machine Learning models (including source code; see the sketch after the dataset links below) adjusted to: i) predict the occurrence of specific types of vegetation in different regions without requiring a constant monitoring task; ii) monitor whether or not the prediction accuracy changes after collecting new data, which provides an important tool to detect how the environment is evolving over time; and iii) serve as an extra data source to better understand and simulate meteorological influences on predicted vegetation types.
More information about the original datasets:
(i) http://www.scielo.br/scielo.php?script=sci_arttext&pid=S2175-78602015000401085 (ii) https://fapar.jrc.ec.europa.eu/Home.php (iii) https://power.larc.nasa.gov/ (iv) http://datasus.saude.gov.br/
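A minimal sketch of use case (i), predicting a vegetation label from the eight meteorological attributes. The file and column names are assumptions; the published models and source code distributed with the dataset are the authoritative reference.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; the dataset's own source code
# and read-me describe the actual layout.
df = pd.read_csv("vegetation_monitoring.csv")
X = df.iloc[:, :8]                     # the 8 meteorological attributes
y = df["vegetation_label"]             # one of the 20 labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)
print(f"held-out accuracy: {model.score(X_te, y_te):.2f}")
```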
The dataset comprises text responses from GPT-4 after reading passages of text. It is scored and organized in folders.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Overview
A set of Monte Carlo simulated events, for the evaluation of top quarks' (and their child particles') momentum reconstruction, produced using the HEPData4ML package [1]. Specifically, the entries in this dataset correspond with top quark jets, and the momentum of the jets' constituent particles. This is a newer version of the "Top Quark Momentum Reconstruction Dataset" [2], but with sufficiently large changes to warrant this separate posting.
The dataset is saved in HDF5 format, as sets of arrays with keys (as detailed below). There are ~1.5M events, approximately broken down into the following sets:
Training: 700k events (files with "_train" suffix)
Validation: 200k events (files with "_valid" suffix)
Testing (small): 100k events (files with "_test" suffix)
Testing (large): 500k events (files with "_test_large" suffix)
The two separate types of testing files -- small and large -- are independent of one another; the former is convenient for quicker tests, while the latter provides a larger sample for higher-statistics testing.
There are four versions of the dataset present, indicated by the filenames. The different versions correspond with whether or not fast detector simulation was performed (versus truth-level jets), and whether or not the W-boson mass was modified: one version of the dataset uses the nominal value of (m_W = 80.385 \text{ GeV}) as used by Pythia8 [3], whereas another uses a variable mW taking on 101 evenly spaced values with (m_W \in [64.308, 96.462] \text{ GeV}). The dataset naming scheme is as follows:
train.h5 : jets clustered from truth-level, nominal mW
train_mW.h5: jets clustered from truth-level, variable mW
train_delphes.h5: jets clustered from Delphes outputs, nominal mW
train_delphes_mW.h5: jets clustered from Delphes outputs, variable mW
Description
13 TeV center-of-mass energy, fully hadronic top quark decays, simulated with Pythia8. ((t \rightarrow W \, b, \; W\rightarrow q \, q'))
Events are generated with leading top quark pT in [550,650] GeV (set via Pythia8's (\hat{p}_{T,\text{min}}) and (\hat{p}_{T,\text{max}}) variables).
No initial- or final-state radiation (ISR/FSR), nor multi-parton interactions (MPI)
Where applicable, detector simulation is done using DELPHES [4], with the ATLAS detector card.
Clustering of particles/objects is done via FastJet [5], using the anti-kT algorithm, with (R=0.8) .
For the truth-level data, inputs to jet clustering are truth-level, final-state particles (i.e. clustering "truth jets").
For the data with detector simulation, the inputs are calorimeter towers from DELPHES.
These are Tower objects from DELPHES (not E-flow objects; no tracking information).
Each entry in the dataset corresponds with a single top quark jet, extracted from a (t\bar{t}) event.
All jets are matched to a parton-level top quark within (\Delta R < 0.8) . We choose the jet nearest the parton-level top quark.
Jets are required to have (|\eta| < 2), and (p_{T} > 15 \text{ GeV}).
The 200 leading (highest-pT) jet constituent four-momenta are stored in Cartesian coordinates (E,px,py,pz), sorted by decreasing pT, with zero-padding.
The jet four-momentum is stored in Cartesian coordinates (E, px, py, pz), as well as in cylindrical coordinates ((p_T,\eta,\phi,m)).
The truth (parton-level) four-momenta of the top quark, the bottom quark, the W-boson, and the quarks to which the W-boson decays are stored in Cartesian coordinates.
In addition, the momenta of the 120 leading stable daughter particles of the W-boson are stored in Cartesian coordinates.
Description of data fields & metadata
Below is a brief description of the various fields in the dataset. The dataset also contains metadata fields, stored using HDF5's "attributes". These are used for fields that are common across many events, and store information such as generator-level configurations (in principle, enough information is stored to be able to recreate the dataset with the HEPData4ML tool).
Note that fields whose keys have the prefix "jh_" correspond with output from the Johns Hopkins top tagger [6], as implemented in FastJet.
Also note that for the keys corresponding with four-momenta in Cartesian coordinates, there are rotated versions of these fields -- the data has been rotated so that the W-boson is at ((\theta=0, \phi=0)), and the b-quark is in the ((\theta=0, \phi < 0)) plane. This rotation is potentially useful for visualizations of the events.
Nobj: The number of constituents in the jet.
Pmu: The four-momenta of the jet constituents, in (E, px, py, pz). Sorted by decreasing pT and zero-padded to a length of 200.
Pmu_rot: Rotated version.
contained_daughter_sum_Pmu: Four-momentum sum of the stable daughter particles of the W-boson that fall within (\Delta R < 0.8) of the jet centroid.
contained_daughter_sum_Pmu_rot: Rotated version.
cross_section: Cross-section for the corresponding process, reported by Pythia8.
cross_section_uncertainty: Cross-section uncertainty for the corresponding process, reported by Pythia8.
energy_ratio_smeared: Ratio of the true energy of W-boson daughter particles contributing to this calorimeter tower to the total smeared energy in this calorimeter tower. Only relevant for the DELPHES datasets.
energy_ratio_truth: Ratio of the true energy of W-boson daughter particles contributing to this calorimeter tower to the total true energy of particles contributing to this calorimeter tower. This definition is relevant only for the DELPHES datasets. For the truth-level datasets, this field is repurposed to store a value (0 or 1) indicating whether or not the given particle (whose momentum is in the Pmu field) is a W-boson daughter.
event_idx: Redundant -- used for event indexing during the event generation process.
is_signal: Redundant -- indicates whether an event is signal or background, but this is a fully signal dataset. Potentially useful if combining with other datasets produced with HEPData4ML.
jet_Pmu: Four-momentum of the jet, in (E, px, py, pz).
jet_Pmu_rot: Rotated version.
jet_Pmu_cyl: Four-momentum of the jet, in ((p_T,\eta,\phi,m)).
jet_bqq_contained_dR06: Boolean flag indicating whether or not the truth-level b and the two quarks from W decay are contained within (\Delta R < 0.6) of the jet centroid.
jet_bqq_contained_dR08: Boolean flag indicating whether or not the truth-level b and the two quarks from W decay are contained within (\Delta R < 0.8) of the jet centroid.
jet_bqq_dr_max: Maximum of (\big\lbrace \Delta R \left( \text{jet},b \right), \; \Delta R \left( \text{jet},q \right), \; \Delta R \left( \text{jet},q' \right) \big\rbrace).
jet_qq_contained_dR06: Boolean flag indicating whether or not the two quarks from W decay are contained within (\Delta R < 0.6) of the jet centroid.
jet_qq_contained_dR08: Boolean flag indicating whether or not the two quarks from W decay are contained within (\Delta R < 0.8) of the jet centroid.
jet_qq_dr_max: Maximum of (\big\lbrace \Delta R \left( \text{jet},q \right), \; \Delta R \left( \text{jet},q' \right) \big\rbrace).
jet_top_daughters_contained_dR08: Boolean flag indicating whether the final-state daughters of the top quark are within (\Delta R < 0.8) of the jet centroid. Specifically, the algorithm for this flag checks that the jet contains the stable daughters of both the b quark and the W boson. For the b and W each, daughter particles are allowed to be uncontained as long as (for each particle) the (p_T) of the sum of uncontained daughters is below (2.5 \text{ GeV}).
jh_W_Nobj: Number of constituents in the W-boson candidate identified by the JH tagger.
jh_W_Pmu: Four-momentum of the JH tagger W-boson candidate, in (E, px, py, pz).
jh_W_Pmu_rot: Rotated version.
jh_W_constituent_Pmu: Four-momentum of the constituents of the JH tagger W-boson candidate, in (E, px, py, pz).
jh_W_constituent_Pmu_rot: Rotated version.
jh_m: Mass of the JH W-boson candidate.
jh_m_resolution: Ratio of the JH W-boson candidate mass to the true W-boson mass.
jh_pt: (p_T) of the JH W-boson candidate.
jh_pt_resolution: Ratio of the JH W-boson candidate (p_T) to the true W-boson (p_T).
jh_tag: Whether or not a jet was tagged by the JH tagger.
mc_weight: Monte Carlo weight for this event, reported by Pythia8.
process_code: Process code reported by Pythia8.
rotation_matrix: Rotation matrix for rotating the events' 3-momenta as to produce the rotated copies stored in the dataset.
truth_Nobj: Number of truth-level particles (saved in truth_Pmu).
truth_Pdg: PDG codes of the truth-level particles.
truth_Pmu: Truth-level particles: The top quark, bottom quark, W boson, q, q', and 120 leading, stable W-boson daughter particles, in (E, px, py, pz). A few of these are also stored in separate keys:
truth_Pmu_0: Top quark.
truth_Pmu_0_rot: Rotated version.
truth_Pmu_1: Bottom quark.
truth_Pmu_1_rot: Rotated version.
truth_Pmu_2: W-boson.
truth_Pmu_2_rot: Rotated version.
truth_Pmu_3: q from W decay.
truth_Pmu_3_rot: Rotated version.
truth_Pmu_4: q' from W decay.
truth_Pmu_4_rot: Rotated version.
The following fields correspond with metadata -- they provide the index of the corresponding metadata entry for each event:
command_line_arguments: The command-line arguments passed to HEPData4ML's run.py script.
config_file: The contents of the Python configuration file used for HEPData4ML. This, together with the command-line arguments, defines how the tool was run, what processes, jet clustering and post-processing was done, etc.
git_hash: Git hash for HEPData4ML.
timestamp: Timestamp for when the dataset was created.
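A minimal reading sketch with h5py. The key names follow the field list above; the exact layout of the per-event metadata indices and attribute arrays is an assumption based on the description, and should be checked against the files.

```python
import h5py

# Minimal sketch: read one event from the training file.
# Key names follow the field list above; the metadata-attribute
# layout is an assumption based on the description, not verified.
with h5py.File("train.h5", "r") as f:
    nobj = int(f["Nobj"][0])          # constituent multiplicity of event 0
    pmu = f["Pmu"][0, :nobj]          # (E, px, py, pz), zero-padding dropped
    jet = f["jet_Pmu_cyl"][0]         # jet (pT, eta, phi, m)
    print(f"{nobj} constituents, jet pT = {jet[0]:.1f} GeV")
    # Per-event index into a metadata attribute array (assumed layout):
    cfg_idx = int(f["config_file"][0])
    print(str(f.attrs["config_file"][cfg_idx])[:80])
```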
[NOTE - 2022-09-07: this dataset is superseded by an updated version https://doi.org/10.15482/USDA.ADC/1526433 ] This dataset consists of weather data for each year when maize was grown for grain at the USDA-ARS Conservation and Production Research Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL). Maize was grown for grain on four large, precision weighing lysimeters, each in the center of a 4.44 ha square field. The four square fields are themselves arranged in a larger square with the fields in four adjacent quadrants of the larger square. Fields and lysimeters within each field are thus designated northeast (NE), southeast (SE), northwest (NW), and southwest (SW). Irrigation was by linear move sprinkler system in 1989, 1990, and 1994. In 2013, 2016, and 2018, two lysimeters and their respective fields (NE and SE) were irrigated using subsurface drip irrigation (SDI), and two lysimeters and their respective fields (NW and SW) were irrigated by a linear move sprinkler system. Irrigations were managed to replenish soil water used by the crop on a weekly or more frequent basis as determined by soil profile water content readings made with a neutron probe to 2.4-m depth in the field. The weather data include solar irradiance, barometric pressure, air temperature and relative humidity, and wind speed determined using sensors placed at 2-m height over a level, grass surface mowed to not exceed 12 cm height and irrigated and fertilized to maintain reference conditions as promulgated by ASCE (2005) and FAO (1996). Irrigation was by surface flood in 1989 through 1994, and by subsurface drip irrigation after 1994. Sensors were replicated and intercompared between replicates and with data from nearby weather stations, which were sometimes used for gap filling. Quality control and assurance methods are described by Evett et al. (2018). These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. Prior publications have focused on maize ET, crop coefficients, and crop water productivity. Crop coefficients have been used by ET networks. The data have utility for testing simulation models of crop ET, growth, and yield, and have been used by the Agricultural Model Intercomparison and Improvement Project (AgMIP), by OPENET, and by many others for testing and calibrating models of ET that use satellite and/or weather data.
Resources in this dataset:
Resource Title: 1989 Bushland, TX, standard 15-minute weather data. File Name: 1989_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: The weather data are presented as 15-minute mean values of solar irradiance, air temperature, relative humidity, wind speed, and barometric pressure, and as 15-minute totals of precipitation (rain and snow). Daily total precipitation as determined by mass balance at each of the four large, precision weighing lysimeters is given in a separate tab along with the mean daily value of precipitation. Data dictionaries are in separate tabs with names corresponding to those of tabs containing data. A separate tab contains a visualization tool for missing data. Another tab contains a visualization tool for the weather data in five-day increments of the 15-minute data. An Introduction tab explains the other tabs, lists the authors, explains data time conventions, explains symbols, lists the sensors and datalogging systems used, and gives geographic coordinates of sensing locations.
Resource Title: 1990 Bushland, TX, standard 15-minute weather data. File Name: 1990_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 1990.
Resource Title: 1994 Bushland, TX, standard 15-minute weather data. File Name: 1994_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 1994.
Resource Title: 2013 Bushland, TX, standard 15-minute weather data. File Name: 2013_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 2013.
Resource Title: 2016 Bushland, TX, standard 15-minute weather data. File Name: 2016_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 2016.
Resource Title: 2018 Bushland, TX, standard 15-minute weather data. File Name: 2018_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 2018.
Resource Title: 1996 Bushland, TX, standard 15-minute weather data. File Name: 1996_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 1996.
Resource Title: 1997 Bushland, TX, standard 15-minute weather data. File Name: 1997_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 1997.
Resource Title: 1998 Bushland, TX, standard 15-minute weather data. File Name: 1998_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 1998.
Resource Title: 1999 Bushland, TX, standard 15-minute weather data. File Name: 1999_15-min_weather_SWMRU_CPRL.xlsx. Resource Description: As above, for 1999.
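The workbooks are self-describing (each has an Introduction tab and per-tab data dictionaries), so a reasonable first step is to list the tabs before reading data. A minimal pandas sketch, with the chosen sheet index an assumption:

```python
import pandas as pd

# Minimal sketch (requires openpyxl). The tab layout varies by workbook,
# so sheet names are discovered here rather than hard-coded.
xlsx = pd.ExcelFile("1989_15-min_weather_SWMRU_CPRL.xlsx")
print(xlsx.sheet_names)               # Introduction, data tabs, dictionaries...
wx = xlsx.parse(xlsx.sheet_names[1])  # read one data tab into a DataFrame
print(wx.head())                      # 15-min irradiance, T, RH, wind, pressure
```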
The NOAA National Water Model Retrospective dataset contains input and output from multi-decade CONUS retrospective simulations. These simulations used meteorological input fields from meteorological retrospective datasets. The output frequency and fields available in this historical NWM dataset differ from those contained in the real-time operational NWM forecast model. Additionally, note that no streamflow or other data assimilation is performed within any of the NWM retrospective simulations.
One application of this dataset is to provide historical context to current near real-time streamflow, soil moisture and snowpack conditions. The retrospective data can be used to infer flow frequencies and perform temporal analyses with hourly streamflow output and 3-hourly land surface output. This dataset can also be used in the development of end user applications which require a long baseline of data for system training or verification purposes.
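As one illustration of the percentile-type analysis described above, the sketch below computes a historical flow percentile with xarray. The zarr path, variable name, and feature id are placeholders, not the official layout of the NWM retrospective archive.

```python
import xarray as xr

# Placeholders throughout: path, variable name, and feature id are
# assumptions, not the official NWM retrospective layout.
ds = xr.open_zarr("nwm_retrospective_chrtout.zarr")  # hypothetical local copy
flow = ds["streamflow"].sel(feature_id=101)          # one river reach (assumed)
p90 = flow.quantile(0.9).item()                      # 90th-percentile hourly flow
print(f"90th percentile hourly flow: {p90:.1f} m^3/s")
```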
Details for Each Version of the NWM Retrospective Output
CONUS Domain: CONUS retrospective output is provided by all four versions of the NWM.
Competitive intelligence monitoring goes beyond your sales team. Our CI solutions also bring powerful insights to your production, logistics, operation & marketing departments.
Why should you use our competitive intelligence data?
1. Increase visibility: Our geolocation approach allows us to "get inside" any facility in the US, providing visibility in places where other solutions do not reach.
2. In-depth 360º analysis: Perform a unique and in-depth analysis of competitors, suppliers and customers.
3. Powerful insights: We use alternative data and big data methodologies to peel back the layers of any private or public company.
4. Uncover your blind spots against leading competitors: Understand the complete business environment of your competitors, from third-tier suppliers to main investors.
5. Identify business opportunities: Analyze your competitors' strategic shifts and identify unnoticed business opportunities and possible threats or disruptions.
6. Keep track of your competitors' influence in any specific area: Maintain constant monitoring of your competitors' actions and their impact on specific market areas.
How are other companies using our CI solution?
1. Enriched data intelligence: Our market intelligence data brings you key insights from different angles.
2. Due diligence: Our data provide the panorama required to evaluate a company's cross-company relations and decide whether or not to proceed with an acquisition.
3. Risk assessment: Our CI approach allows you to anticipate potential disruptions by understanding behavior across all supply chain tiers.
4. Supply chain analysis: Our advanced geolocation approach allows you to visualize and map an entire supply chain network.
5. Insights discovery: Our relationship-identification algorithms generate data matrix networks that uncover new and unnoticed insights within a specific market, consumer segment, competitors' influence, logistics shifts, and more.
From "digital" to the real field: Most competitive intelligence companies focus their solutions analysis on social shares, review sites, and sales calls. Our competitive intelligence strategy consists on tracking the real behavior of your market on the field, so that you can answer questions like: -What uncovered need does my market have? -How much of a threat is my competition? -How is the market responding to my competitor´s offer? -How my competitors are changing? -Am I losing or winning market?
https://github.com/MIT-LCP/license-and-dua/tree/master/drafts
Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
Working with ecological data often involves ethical considerations, particularly when data are applied to address societal needs. However, data science ethics are rarely included as part of undergraduate and graduate training programs. Here, we present four modules for teaching ethics in data science, with real-world case studies related to ecological forecasting. See the module topics in the description below.
The material was originally published in Teaching Issues and Experiments in Ecology (TIEE), Vol 19, Practice #13. Dec 2023. https://tiee.esa.org/vol/v19/issues/case_studies/lewis/abstract.html
In addition to having this material in TIEE, we are sharing these resources in QUBES to expand the scope of potential users. TIEE approved us sharing the material in both places. Note that TIEE does not provide DOIs for the published materials, but QUBES does.
We include both pdfs and word documents for the essay assignments, class handouts, and pre-reading materials. The Full Article Text provides the student-active approaches and cognitive skills, the case studies, the background materials, and instructions for the instructors and the students for each of the four modules.
https://www.wiseguyreports.com/pages/privacy-policy
BASE YEAR | 2024 |
HISTORICAL DATA | 2019 - 2024 |
REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
MARKET SIZE 2023 | 2.77 (USD Billion) |
MARKET SIZE 2024 | 2.9 (USD Billion) |
MARKET SIZE 2032 | 4.2 (USD Billion) |
SEGMENTS COVERED | Frequency Range, Number of Channels, Sample Rate, Application, Form Factor, Regional |
COUNTRIES COVERED | North America, Europe, APAC, South America, MEA |
KEY MARKET DYNAMICS | Increasing demand for high-speed data analysis; growing adoption of artificial intelligence (AI) and machine learning (ML); technological advancements in data mining algorithms; rise of big data and the Internet of Things (IoT); growing need for real-time data insights |
MARKET FORECAST UNITS | USD Billion |
KEY COMPANIES PROFILED | Rigol Technologies, Teledyne LeCroy, Keysight Technologies, Rohde & Schwarz, Yokogawa Electric Corporation, Siglent Technologies, Agilent Technologies, Anritsu, Tektronix, Hioki, Megger, Seaward, Fluke Corporation, Extech Instruments, Chauvin Arnoux |
MARKET FORECAST PERIOD | 2024 - 2032 |
KEY MARKET OPPORTUNITIES | Real-time data analysis; predictive analytics; automated decision-making; fraud detection; cybersecurity |
COMPOUND ANNUAL GROWTH RATE (CAGR) | 4.74% (2024 - 2032) |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
[Abstract] The author has been in a suboptimal state of health for an extensive period of time and has experienced difficulties in work and life. The goal was to identify a simple method for implementing self-treatment. The author used a method referred to as ‘inward vision’ to observe, perceive, feel, and record her own physical condition. A set of simple and convenient self-diagnostic methods to determine the location and characteristics of a disease was summarised through consideration of these ‘records of perceived observation’. A series of oral capsules constituting traditional Chinese remedies was developed based on these records and findings from the extant literature. The set of Chinese remedies is extraordinary and unusual, comprising dozens of species of traditional Chinese herbal remedies. There are a considerable number of different combinations of these herbs used for the treatment of numerous diseases and symptoms. It is safe and easy to consume. Over the years, the author has self-treated various symptoms and improved her health. If a large proportion of people were able to self-treat their illnesses, the world would be improved; it would be a step forward in the ability of humans to fight diseases. The diagnostic and therapeutic method described herein is simple, convenient, efficient, and cost-effective. It could be efficacious in solving socio-economic problems such as lack of access to effective medical treatment. It can be used to conduct research on several aspects, including food, plants, animals, and medical equipment. Furthermore, ‘inward vision’ as a method of thinking may be helpful to researchers in other disciplines.
The author Xiuli Yang is the founder of Yang’s six-position treatment method and the developer of the Zhenshanliyang series Chinese remedies.
This article is a supplement to another article.
The Chinese version of this article was published in the magazine titled, The China Health Care and Nutrition in January 2013 (pp. 457-458). This magazine is published between the 21st and 30th of each month. The four pictures here are only related to the Chinese version.
Links to the Chinese version of this article are below.
The California Department of Forestry and Fire Protection's Fire and Resource Assessment Program (FRAP) annually maintains and distributes an historical wildland fire perimeter dataset from across public and private lands in California. The GIS data is developed with the cooperation of the United States Forest Service Region 5, the Bureau of Land Management, California State Parks, the National Park Service and the United States Fish and Wildlife Service, and is released in the spring with added data from the previous calendar year. Although the dataset represents the most complete digital record of fire perimeters in California, it is still incomplete, and users should be cautious when drawing conclusions based on the data. This data should be used carefully for statistical analysis and reporting due to missing perimeters (see Use Limitation in metadata). Some fires are missing because historical records were lost or damaged, were too small for the minimum cutoffs, had inadequate documentation or have not yet been incorporated into the database. Other errors include duplicate fires and over-generalization; in particular, over-generalization of large old fires may show unburned "islands" within the final perimeter as burned. Careful use of the fire perimeter database will prevent users from drawing inaccurate or erroneous conclusions from the data. This data is updated annually in the spring with fire perimeters from the previous fire season. This dataset may differ in California compared to that available from the National Interagency Fire Center (NIFC) due to different requirements between the two datasets. The data covers fires back to 1878. As of May 2024, it represents fire23_1. Please help improve this dataset by filling out this survey with feedback: Historic Fire Perimeter Dataset Feedback (arcgis.com).
Current criteria for data collection are as follows:
- CAL FIRE (including contract counties) submit perimeters ≥10 acres in timber, ≥50 acres in brush, or ≥300 acres in grass, and/or ≥3 impacted residential or commercial structures, and/or causing ≥1 fatality.
- All cooperating agencies submit perimeters ≥10 acres.
Version update: Firep23_1 was released in May 2024. Two hundred eighty-four fires from the 2023 fire season were added to the database (21 from BLM, 102 from CAL FIRE, 72 from Contract Counties, 19 from LRA, 9 from NPS, 57 from USFS and 4 from USFW). The 2020 Cottonwood fire, the 2021 Lone Rock and Union fires, and the 2022 Lost Lake fire were added. USFW submitted a higher-accuracy perimeter to replace the 2022 River perimeter. Additionally, 48 perimeters were digitized from an historical map included in a publication from Weeks, D. et al., The Utilization of El Dorado County Land, May 1934, Bulletin 572, University of California, Berkeley. Two thousand eighteen perimeters had attributes updated, the bulk of which had IRWIN IDs added. A duplicate 2020 Erbes perimeter was removed. The following fire was identified as meeting the collection criteria but is not included in this version and will hopefully be added in the next update: Big Hill #2 (2023-CAHIA-001020). The YEAR_ field changed to a short integer type. The San Diego CAL FIRE UNIT_ID changed to SDU (the former code MVU is maintained in the UNIT_ID domains). COMPLEX_INCNUM was renamed to COMPLEX_ID and is in the process of transitioning from local incident number to the complex IRWIN ID. Perimeters managed in a complex in 2023 are added with the complex IRWIN ID; those previously added will transition to complex IRWIN IDs in a future update.
Includes separate layers filtered by criteria as follows:
- California Fire Perimeters (All): Unfiltered. The entire collection of wildfire perimeters in the database. It is scale dependent and starts displaying at the country level scale.
- Recent Large Fire Perimeters (≥5000 acres): Filtered for wildfires greater than or equal to 5,000 acres for the last 5 years of fires (2019-2023), symbolized with color by year. Scale dependent; starts displaying at the country level scale, with year-only labels for recent large fires.
- California Fire Perimeters (1950+): Filtered for wildfires that started in 1950-present. Symbolized by decade; displays starting at the country level scale.
Detailed metadata is included in the following document: Wildland Fire Perimeters (Firep23_1) Metadata.
For any questions, please contact the data steward: Kim Wallin, GIS Specialist, CAL FIRE, Fire & Resource Assessment Program (FRAP), kimberly.wallin@fire.ca.gov
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Objective: To provide practical guidance for the analysis of N-of-1 trials by comparing four commonly used models.
Methods: The four models (paired t-test, mixed effects model of difference, mixed effects model, and meta-analysis of summary data) were compared using a simulation study. The assumed 3-cycle and 4-cycle N-of-1 trials were set with sample sizes of 1, 3, 5, 10, 20 and 30, respectively, under a normality assumption. The data were generated based on a variance-covariance matrix under the assumption of (i) a compound symmetry or first-order autoregressive structure, and (ii) no carryover effect or a 20% carryover effect. Type I error, power, bias (mean error), and mean square error (MSE) of the effect difference between two groups were used to evaluate the performance of the four models.
Results: The results from the 3-cycle and 4-cycle N-of-1 trials were comparable with respect to type I error, power, bias and MSE. The paired t-test yielded type I error near the nominal level, higher power, comparable bias and small MSE, whether or not there was a carryover effect. Compared with the paired t-test, the mixed effects model produced a similar type I error, smaller bias, but lower power and larger MSE. The mixed effects model of difference and meta-analysis of summary data yielded type I error far from the nominal level, low power, and large bias and MSE, irrespective of the presence or absence of a carryover effect.
Conclusion: We recommend the paired t-test for normally distributed data from N-of-1 trials because of its optimal statistical performance. In the presence of carryover effects, the mixed effects model could be used as an alternative.
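A minimal sketch of the recommended analysis: a paired t-test across the cycles of a single N-of-1 trial. The data below are simulated purely for illustration and do not reproduce the paper's full simulation design.

```python
import numpy as np
from scipy import stats

# Illustrative simulation only (not the paper's full design): one
# 3-cycle N-of-1 trial with normally distributed outcomes, analysed
# with the recommended paired t-test across cycles.
rng = np.random.default_rng(1)
treatment = rng.normal(loc=1.0, scale=1.0, size=3)  # outcome per cycle
control = rng.normal(loc=0.0, scale=1.0, size=3)
t, p = stats.ttest_rel(treatment, control)          # pairs cycles
print(f"t = {t:.2f}, p = {p:.3f}")
```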
THE CLEANED AND HARMONIZED VERSION OF THE SURVEY DATA PRODUCED AND PUBLISHED BY THE ECONOMIC RESEARCH FORUM REPRESENTS 100% OF THE ORIGINAL SURVEY DATA COLLECTED BY THE DEPARTMENT OF STATISTICS OF THE HASHEMITE KINGDOM OF JORDAN.
The Department of Statistics (DOS) carried out four rounds of the 2007 Employment and Unemployment Survey (EUS) during February, May, August and November 2007. The survey rounds covered a total sample of about fifty-three thousand households nationwide. The sampled households were selected using a stratified multi-stage cluster sampling design. It is noteworthy that the sample represents the national level (Kingdom), governorates, the three regions (Central, North and South), and the urban/rural areas.
The importance of this survey lies in providing a comprehensive database on employment and unemployment that serves decision makers, researchers, and other parties concerned with policies related to the organization of the Jordanian labor market.
The raw survey data provided by the statistical agency were cleaned and harmonized by the Economic Research Forum in the context of a major project that started in 2009, during which extensive efforts were exerted to acquire, clean, harmonize, preserve and disseminate micro data of existing labor force surveys in several Arab countries.
Covering a sample representative on the national level (Kingdom), governorates, the three Regions (Central, North and South), and the urban/rural areas.
1- Household/family. 2- Individual/person.
The survey covered a national sample of households and all individuals permanently residing in surveyed households.
Sample survey data [ssd]
The sample of this survey is based on the frame provided by the data of the Population and Housing Census, 2004. The Kingdom was divided into strata, where each city with a population of 100,000 persons or more was considered a large city; the total number of these cities is 6. Each governorate (except for the 6 large cities) was divided into rural and urban areas, and the remaining urban areas in each governorate were treated as an independent stratum, as were the rural areas. The total number of strata was 30.
In view of the significant variation in socio-economic characteristics, in large cities in particular and in urban areas in general, each large-city and urban stratum was divided into four sub-strata according to the socio-economic characteristics provided by the population and housing census, with the purpose of obtaining homogeneous strata.
The frame excludes collective dwellings. However, it is worth noting that the collective households identified in the harmonized data, through a variable indicating the household type, are those reported without heads in the raw data, and in which the relationship of all household members to the head was reported as "other".
This sample is also not representative of the non-Jordanian population.
The sample of this survey was designed using the two-stage cluster stratified sampling method, based on the data of the Population and Housing Census 2004 for carrying out household surveys. The sample is representative at the Kingdom, rural-urban and governorate levels. The total sample size for each round was 1,336 Primary Sampling Units (PSUs) (clusters). These units were distributed to urban and rural regions in the governorates, in addition to the large cities in each governorate, according to the weight of persons and households and according to the variance within each stratum. Slight modifications to the number of these units were made so that it remained a multiple of 8; the total number of clusters for the four rounds was 5,344.
The main sample consists of 40 replicates, each consisting of 167 PSUs. For each round, eight replicates of the main sample were used. The PSUs were ordered within each stratum according to geographic characteristics and then according to socio-economic characteristics in order to ensure a good spread of the sample. The sample was then selected in two stages. In the first stage, the PSUs were selected using probability proportional to size (PPS) with a systematic selection procedure, with the number of households in each PSU serving as its weight or size. In the second stage, the blocks of the PSUs (clusters) selected in the first stage were updated, and a constant number of households (10) was then selected from each PSU using random systematic sampling, as illustrated in the sketch below.
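The first-stage selection rule (PPS with a systematic pass over the cumulative size scale) can be sketched as follows. This is an illustrative implementation under the stated assumptions, not the DOS production procedure; the toy frame sizes are invented.

```python
import numpy as np

def pps_systematic(sizes, n, rng):
    """Probability-proportional-to-size (PPS) systematic selection:
    one random start, then equally spaced points on the cumulative
    size scale; returns indices of the selected units."""
    cum = np.cumsum(np.asarray(sizes, dtype=float))
    step = cum[-1] / n                            # sampling interval
    points = rng.uniform(0, step) + step * np.arange(n)
    return np.searchsorted(cum, points)

# Toy frame: 50 PSUs with household counts as the size measure.
rng = np.random.default_rng(0)
households = rng.integers(50, 500, size=50)
print(pps_systematic(households, n=8, rng=rng))   # select 8 of the 50 toy PSUs
```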
It is noteworthy that the sample of the present survey does not represent the non-Jordanian population, due to the fact that it is based on households living in conventional dwellings. In other words, it does not cover the collective households living in collective dwellings. Therefore, the non-Jordanian households covered in the present survey are either private households or collective households living in conventional dwellings.
Face-to-face [f2f]
The tabulation plan for the survey results was guided by former Employment and Unemployment Surveys, which had been previously prepared and tested. The final survey report was then prepared to include all detailed tabulations as well as the methodology of the survey.
https://spdx.org/licenses/CC0-1.0.html
We used this dataset to assess the strength of isolation due to geographic and macroclimatic distance across island and mainland systems, comparing published measurements of phenotypic traits and neutral genetic diversity for populations of plants and animals worldwide. The dataset includes 112 studies of 108 species (72 animals and 36 plants) in 868 island populations and 760 mainland populations, with population-level taxonomic and biogeographic information, totalling 7438 records.
Methods
Description of methods used for collection/generation of data: We searched the ISI Web of Science in March 2017 for comparative studies that included data on phenotypic traits and/or neutral genetic diversity of populations on true islands and on mainland sites in any taxonomic group. Search terms were 'island' and ('mainland' or 'continental') and 'population*' and ('demograph*' or 'fitness' or 'survival' or 'growth' or 'reproduc*' or 'density' or 'abundance' or 'size' or 'genetic diversity' or 'genetic structure' or 'population genetics') and ('plant*' or 'tree*' or 'shrub*' or 'animal*' or 'bird*' or 'amphibian*' or 'mammal*' or 'reptile*' or 'lizard*' or 'snake*' or 'fish'), subsequently refined to the Web of Science categories 'Ecology' or 'Evolutionary Biology' or 'Zoology' or 'Genetics Heredity' or 'Biodiversity Conservation' or 'Marine Freshwater Biology' or 'Plant Sciences' or 'Geography Physical' or 'Ornithology' or 'Biochemistry Molecular Biology' or 'Multidisciplinary Sciences' or 'Environmental Sciences' or 'Fisheries' or 'Oceanography' or 'Biology' or 'Forestry' or 'Reproductive Biology' or 'Behavioral Sciences'. The search included the whole text, including abstract and title, but only abstracts and titles were searchable for older papers, depending on the journal. The search returned 1237 papers, which were distributed among coauthors for further scrutiny.
First paper filter
To be useful, the papers must have met the following criteria.
Overall study design criteria:
- Include at least two separate islands and two mainland populations;
- Eliminate studies comparing populations on several islands where there were no clear mainland vs. island comparisons;
- Present primary research data (e.g., meta-analyses were discarded);
- Include a field study (e.g., experimental studies and ex situ populations were discarded);
- Can include data from sub-populations pooled within an island or within a mainland population (but not between islands or between mainland sites).
Island criteria:
- Island populations situated on separate islands (papers where all information on island populations originated from a single island were discarded);
- Can include multiple populations recorded on the same island, if there is more than one island in the study;
- While we accepted the authors' judgement about island vs. mainland status, in 19 papers we made our own judgement based on the relative size of the island or position relative to the mainland (e.g. Honshu Island of Japan, sized 227,960 km², was interpreted as mainland relative to islands less than 91 km²);
- Include islands surrounded by sea water, but not islands in a lake or big river;
- Include islands regardless of origin (continental shelf, volcanic).
Taxonomic criteria:
- Include any taxonomic group;
- The paper must compare populations within a single species;
- Do not include marine species (including coastline organisms);
- Databases used to check species delimitation: Handbook of the Birds of the World (www.hbw.com/); International Plant Names Index (https://www.ipni.org/); Plants of the World Online (https://powo.science.kew.org/); Handbook of the Mammals of the World; Global Biodiversity Information Facility (https://www.gbif.org/).
Biogeographic criteria:
- Include all continents, as well as studies on multiple continents;
- Do not include papers regarding migratory species;
- Only include old/historical invasions to islands (>50 yrs); do not include recent invasions.
Response criteria:
- Do not include studies which report community-level responses such as species richness;
- Include genetic diversity measures and/or individual- and population-level phenotypic trait responses.
The first paper filter resulted in 235 papers, which were randomly reassigned for a second round of filtering.
Second paper filter
In the second filter, we excluded papers that did not provide population geographic coordinates and population-level quantitative data, unless data were provided upon contacting the authors or could be obtained from figures using DataThief (Tummers 2006). We visually inspected maps plotted for each study separately and made minor adjustments to the GPS coordinates when the coordinates placed the focal population off the island or mainland. For this study, we included only responses measured at the individual level; therefore we removed papers referring to demographic performance and traits such as immunity, behaviour and diet that are heavily reliant on ecosystem context. We extracted data on population-level means for two broad categories of response: i) broad phenotypic measures, which included traits (size, weight and morphology of entire body or body parts), metabolism products, physiology, vital rates (growth, survival, reproduction) and mean age of sampled mature individuals; and ii) genetic diversity, which included heterozygosity, allelic richness, number of alleles per locus, etc. The final dataset includes 112 studies and 108 species.
Methods for processing the data: We made minor adjustments to the GPS location of some populations upon visual inspection in Google Maps of the correct overlay of the data point with the indicated island body or mainland. For each population, we extracted four climate variables reflecting mean and variation in temperature and precipitation, available in CliMond V1.2 (Kriticos et al. 2012) at 10 minutes resolution: mean annual temperature (Bio1), annual precipitation (Bio12), temperature seasonality (CV) (Bio4) and precipitation seasonality (CV) (Bio15). For populations where climate variables were not available on the global climate maps, mostly due to small island size not captured in CliMond, we extracted data from the geographically closest grid cell with available climate values, which was available within 3.5 km of the focal grid cell for all localities. We normalised the four climate variables using the "normalizer" package in R (Vilela 2020), and we performed a Principal Component Analysis (PCA) using the "prcomp" function in the stats package and the psych package in R (Revelle 2018). We saved the loadings of the axes for further analyses.
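A minimal Python analogue of the normalise-then-PCA step described above (the original analysis used R; the file and column names here are assumptions for illustration):

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Python analogue of the normalise-then-PCA step; file and column
# names are hypothetical, not the archived data layout.
df = pd.read_csv("populations_climate.csv")             # hypothetical extract
X = df[["bio1", "bio12", "bio4", "bio15"]].to_numpy()   # four CliMond variables
Xz = StandardScaler().fit_transform(X)                  # normalise each variable
pca = PCA().fit(Xz)
loadings = pca.components_.T                            # axis loadings, saved
print(pca.explained_variance_ratio_)                    # variance per axis
```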
References:
Kriticos, D.J., Webber, B.L., Leriche, A., Ota, N., Macadam, I., Bathols, J., et al. (2012). CliMond: global high-resolution historical and future scenario climate surfaces for bioclimatic modelling. Methods Ecol. Evol., 3, 53-64.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston, Illinois, USA. https://CRAN.R-project.org/package=psych. Version 1.8.12.
Tummers, B. (2006). DataThief III. https://datathief.org/
Vilela, B. (2020). normalizer: Making data normal again. R package version 0.1.0.