Our consumer data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.
Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.
1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
2. Demographics - Gender, Age Group, Marital Status, Language, etc.
3. Financial - Income Range, Credit Rating Range, Credit Type, Net Worth Range, etc.
4. Persona - Consumer Type, Communication Preferences, Family Type, etc.
5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
6. Household - Number of Children, Number of Adults, IP Address, etc.
7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
8. Firmographics - Industry, Company, Occupation, Revenue, etc.
9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
10. Auto - Car Make, Model, Type, Year, etc.
11. Housing - Home Type, Home Value, Renter/Owner, Year Built, etc.
Consumer Graph Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU, and monthly location pings.
Data Export Methodology: Since we collect data dynamically, we provide the most up-to-date data and insights via the best-suited delivery method at a suitable interval (daily, weekly, or monthly).
Consumer Graph Use Cases:
- 360-Degree Customer View: Get a comprehensive picture of customers by means of internal and external data aggregation.
- Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting.
- Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity.
- Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.
https://www.usa.gov/government-works
After May 3, 2024, this dataset and webpage will no longer be updated because hospitals are no longer required to report data on COVID-19 hospital admissions, and hospital capacity and occupancy data, to HHS through CDC’s National Healthcare Safety Network. Data voluntarily reported to NHSN after May 1, 2024, will be available starting May 10, 2024, at COVID Data Tracker Hospitalizations.
The following dataset provides facility-level data for hospital utilization aggregated on a weekly basis (Sunday to Saturday). These are derived from reports with facility-level granularity across two main sources: (1) HHS TeleTracking, and (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities.
The hospital population includes all hospitals registered with Centers for Medicare & Medicaid Services (CMS) as of June 1, 2020. It includes non-CMS hospitals that have reported since July 15, 2020. It does not include psychiatric, rehabilitation, Indian Health Service (IHS) facilities, U.S. Department of Veterans Affairs (VA) facilities, Defense Health Agency (DHA) facilities, and religious non-medical facilities.
For a given entry, the term “collection_week” signifies the start of the period that is aggregated. For example, a “collection_week” of 2020-11-15 means the average/sum/coverage of the elements captured from that given facility starting and including Sunday, November 15, 2020, and ending and including reports for Saturday, November 21, 2020.
Reported elements include an append of either “_coverage”, “_sum”, or “_avg”. A “_coverage” append denotes how many times the facility reported that element during that collection week; a “_sum” append denotes the sum of the reports provided for that facility for that element during that collection week; and a “_avg” append is the average of the reports provided for that facility for that element during that collection week.
The file will be updated weekly. No statistical analysis is applied to impute non-response. For averages, calculations are based on the number of values collected for a given hospital in that collection week. Suppression is applied to the file for sums and averages less than four (4). In these cases, the field will be replaced with “-999,999”.
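For downstream analysis it usually helps to treat the suppression sentinel as missing rather than as a real count, and to make the week boundaries explicit. The short pandas sketch below is one way to do this; the local file name is a placeholder, the -999,999 sentinel is matched both as a number and as a string, and collection_week is the Sunday that opens the aggregation window.

import pandas as pd

# Placeholder file name for a local download of this weekly facility file.
df = pd.read_csv("covid19_reported_patient_impact_by_facility.csv")

# Treat the suppression sentinel as missing so it does not skew sums/averages.
df = df.replace([-999999, "-999,999"], float("nan"))

# collection_week marks the Sunday that starts the aggregation period;
# derive the Saturday that closes it.
df["collection_week"] = pd.to_datetime(df["collection_week"])
df["collection_week_end"] = df["collection_week"] + pd.Timedelta(days=6)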
A story page was created to display both corrected and raw datasets and can be accessed at this link: https://healthdata.gov/stories/s/nhgk-5gpv
This data is preliminary and subject to change as more data become available. Data is available starting on July 31, 2020.
Sometimes, reports for a given facility will be provided to both HHS TeleTracking and HHS Protect. When this occurs, to ensure that there are not duplicate reports, deduplication is applied according to prioritization rules within HHS Protect.
For influenza fields listed in the file, the current HHS guidance marks these fields as optional. As a result, coverage of these elements varies.
For recent updates to the dataset, scroll to the bottom of the dataset description.
On May 3, 2021, the following fields have been added to this data set.
On May 8, 2021, this data set was converted to a corrected data set. The corrections applied to this data set smooth out data anomalies caused by keyed-in data errors. To help determine which records have been corrected, an additional Boolean field called is_corrected has been added.
On May 13, 2021, vaccination fields were changed from sum fields to max or min fields. This reflects the maximum or minimum number reported for that metric in a given week.
On June 7, 2021, vaccination fields were changed from max or min fields to Wednesday-reported values only. This reflects that the number for that metric is reported only on Wednesdays in a given week.
On September 20, 2021, the following was updated: the use of an analytic dataset as a source.
On January 19, 2022, the following fields have been added to this dataset:
On April 28, 2022, the following pediatric fields have been added to this dataset:
On October 24, 2022, the data includes more analytical calculations in an effort to provide a cleaner dataset. For a raw version of this dataset, please follow this link: https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/uqq2-txqb
Due to changes in reporting requirements, after June 19, 2023, a collection week is defined as starting on a Sunday and ending on the next Saturday.
The "COVID-19 Reported Patient Impact and Hospital Capacity by Facility" dataset from the U.S. Department of Health & Human Services, filtered for Connecticut. View the full dataset and detailed metadata here: https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/anag-cw7u The following dataset provides facility-level data for hospital utilization aggregated on a weekly basis (Friday to Thursday). These are derived from reports with facility-level granularity across two main sources: (1) HHS TeleTracking, and (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities. The hospital population includes all hospitals registered with Centers for Medicare & Medicaid Services (CMS) as of June 1, 2020. It includes non-CMS hospitals that have reported since July 15, 2020. It does not include psychiatric, rehabilitation, Indian Health Service (IHS) facilities, U.S. Department of Veterans Affairs (VA) facilities, Defense Health Agency (DHA) facilities, and religious non-medical facilities. For a given entry, the term “collection_week” signifies the start of the period that is aggregated. For example, a “collection_week” of 2020-11-20 means the average/sum/coverage of the elements captured from that given facility starting and including Friday, November 20, 2020, and ending and including reports for Thursday, November 26, 2020. Reported elements include an append of either “_coverage”, “_sum”, or “_avg”. A “_coverage” append denotes how many times the facility reported that element during that collection week. A “_sum” append denotes the sum of the reports provided for that facility for that element during that collection week. A “_avg” append is the average of the reports provided for that facility for that element during that collection week. The file will be updated weekly. No statistical analysis is applied to impute non-response. For averages, calculations are based on the number of values collected for a given hospital in that collection week. Suppression is applied to the file for sums and averages less than four (4). In these cases, the field will be replaced with “-999,999”. This data is preliminary and subject to change as more data become available. Data is available starting on July 31, 2020. Sometimes, reports for a given facility will be provided to both HHS TeleTracking and HHS Protect. When this occurs, to ensure that there are not duplicate reports, deduplication is applied according to prioritization rules within HHS Protect. For influenza fields listed in the file, the current HHS guidance marks these fields as optional. As a result, coverage of these elements are varied. On May 3, 2021, the following fields have been added to this data set. hhs_ids previous_day_admission_adult_covid_confirmed_7_day_coverage previous_day_admission_pediatric_covid_confirmed_7_day_coverage previous_day_admission_adult_covid_suspected_7_day_coverage previous_day_admission_pediatric_covid_suspected_7_day_coverage previous_week_personnel_covid_vaccinated_doses_administered_7_day_sum total_personnel_covid_vaccinated_doses_none_7_day_sum total_personnel_covid_vaccinated_doses_one_7_day_sum total_personnel_covid_vaccinated_doses_all_7_day_sum previous_week_patients_covid_vaccinated_doses_one_7_day_sum previous_week_patients_covid_vaccinated_doses_all_7_day_sum On May 8, 2021, this data set has been converted to a corrected data set. 
The corrections applied to this data set are to smooth out data anomalies caused by keyed in data errors. To help determine which records have had corrections made to it. An additional Boolean field called is_corrected has been added. To see the numbers as reported by the facilities, go to: https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/uqq2-txqb On May 13, 2021 Changed vaccination fields from sum to max or min fields. This reflects the maximum or minimum number report
After May 3, 2024, this dataset and webpage will no longer be updated because hospitals are no longer required to report data on COVID-19 hospital admissions, and hospital capacity and occupancy data, to HHS through CDC’s National Healthcare Safety Network. Data voluntarily reported to NHSN after May 1, 2024, will be available starting May 10, 2024, at COVID Data Tracker Hospitalizations. The following dataset provides facility-level data for hospital utilization aggregated on a weekly basis (Sunday to Saturday). These are derived from reports with facility-level granularity across two main sources: (1) HHS TeleTracking, and (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities. The hospital population includes all hospitals registered with Centers for Medicare & Medicaid Services (CMS) as of June 1, 2020. It includes non-CMS hospitals that have reported since July 15, 2020. It does not include psychiatric, rehabilitation, Indian Health Service (IHS) facilities, U.S. Department of Veterans Affairs (VA) facilities, Defense Health Agency (DHA) facilities, and religious non-medical facilities. For a given entry, the term “collection_week” signifies the start of the period that is aggregated. For example, a “collection_week” of 2020-11-15 means the average/sum/coverage of the elements captured from that given facility starting and including Sunday, November 15, 2020, and ending and including reports for Saturday, November 21, 2020. Reported elements include an append of either “_coverage”, “_sum”, or “_avg”. A “_coverage” append denotes how many times the facility reported that element during that collection week. A “_sum” append denotes the sum of the reports provided for that facility for that element during that collection week. A “_avg” append is the average of the reports provided for that facility for that element during that collection week. The file will be updated weekly. No statistical analysis is applied to impute non-response. For averages, calculations are based on the number of values collected for a given hospital in that collection week. Suppression is applied to the file for sums and averages less than four (4). In these cases, the field will be replaced with “-999,999”. A story page was created to display both corrected and raw datasets and can be accessed at this link: https://healthdata.gov/stories/s/nhgk-5gpv This data is preliminary and subject to change as more data become available. Data is available starting on July 31, 2020. Sometimes, reports for a given facility will be provided to both HHS TeleTracking and HHS Protect. When this occurs, to ensure that there are not duplicate reports, deduplication is applied according to prioritization rules within HHS Protect. For influenza fields listed in the file, the current HHS guidance marks these fields as optional. As a result, coverage of these elements are varied. For recent updates to the dataset, scroll to the bottom of the dataset description. On May 3, 2021, the following fields have been added to this data set. 
hhs_ids previous_day_admission_adult_covid_confirmed_7_day_coverage previous_day_admission_pediatric_covid_confirmed_7_day_coverage previous_day_admission_adult_covid_suspected_7_day_coverage previous_day_admission_pediatric_covid_suspected_7_day_coverage previous_week_personnel_covid_vaccinated_doses_administered_7_day_sum total_personnel_covid_vaccinated_doses_none_7_day_sum total_personnel_covid_vaccinated_doses_one_7_day_sum total_personnel_covid_vaccinated_doses_all_7_day_sum previous_week_patients_covid_vaccinated_doses_one_7_day_sum previous_week_patients_covid_vaccinated_doses_all_
Suppl. Append. 1DNA sequence data from GenBank.SupplApp1.csvSuppl. Append. 2Scored character states and literature sources.SupplApp2.csvSuppl. Append. 3Biogeographic models and model fit statistics. Results for (A) Gesneriaceae, (B) Gesnerioideae, and (C) Didymocarpoideae. Abbreviations: Par. = free parameters; lnLik = log-likelihood; AIC = Akaike Information Criterion; AICc = Akaike Information Criterion, corrected; ΔAICc = change in AICc; AICw = AIC weights; BIC = Bayesian Information Criterion; ΔBIC = change in BIC; DEC = Dispersal Extinction Cladogenesis model; DIVALIKE = BioGeoBEARS implementation of DIVA model; BAYAREALIKE = BioGeoBEARS implementation of BayArea model; s = subset sympatry; J = founder-event speciation.SupplApp3.pdfSuppl. Append. 4Summary of gene sequences used in the present study.SupplApp4.pdfSuppl. Append. 5Taxonomic comments and conclusions of the revised phylogenetic hypotheses for the Gesneriaceae.SupplApp5.pdfSuppl. Append. 6Stem and crown age estimates for Gesneriaceae clades and outgroups. For comparison, the ages of stems and crowns from Petrova et al. (2015), Perret et al. (2013), Woo et al. (2011), Bell et al. (2010), and Roalson et al. (2008) are provided. Estimation methods are indicated below reference names. Dates are indicated as Mean (Minimum, Maximum). Abbreviations: BEAST, Bayesian Evolutionary Analysis Sampling Trees; PL, penalized likelihood.SupplApp6.pdfSuppl. Append. 7GeoSSE model testing. Results for (A) Africa and Madagascar, (B) Temperate and Tropical Andes, (C) Amazon and Atlantic Brazil, (D) Caribbean and West Indies, and (E) Pacific and Southeast Asia. Gray boxes denote the model with the best-fit. Significance of constrained models versus unconstrained (full) model is assessed as follows: N.S., P>0.1; *, P<0.1; **, P<0.05; ***, P<0.001. Rate categories: λA, speciation in focal area (endemic species); λB, speciation in all other areas combined; λAB, speciation in widespread species; μA, extinction in focal area (endemic species); μB, extinction in all other areas combined; qA, dispersal out of focal area; qB, dispersal out of all other areas into focal area. Abbreviations: Df = degrees of freedom; lnLik = log-likelihood; AIC = Akaike Information Criterion; AICc = Akaike Information Criterion, corrected; ΔAICc = change in AICc; AICw = Akaike weights; LRT = likelihood ratio test; BIC = Bayesian Information Criterion; ΔBIC = change in BIC.SupplApp7.pdfSuppl. Append. 8SIMMAP ancestral character estimations of flower characters. Results for flower color in (A) Gesneriaceae, (B) Gesnerioideae, (C) Didymocarpoideae; corolla shape in (D) Gesneriaceae, (E) Gesnerioideae, (F) Didymocarpoideae; pollination syndrome (G) Gesneriaceae, (H) Gesnerioideae, (I) Didymocarpoideae.SupplApp8.pdfSuppl. Append. 9SIMMAP ancestral character estimations of epiphytism and growth form characters. Results for Gesneriaceae for (A) epiphytism and (B) unifoliate growth form.SupplApp9.pdfSuppl. Append. 10Geiger statistics for phylogenetic signal (λ), trait evolution at speciation (κ), and rate increase over time (δ). Significance of model fit with the addition of λ, κ, and δ parameters against the null model is assessed as follows: N.S., not significant; *, P<0.01; **, P<0.001. Corolla gibbosity is abbreviated "gibb." and epiphytism is abbreviated "epi."SupplApp10.pdfSuppl. Append. 11BiSSE model testing. 
Results for epiphytism in (A) Gesneriaceae, (B) Gesnerioideae, (C) Didymocarpoideae; ornithophily in (D) Gesneriaceae, (E) Gesnerioideae, (F) Didymocarpoideae; unifoliate growth in (G) Didymocarpoideae. Gray boxes denote the best fitting model. Significance of constrained models versus unconstrained (full) model is assessed as follows: N.S., P>0.1; *, P<0.1; **, P<0.05; ***, P<0.001. Rate categories: λ, speciation; μ, extinction; q, transition rate. In all cases, estimated rates for the characters of interest are indicated by λ1 and μ1, respectively. Abbreviations: Df = degrees of freedom; lnLik = log-likelihood; AIC = Akaike Information Criterion; AICc = Akaike Information Criterion, corrected; ΔAICc = change in AICc; AICw = Akaike weights; LRT = likelihood ratio test; BIC = Bayesian Information Criterion; ΔBIC = change in BIC.SupplApp11.pdfSuppl. Figure 1Gesneriaceae phylogenetic hypothesis. Numbers above branches refer to (A) aLRT and (B) ML bootstrap percentages, respectively. (C) ML phylogram with branch lengths.SupplFig1.pdfSuppl. Figure 2Calibrated Gesneriaceae phylogenetic hypothesis. Bars on branches reflect the 95% confidence interval on the time estimate. Circled numbers at nodes indicate fossil, geologic, and secondary calibration points, respectively.SupplFig2.pdfSuppl. Figure 3Historical biogeographical hypothesis for Gesneriaceae using the best-fit model BAYAREALIKE+s+J. Geographic areas: A, Temperate and Tropical Andes; B = Amazon and Atlantic Brazil; C = Central America...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID-19 Reported Patient Impact and Hospital Capacity by Facility’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/e6ff9332-7a6d-42a7-986b-3deb14475c11 on 13 February 2022.
--- Dataset description provided by original source is as follows ---
The "COVID-19 Reported Patient Impact and Hospital Capacity by Facility" dataset from the U.S. Department of Health & Human Services, filtered for Connecticut. View the full dataset and detailed metadata here: https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/anag-cw7u
The following dataset provides facility-level data for hospital utilization aggregated on a weekly basis (Friday to Thursday). These are derived from reports with facility-level granularity across two main sources: (1) HHS TeleTracking, and (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities.
The hospital population includes all hospitals registered with Centers for Medicare & Medicaid Services (CMS) as of June 1, 2020. It includes non-CMS hospitals that have reported since July 15, 2020. It does not include psychiatric, rehabilitation, Indian Health Service (IHS) facilities, U.S. Department of Veterans Affairs (VA) facilities, Defense Health Agency (DHA) facilities, and religious non-medical facilities.
For a given entry, the term “collection_week” signifies the start of the period that is aggregated. For example, a “collection_week” of 2020-11-20 means the average/sum/coverage of the elements captured from that given facility starting and including Friday, November 20, 2020, and ending and including reports for Thursday, November 26, 2020.
Reported elements include an append of either “_coverage”, “_sum”, or “_avg”.
A “_coverage” append denotes how many times the facility reported that element during that collection week.
A “_sum” append denotes the sum of the reports provided for that facility for that element during that collection week.
A “_avg” append is the average of the reports provided for that facility for that element during that collection week.
The file will be updated weekly. No statistical analysis is applied to impute non-response. For averages, calculations are based on the number of values collected for a given hospital in that collection week. Suppression is applied to the file for sums and averages less than four (4). In these cases, the field will be replaced with “-999,999”.
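Because the three suffixes describe the same underlying daily reports, the reported weekly average can be cross-checked against the sum divided by the report count. A minimal sketch, assuming the file has been loaded into a pandas DataFrame df with the -999,999 suppression values masked as missing, and using total_beds_7_day_* as an illustrative element family (an assumption; any reported element follows the same pattern):

# Hedged sketch: cross-check "_avg" against "_sum" / "_coverage" for one element.
elem = "total_beds_7_day"  # illustrative element name (assumption)

mask = df[f"{elem}_coverage"].notna() & (df[f"{elem}_coverage"] > 0)
recomputed_avg = df.loc[mask, f"{elem}_sum"] / df.loc[mask, f"{elem}_coverage"]
max_diff = (recomputed_avg - df.loc[mask, f"{elem}_avg"]).abs().max()
print(f"largest |recomputed - reported| weekly average: {max_diff:.3f}")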
This data is preliminary and subject to change as more data become available. Data is available starting on July 31, 2020.
Sometimes, reports for a given facility will be provided to both HHS TeleTracking and HHS Protect. When this occurs, to ensure that there are not duplicate reports, deduplication is applied according to prioritization rules within HHS Protect.
For influenza fields listed in the file, the current HHS guidance marks these fields as optional. As a result, coverage of these elements varies.
On May 3, 2021, the following fields have been added to this data set:
- hhs_ids
- previous_day_admission_adult_covid_confirmed_7_day_coverage
- previous_day_admission_pediatric_covid_confirmed_7_day_coverage
- previous_day_admission_adult_covid_suspected_7_day_coverage
- previous_day_admission_pediatric_covid_suspected_7_day_coverage
- previous_week_personnel_covid_vaccinated_doses_administered_7_day_sum
- total_personnel_covid_vaccinated_doses_none_7_day_sum
- total_personnel_covid_vaccinated_doses_one_7_day_sum
- total_personnel_covid_vaccinated_doses_all_7_day_sum
- previous_week_patients_covid_vaccinated_doses_one_7_day_sum
- previous_week_patients_covid_vaccinated_doses_all_7_day_sum
On May 8, 2021, this data set was converted to a corrected data set. The corrections applied to this data set smooth out data anomalies caused by keyed-in data errors. To help determine which records have been corrected, an additional Boolean field called is_corrected has been added. To see the numbers as reported by the facilities, go to: https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/uqq2-txqb
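Continuing the DataFrame sketch above, corrected records can be separated from untouched ones via this flag; a minimal example, assuming the file has been loaded into df:

# The flag may be stored as a boolean or as "true"/"false" strings; normalize it.
flag = df["is_corrected"].astype(str).str.lower().eq("true")
corrected, untouched = df[flag], df[~flag]
print(len(corrected), "corrected rows,", len(untouched), "untouched rows")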
On May 13, 2021, vaccination fields were changed from sum fields to max or min fields. This reflects the maximum or minimum number reported for that metric in a given week.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the main data of the paper "Optimal Rejection-Free Path Sampling," and the source code for generating/appending the independent RFPS-AIMMD and AIMMD runs.
Due to size constraints, the data has been split into separate repositories. The following repositories contain the trajectory files generated by the runs:
all the WQ runs: 10.5281/zenodo.14830317
chignolin, fps0: 10.5281/zenodo.14826023
chignolin, fps1: 10.5281/zenodo.14830200
chignolin, fps2: 10.5281/zenodo.14830224
chignolin, tps0: 10.5281/zenodo.14830251
chignolin, tps1: 10.5281/zenodo.14830270
chignolin, tps2: 10.5281/zenodo.14830280
The trajectory files are not required for running the main analysis, as all necessary information for machine learning and path reweighting is contained in the "PathEnsemble" object files stored in this repository. However, these trajectories are essential for projecting the path ensemble estimate onto an arbitrary set of collective variables.
To reconstruct the full dataset, please merge all the data folders you find in the supplemental repositories.
analysis (code for analyzing the data and generating the figures of the paper)
|- figures.ipynb (Jupyter notebook for the analysis)
|- figures (the figures created by the Jupyter notebook)
|- ...
data (all the AIMMD and reference runs, plus general info about the simulated systems)
|- chignolin
   |- *.py (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm; see the "src" folder below)
   |- run.gro (full system positions in the native conformation)
   |- mol.pdb (only the peptide positions in the native conformation)
   |- topol.top (the system's topology for the GROMACS MD engine)
   |- charmmm22star.ff (force field parameter files)
   |- run.mdp (GROMACS MD parameters when appending a simulation)
   |- randomvelocities.mdp (GROMACS MD parameters when initializing a simulation with random velocities)
   |- signature.npy, r0.npy (parameters defining the fraction of native contacts used in the folded/unfolded state definitions; used by the params.py function "states_function")
   |- dmax.npy, dmin.npy (parameters defining the feature representation of the AIMMD NN model; used by the params.py function "descriptors_function")
   |- equilibrium (reference long equilibrium trajectory files; only the peptide positions are saved!)
      |- run0.xtc, ..., run3.xtc
   |- validation
      |- validation.xtc (all the validation SPs together in an XTC file)
      |- validation.npy (for each SP, the cumulative shooting results after 10 two-way shooting simulations)
   |- fps0 (the first RFPS-AIMMD independent run)
      |- equilibriumA (the free simulations around A, already processed into PathEnsemble files)
         |- traj000001.h5
         |- traj000001.tpr (for running the simulation; in that case, please first retrieve all the trajectory files from the right supplemental repository)
         |- traj000001.cpt (for appending the simulation; in that case, please first retrieve all the trajectory files from the right supplemental repository)
         |- traj000002.h5 (in case of re-initialization)
         |- ...
      |- equilibriumB (the free simulations around B, ...)
      |- ...
      |- shots0
         |- chain.h5 (the path sampling chain)
         |- pool.h5 (the selection pool, containing the frames from which shooting points are currently selected)
      |- params.py (file containing the state and descriptor definitions, the NN fit function, and the AIMMD run hyperparameters; it can be modified to run either RFPS-AIMMD or the original AIMMD algorithm)
      |- initial.trr (the initial transition for path sampling)
      |- manager.log (reports info about the run)
      |- network
src (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm)
|- generate.py (on a Workstation: initializes the processes; on an HPC cluster: creates the sh file for submitting a job)
|- slurm_options.py (to customize and use when running on an HPC cluster)
|- manager.py (controls SP selection; reweights the paths)
|- shooter.py (performs path sampling simulations)
|- equilibrium.py (performs free simulations)
|- pathensemble.py (code of the PathEnsemble class)
|- utils.py (auxiliary functions for data production and analysis)
* To initialize a new RFPS-AIMMD (or AIMMD) run for the systems of this paper:
1. Create a "run directory" folder (same depth as "fps0")
2. Copy "initial.trr" and "params.py" from another AIMMD run folder. It is possible to change "params.py" to customize the run.
3. (On a Workstation) call:
python generate.py
where nsteps is the final number of path sampling steps for the run, n the number of independent path sampling chains, nA the number of independent free simulators around A, and nB that of free simulators around B.
4. (On an HPC cluster) call:
python generate.py
sbatch .
* To append to an existing RFPS-AIMMD or AIMMD run
1. Merge the supplemental repository with the trajectory files into this one.
2. Just call again (on a Workstation)
python generate.py
or (on an HPC cluster)
sbatch .
after updating the "nsteps" parameter.
* To run enhanced sampling for a new system: please keep the data structure as close as possible to the original. Different names for the files can generate incompatibilities. We are currently trying to make it easier.
Run the analysis/figures.ipynb notebook. Some groups of cells have to be run multiple times after changing the parameters in the preamble.
The LTAR network maintains stations for standard meteorological measurements including, generally, air temperature and humidity, shortwave (solar) irradiance, longwave (thermal) radiation, wind speed and direction, barometric pressure, and precipitation. Many sites also have extensive comparable legacy datasets. The LTAR scientific community decided that these needed to be made available to the public from a single web source in a consistent manner. To that purpose, each site sent data on a regular schedule, as frequently as hourly, to the National Agricultural Library, which developed a web service to provide the data to the public in tabular or graphical form. This archive of the LTAR legacy database exports contains meteorological data through April 30, 2021. For current meteorological data, visit the GeoEvent Meteorology Resources page, which provides tools and dashboards to view and access data from the 18 LTAR sites across the United States.

Resources in this dataset:

Resource Title: Meteorological data. File Name: ltar_archive_DB.zip

Resource Description: This is an export of the meteorological data collected by LTAR sites and ingested by the NAL LTAR application. The export consists of an SQL schema definition file for creating database tables and the data itself. The data is provided in two formats: SQL insert statements (.sql) and CSV files (.csv). Please use the format most convenient for you. Note that the SQL insert statements take much longer to run, since each row is an individual insert.

Description of zip files: The ltararchive*.zip files contain database exports. The schema is a .sql file; the data is exported as both SQL inserts and CSV for convenience. There is a README in markdown and PDF in the zips.

Database export of the schema and data for the site, site_station, and met tables as SQL insert statements:
ltar_archive_db_sql_export_20201231.zip --> has data until 2020-12-31
ltar_archive_db_sql_export_20210430.zip --> has data until 2021-04-30

Database export of the schema and data for the site, site_station, and met tables as CSV:
ltar_archive_db_csv_export_20201231.zip --> has data until 2020-12-31
ltar_archive_db_csv_export_20210430.zip --> has data until 2021-04-30

Raw CSV files that were sent to NAL from the LTAR sites/stations:
ltar_rawcsv_archive.zip --> has data until 2021-04-30
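Either export format can be loaded with standard tools; a minimal pandas sketch for the CSV form is shown below. The table names (site, site_station, met) come from the description above, but the exact file names inside the extracted zip are assumptions.

import pandas as pd

# Assumed file names after extracting ltar_archive_db_csv_export_20210430.zip.
site = pd.read_csv("ltar_archive_db_csv_export_20210430/site.csv")
site_station = pd.read_csv("ltar_archive_db_csv_export_20210430/site_station.csv")
met = pd.read_csv("ltar_archive_db_csv_export_20210430/met.csv")

print(len(met), "meteorological records across", len(site), "sites")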
2.2 Full Mall Graph Clustering

Train

The sample training data for this problem is a set of 106981 fingerprints (task2_train_fingerprints.json) and some edges between them. We have provided files that indicate three different edge types, all of which should be treated differently.
task2_train_steps.csv indicates edges that connect subsequent steps within a trajectory. These edges should be highly trusted as they indicate a certainty that two fingerprints were recorded from the same floor.
task2_train_elevations.csv indicates the opposite of the steps. These elevations indicate that the fingerprints are almost certainly from different floors. You can thus extrapolate that if fingerprint $N$ from trajectory $n$ is on a different floor to fingerprint $M$ from trajectory $m$, then all other fingerprints in both trajectories $m$ and $n$ must also be on separate floors.
task2_train_estimated_wifi_distances.csv are the pre-computed distances that we have calculated using our own distance metric. This metric is imperfect and as such we know that many of these edges will be incorrect (i.e. they will connect two floors together). We suggest that initially you use the edges in this file to construct your initial graph and compute some solution. However, if you get a high score on task1 then you might consider computing your own wifi distances to build a graph.
Your graph can be at one of two levels of detail, either trajectory level or fingerprint level; you can choose whichever representation you want to use, but ultimately we want to know the trajectory clusters. Trajectory level would have every node as a trajectory, with edges between nodes occurring if fingerprints in their trajectories had high similarity. Fingerprint level would have each fingerprint as a node. You can look up the trajectory id of a fingerprint using task2_train_lookup.json to convert between representations.
To help you debug and train your solution we have provided a ground truth for some of the trajectories in task2_train_GT.json. In this file the keys are the trajectory ids (the same as in task2_train_lookup.json) and the values are the real floor id of the building.
Test

The test set is in the exact same format as the training set (for a separate building, we weren't going to make it that easy ;) ), but we haven't included the equivalent ground truth file. This will be withheld to allow us to score your solution.
Points to consider:
- When doing this on real data we do not know the exact number of floors to expect, so your model will need to decide this for itself as well. For this data, do not expect to find more than 20 floors or less than 3 floors.
- Sometimes in balcony areas the similarity between fingerprints on different floors can be deceivingly high. In these cases it may be wise to try to rely on the graph information rather than the individual similarity (e.g. what is the similarity of the other neighbour nodes to this candidate other-floor node?)
- To the best of our knowledge there are no outlier fingerprints in the data that do not belong to the building. Every fingerprint belongs to a floor.
2.3 Loading the data

In this section we will provide some example code to open the files and construct both types of graph.
import os
import json
import csv
import networkx as nx
from tqdm import tqdm
path_to_data = "task2_for_participants/train"
with open(os.path.join(path_to_data,"task2_train_estimated_wifi_distances.csv")) as f:
wifi = []
reader = csv.DictReader(f)
for line in tqdm(reader):
wifi.append([line['id1'],line['id2'],float(line['estimated_distance'])])
with open(os.path.join(path_to_data,"task2_train_elevations.csv")) as f:
elevs = []
reader = csv.DictReader(f)
for line in tqdm(reader):
elevs.append([line['id1'],line['id2']])
with open(os.path.join(path_to_data,"task2_train_steps.csv")) as f:
steps = []
reader = csv.DictReader(f)
for line in tqdm(reader):
steps.append([line['id1'],line['id2'],float(line['displacement'])])
fp_lookup_path = os.path.join(path_to_data,"task2_train_lookup.json")
gt_path = os.path.join(path_to_data,"task2_train_GT.json")
with open(fp_lookup_path) as f:
fp_lookup = json.load(f)
with open(gt_path) as f:
gt = json.load(f)
Fingerprint graph

This is one way to construct the fingerprint-level graph, where each node in the graph is a fingerprint. We have added edge weights that correspond to the estimated/true distances from the wifi and pdr edges respectively. We have also added elevation edges to indicate this relationship. You might want to explicitly enforce that there are none of these edges (or any valid elevation edge between trajectories) when developing your solution.
G = nx.Graph()

for id1, id2, dist in tqdm(steps):
    G.add_edge(id1, id2, ty="s", weight=dist)

for id1, id2, dist in tqdm(wifi):
    G.add_edge(id1, id2, ty="w", weight=dist)

for id1, id2 in tqdm(elevs):
    G.add_edge(id1, id2, ty="e")
Trajectory graph

The trajectory graph is arguably not as simple, as you need to think of a way to represent the many wifi connections between trajectories. In the example graph below we just take the mean distance as a weight, but is this really the best representation?
B = nx.Graph()

# Get all the trajectory ids from the lookup
valid_nodes = set(fp_lookup.values())

for node in valid_nodes:
    B.add_node(node)

# Either add an edge or append the distance to the edge data
for id1, id2, dist in tqdm(wifi):
    if not B.has_edge(fp_lookup[str(id1)], fp_lookup[str(id2)]):
        B.add_edge(fp_lookup[str(id1)],
                   fp_lookup[str(id2)],
                   ty="w", weight=[dist])
    else:
        B[fp_lookup[str(id1)]][fp_lookup[str(id2)]]['weight'].append(dist)

# Compute the mean edge weight
for edge in B.edges(data=True):
    B[edge[0]][edge[1]]['weight'] = sum(B[edge[0]][edge[1]]['weight']) / len(B[edge[0]][edge[1]]['weight'])

# If you have made a wifi connection between trajectories with an elev, delete the edge
for id1, id2 in tqdm(elevs):
    if B.has_edge(fp_lookup[str(id1)], fp_lookup[str(id2)]):
        B.remove_edge(fp_lookup[str(id1)],
                      fp_lookup[str(id2)])
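The starter code above stops at building the graphs. As one possible next step, the hedged sketch below clusters the trajectory graph B with an off-the-shelf community-detection routine and checks the labelled subset against task2_train_GT.json. The distance-to-similarity conversion and the choice of greedy modularity are illustrative assumptions, not the intended solution.

from networkx.algorithms.community import greedy_modularity_communities
from sklearn.metrics import adjusted_rand_score

# Community detection treats larger weights as stronger ties, so convert the
# mean wifi distance on each edge of B into a similarity first (assumption).
S = nx.Graph()
for u, v, data in B.edges(data=True):
    S.add_edge(u, v, weight=1.0 / (1.0 + data["weight"]))

communities = greedy_modularity_communities(S, weight="weight")
pred_floor = {traj: i for i, comm in enumerate(communities) for traj in comm}
print(f"Found {len(communities)} candidate floors")

# Sanity-check against the ground truth; gt keys are trajectory ids stored as
# JSON strings, so align the id types before comparing.
labelled = [t for t in pred_floor if str(t) in gt]
ari = adjusted_rand_score([gt[str(t)] for t in labelled],
                          [pred_floor[t] for t in labelled])
print(f"Adjusted Rand index on the labelled subset: {ari:.3f}")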
Background: Whole exome sequencing (WES) has been proven to serve as a valuable basis for various applications such as variant calling and copy number variation (CNV) analyses. For those analyses the read coverage should be optimally balanced throughout protein coding regions at sufficient read depth. Unfortunately, WES is known for its uneven coverage within coding regions due to GC-rich regions or off-target enrichment. Results: In order to examine the irregularities of WES within genes, we applied Agilent SureSelectXT exome capture on human samples and sequenced these via Illumina in 2x101 paired-end mode. As we suspected the sequenced insert length to be crucial in the uneven coverage of exome captured samples, we sheared 12 genomic DNA samples to two different DNA insert size lengths, namely 130 and 170 bp. Interestingly, although mean coverages of target regions were clearly higher in samples of 130 bp insert length, the level of evenness was more pronounced in 170 bp samples. Moreover, merging overlapping paired-end reads revealed a positive effect on evenness, indicating overlapping reads as another cause of the unevenness. In addition, mutation analysis was performed on a subset of the samples. In these isogenic subclones, almost twice as many mutations failed to be called in the 130 bp samples as in the 170 bp samples. Visual inspection of the discarded mutation sites revealed low coverage at those sites, embedded within regions of high coverage-depth amplitude. Conclusions: Producing longer insert reads could be a good strategy to achieve more uniform read coverage in coding regions, thereby enhancing the effective sequencing yield and providing an improved basis for further variant calling and CNV analyses.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: To compare the effectiveness and safety of a controlled-release dinoprostone insert with a Foley catheter balloon for cervical ripening and labor induction. Methods: PubMed, the Cochrane Central Register of Controlled Trials, Web of Science, and the China Knowledge Resource Integrated Database were searched. Only randomized controlled trials comparing the controlled-release dinoprostone insert with the Foley catheter balloon were included. Risk ratio (RR) or mean difference (MD) with 95% confidence interval (CI) was calculated. Results: Six studies were included, with 731 women receiving the dinoprostone insert and 722 the Foley catheter. Time from induction to delivery was significantly shortened in the dinoprostone insert group compared to the Foley catheter group (MD 5.73 h, 95% CI 1.26–10.20). There were no significant differences in vaginal delivery within 24 h (RR 0.75, 95% CI 0.43–1.30) or cesarean section (RR 0.94, 95% CI 0.80–1.12) between the two ripening methods. The dinoprostone insert was associated with an increased rate of excessive uterine contraction (RR 0.07, 95% CI 0.03–0.19) but less oxytocin use (RR 1.86, 95% CI 1.25–2.77) when compared with the Foley catheter. Conclusions: Induction of labor with a controlled-release dinoprostone insert seems to be more effective than with a Foley catheter. However, the former method causes excessive uterine contraction more frequently.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Spectral Photon Counting Computed Tomography (SPCCT), a ground-breaking development in CT technology, has immense potential to address the persistent problem of metal artefacts in CT images. This study aims to evaluate the potential of Mars photon-counting CT technology in reducing metal artefacts. It focuses on identifying and quantifying clinically significant materials in the presence of metal objects. A multi-material phantom was used, containing inserts of varying concentrations of hydroxyapatite (a mineral present in teeth, bones, and calcified plaque), iodine (used as a contrast agent), CT water (to mimic soft tissue), and adipose (as a fat substitute). Three sets of scans were acquired: with aluminium, with stainless steel, and without a metal insert as a reference dataset. Data acquisition was performed using a Mars SPCCT scanner (Microlab 5×120) operated at 118 kVp and 80 μA. The images were subsequently reconstructed into five energy bins: 7-40, 40-50, 50-60, 60-79, and 79-118 keV. Evaluation metrics including signal-to-noise ratio (SNR), linearity of attenuation profiles, root mean square error (RMSE), and area under the curve (AUC) were employed to assess the energy and material-density images with and without metal inserts. Results show decreased metal artefacts and a better signal-to-noise ratio (up to 25%) at higher energy bins compared with the reference data. The attenuation profile also demonstrated high linearity (R² > 0.95) and lower RMSE across all material concentrations, even in the presence of aluminium and steel. Material identification accuracy for iodine and hydroxyapatite (with and without metal inserts) remained consistent, minimally impacting AUC values. For demonstration purposes, a biological sample was also scanned with the stainless steel volar implant and cortical bone screw, and the images were objectively assessed to indicate the potential effectiveness of SPCCT in replicating real-world clinical scenarios.