This dataset was created by Zahra Zolghadr
Safely upload client/brand first-party CRM/loyalty data. TrueData will append relevant digital identifiers (HEM, MAID, UID 2.0, CTV IDs) and distribute directly or via LiveRamp to any destination. Activate across Desktop, Mobile App/Web, CTV, DOOH, and Audio. Ingest raw data to build derivative internal products.
With Versium REACH's Firmographic Append tool in the Business to Business Direct product suite, you can append valuable firmographic data to your customer and prospect contact lists. With only a few attributes on hand, you can tap into Versium's industry-leading identity resolution engine and proprietary database to append rich firmographic data. To append data, you only need one of the following (a quick validation sketch follows the list):
- Email
- Business Domain
- Business Name, Address, City, State
- Business Name, Phone
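As a rough illustration (this is not Versium's API, and the column names are hypothetical), a contact list could be checked for at least one of the accepted key combinations above before it is submitted to the append tool:
import pandas as pd

# Hypothetical contact list; the column names are illustrative only.
contacts = pd.DataFrame([
    {"email": "jane@acme.com", "business_domain": None, "business_name": None,
     "address": None, "city": None, "state": None, "phone": None},
    {"email": None, "business_domain": None, "business_name": "Acme Corp",
     "address": "1 Main St", "city": "Seattle", "state": "WA", "phone": None},
])

def has_match_key(row) -> bool:
    # At least one of the accepted key combinations must be present.
    return (
        pd.notna(row["email"])
        or pd.notna(row["business_domain"])
        or (pd.notna(row["business_name"]) and pd.notna(row["address"])
            and pd.notna(row["city"]) and pd.notna(row["state"]))
        or (pd.notna(row["business_name"]) and pd.notna(row["phone"]))
    )

ready = contacts[contacts.apply(has_match_key, axis=1)]
print(f"{len(ready)} of {len(contacts)} rows have a usable match key")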
With Versium REACH Demographic Append you have access to many attributes for enriching your data, grouped into four output types: Basic; Household, Financial and Auto; Lifestyle and Interests; and Political and Donor. The attributes available for each output type are listed below:
Basic:
- Senior in Household
- Young Adult in Household
- Small Office or Home Office
- Online Purchasing Indicator
- Language
- Marital Status
- Working Woman in Household
- Single Parent
- Online Education
- Occupation
- Gender
- DOB (MM/YY)
- Age Range
- Religion
- Ethnic Group
- Presence of Children
- Education Level
- Number of Children
Household, Financial and Auto:
- Household Income
- Dwelling Type
- Credit Card Holder Bank
- Upscale Card Holder
- Estimated Net Worth
- Length of Residence
- Credit Rating
- Home Own or Rent
- Home Value
- Home Year Built
- Number of Credit Lines
- Auto Year
- Auto Make
- Auto Model
- Home Purchase Date
- Refinance Date
- Refinance Amount
- Loan to Value
- Refinance Loan Type
- Home Purchase Price
- Mortgage Purchase Amount
- Mortgage Purchase Loan Type
- Mortgage Purchase Date
- 2nd Most Recent Mortgage Amount
- 2nd Most Recent Mortgage Loan Type
- 2nd Most Recent Mortgage Date
- 2nd Most Recent Mortgage Interest Rate Type
- Refinance Rate Type
- Mortgage Purchase Interest Rate Type
- Home Pool
Lifestyle and Interests:
- Mail Order Buyer
- Pets
- Magazines
- Reading
- Current Affairs and Politics
- Dieting and Weight Loss
- Travel
- Music
- Consumer Electronics
- Arts
- Antiques
- Home Improvement
- Gardening
- Cooking
- Exercise
- Sports
- Outdoors
- Womens Apparel
- Mens Apparel
- Investing
- Health and Beauty
- Decorating and Furnishing
Political and Donor:
- Donor Environmental
- Donor Animal Welfare
- Donor Arts and Culture
- Donor Childrens Causes
- Donor Environmental or Wildlife
- Donor Health
- Donor International Aid
- Donor Political
- Donor Conservative Politics
- Donor Liberal Politics
- Donor Religious
- Donor Veterans
- Donor Unspecified
- Donor Community
- Party Affiliation
This table feeds multiple apps for Tippecanoe County. The columns are the bare-minimum details needed for generating a GRM. When new GRM data points are collected, they should be appended here.
https://creativecommons.org/publicdomain/zero/1.0/
The motivation behind creating this dataset was to work on an IoT health-monitoring device project.
The columns include heart rate, SysBP, DiaBP, height, weight, BMI, etc.; these parameters are necessary for predicting heart condition.
The height/weight tables with heart rate are taken from this website:
https://www.mymathtables.com/chart/health-wellness/height-weight-table-for-all-ages.html
The following code has been used to generate the data according to research from different resources on the web:
import numpy as np
import pandas as pd

age = np.random.randint(1, 70, 500000)
sex = np.random.randint(0, 2, 500000)
SysBP = np.random.randint(105, 147, 500000)
DiaBP = np.random.randint(73, 120, 500000)
HR = np.random.randint(78, 200, 500000)
weightKg = np.random.randint(2, 120, 500000)
heightCm = np.random.randint(48, 185, 500000)
BMI = weightKg / heightCm / heightCm * 10000

cols = ['age', 'sex', 'SysBP', 'DiaBP', 'HR', 'weightKg', 'heightCm', 'BMI', 'indication']
data = []
for age, sex, SysBP, DiaBP, HR, weightKg, heightCm, BMI in zip(age, sex, SysBP, DiaBP, HR, weightKg, heightCm, BMI):
    if BMI > 40 or BMI < 10:
        continue
    elif age < 20:
        continue
    elif weightKg < 45:
        continue
    elif (1 <= age <= 10) & (17 < BMI < 31) & (104 < SysBP < 121) & (73 < DiaBP < 81) & (99 < HR <= 200) & (3 < weightKg <= 36) & (48 < heightCm <= 139):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
    elif (10 < age <= 20) & (17 < BMI < 31) & (104 < SysBP < 121) & (73 < DiaBP <= 81) & (99 < HR <= 200) & (36 < weightKg < 60) & (139 < heightCm < 170):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
    elif (20 < age <= 30) & (17 < BMI < 31) & (108 < SysBP <= 134) & (75 <= DiaBP <= 84) & (94 < HR <= 190) & (28 < weightKg < 80) & (137 <= heightCm <= 180):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
    elif (30 < age <= 40) & (17 < BMI < 31) & (110 < SysBP <= 135) & (81 <= DiaBP <= 86) & (93 <= HR <= 180) & (50 < weightKg < 90) & (137 <= heightCm <= 213):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
    elif (40 < age <= 50) & (17 < BMI < 31) & (112 < SysBP <= 140) & (79 <= DiaBP <= 89) & (90 <= HR <= 170) & (50 < weightKg < 90) & (137 <= heightCm <= 213):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
    elif (50 < age <= 90) & (17 < BMI < 31) & (116 < SysBP <= 147) & (81 <= DiaBP <= 91) & (85 <= HR <= 160) & (50 < weightKg < 90) & (137 <= heightCm <= 213):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
    elif (20 <= age < 90) & (17 < BMI < 31):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
    else:
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 1])))

df1 = pd.DataFrame(data)
df1.to_csv("Health_heart_experimental.csv")
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the main data of the paper "Optimal Rejection-Free Path Sampling," and the source code for generating/appending the independent RFPS-AIMMD and AIMMD runs.
Due to size constraints, the data has been split into separate repositories. The following repositories contain the trajectory files generated by the runs:
all the WQ runs: 10.5281/zenodo.14830317
chignolin, fps0: 10.5281/zenodo.14826023
chignolin, fps1: 10.5281/zenodo.14830200
chignolin, fps2: 10.5281/zenodo.14830224
chignolin, tps0: 10.5281/zenodo.14830251
chignolin, tps1: 10.5281/zenodo.14830270
chignolin, tps2: 10.5281/zenodo.14830280
The trajectory files are not required for running the main analysis, as all necessary information for machine learning and path reweighting is contained in the "PathEnsemble" object files stored in this repository. However, these trajectories are essential for projecting the path ensemble estimate onto an arbitrary set of collective variables.
To reconstruct the full dataset, please merge all the data folders you find in the supplemental repositories.
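A minimal sketch of that merge step, assuming the repositories have been downloaded and extracted side by side (the folder names below are placeholders, not the actual repository names):
import shutil
from pathlib import Path

# Placeholder locations of the main repository and the extracted supplemental repositories.
main_repo = Path("rfps_aimmd_main")
supplemental_repos = [Path("rfps_aimmd_wq"), Path("rfps_aimmd_chignolin_fps0")]

for repo in supplemental_repos:
    # Copy each supplemental "data" tree on top of the main one, keeping files already present.
    shutil.copytree(repo / "data", main_repo / "data", dirs_exist_ok=True)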
Data structure and content
analysis (code for analyzing the data and generating the figures of the paper)
|- figures.ipynb (Jupyter notebook for the analysis)
|- figures (the figures created by the Jupyter notebook)
   |- ...
data (all the AIMMD and reference runs, plus general info about the simulated systems)
|- chignolin
   |- *.py (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm; see the "src" folder below)
   |- run.gro (full system positions in the native conformation)
   |- mol.pdb (only the peptide positions in the native conformation)
   |- topol.top (the system's topology for the GROMACS MD engine)
   |- charmmm22star.ff (force field parameter files)
   |- run.mdp (GROMACS MD parameters when appending a simulation)
   |- randomvelocities.mdp (GROMACS MD parameters when initializing a simulation with random velocities)
   |- signature.npy, r0.npy (parameters for defining the fraction of native contacts involved in the folded/unfolded states definition; used by the params.py function "states_function")
   |- dmax.npy, dmin.npy (parameters for defining the feature representation of the AIMMD NN model; used by the params.py function "descriptors_function")
   |- equilibrium (reference long equilibrium trajectory files; only the peptide positions are saved!)
      |- run0.xtc, ..., run3.xtc
   |- validation
      |- validation.xtc (the validation SPs all together in an XTC file)
      |- validation.npy (for each SP, collects the cumulative shooting results after 10 two-way shooting simulations)
   |- fps0 (the first AIMMD-RFPS independent run)
      |- equilibriumA (the free simulations around A, already processed in PathEnsemble files)
         |- traj000001.h5
         |- traj000001.tpr (for running the simulation; in that case, please retrieve all the trajectory files in the right supplemental repository first)
         |- traj000001.cpt (for appending the simulation; in that case, please retrieve all the trajectory files in the right supplemental repository first)
         |- traj000002.h5 (in case of re-initialization)
         |- ...
      |- equilibriumB (the free simulations around B, ...)
         |- ...
      |- shots0
         |- chain.h5 (the path sampling chain)
         |- pool.h5 (the selection pool, containing the frames from which shooting points are currently selected)
      |- params.py (file containing the states and descriptors definitions, the NN fit function, and the AIMMD run hyperparameters; it can be modified to allow for RFPS-AIMMD or original-algorithm AIMMD runs)
      |- initial.trr (the initial transition for path sampling)
      |- manager.log (reports info about the run)
      |- network.h5 (NN weights of the model at different path sampling steps)
   |- fps1, fps2 (the other RFPS-AIMMD runs)
   |- tps0 (the first AIMMD-TPS, or "standard" AIMMD, run)
      |- ...
      |- shots0
         |- ...
         |- chain_weights.npy (weights of the trials in TPS; only the trials with non-zero weight have been accepted)
   |- tps1, tps2 (the other AIMMD runs, with TPS for the shooting simulations)
|- wq (Wolfe-Quapp 2D system)
   |- *.py (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm)
   |- run.gro (dummy gro file produced for compatibility reasons)
   |- integrator.py (custom MD engine)
   |- equilibrium (reference long simulation)
      |- transition000001.xtc (extracted from the reference long simulation)
      |- transition000002.xtc
      |- ...
      |- transitions.h5 (PathEnsemble file with all the transitions)
   |- reference
      |- grid_X.npy, grid_Y.npy (X, Y grid for 2D plots)
      |- grid_V.npy (PES projected on the grid)
      |- grid_committor_relaxation.npy (true committor on the grid solved with the relaxation method on the backward Kolmogorov equation; the code for doing this is in utils.py)
      |- grid_boltzmann_distribution.npy (Boltzmann distribution on the grid)
      |- pe.h5 (equilibrium distribution processed as a PathEnsemble file)
      |- tpe.h5 (TPE distribution processed as a PathEnsemble file)
      |- ...
   |- uniform_tps (reference TPS run with uniform SP selection)
      |- chain.h5 (PathEnsemble file containing all the accepted paths with their correct weight)
   |- fps0, ..., fps9 (the independent AIMMD-RFPS runs)
      |- ...
   |- tps0, ..., tps9 (the independent AIMMD-TPS, or "standard" AIMMD, runs)
src (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm)
|- generate.py (on a Workstation: initializes the processes; on an HPC cluster: creates the sh file for submitting a job)
|- slurm_options.py (to customize and use in case of running on HPC)
|- manager.py (controls SP selection; reweights the paths)
|- shooter.py (performs path sampling simulations)
|- equilibrium.py (performs free simulations)
|- pathensemble.py (code of the PathEnsemble class)
|- utils.py (auxiliary functions for data production and analysis)
Running/appending AIMMD runs
Create a "run directory" folder (same depth as "fps0")
Copy "initial.trr" and "params.py" from another AIMMD run folder. It is possible to change "params.py" to customize the run.
(On a Workstation) call:
python generate.py <nsteps> <n> <nA> <nB>
where nsteps is the final number of path sampling steps for the run, n the number of independent path sampling chains, nA the number of independent free simulators around A, and nB that of free simulators around B.
(On an HPC cluster) call:
python generate.py <nsteps> <n> <nA> <nB> -s slurm_options.py
sbatch ._job.sh
Merge the supplemental repository with the trajectory files into this one.
Just call again (on a Workstation)
python generate.py
or (on an HPC cluster)
sbatch ._job.sh
after updating the "nsteps" parameters.
Reproducing the analysis
Run the analysis/figures.ipynb notebook. Some groups of cells have to be run multiple times after changing the parameters in the preamble.
https://creativecommons.org/publicdomain/zero/1.0/
The sample_id is created sequentially from the 1st year (hence, it is different from the sample_id in the Kaggle Dataset). Note that while the original data follows a naming convention with 'train_...', this dataset simply uses integer IDs.
The data is divided into 12 chunks, each containing data from February of year 8 to January of year 9.
Source code:
from pathlib import Path
import click
import pandas as pd
import polars as pl
@click.command()
@click.argument("subsample-rate")
@click.argument("offset")
def main(subsample_rate, offset):
NUM_YEARS = 8
MONTH_DAY = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
NUM_SAMPLES = (8 * sum(MONTH_DAY) * 72) // subsample_rate
assert (
0 <= offset < subsample_rate
), f"assertion failed: 0 <= offset < subsample_rate, got {offset} and {subsample_rate}."
idx = 0
file_id = 0
data = []
try:
for year in range(NUM_YEARS):
for month in range(1, 13):
for day in range(MONTH_DAY[month % 12]):
for term in range(72):
if file_id == NUM_SAMPLES:
raise Exception
if idx % subsample_rate == offset:
data.append(
dict(
sample_id=file_id,
year=year + 1,
real_year=year + (month // 12) + 1,
month=month % 12 + 1,
day=day + 1,
min_of_day=term * 1200,
)
)
file_id += 1
idx += 1
except Exception:
print("error")
pass
output_path = Path(
"/ml-docker/working/kaggle-leap-private/data/hugging_face_download"
)
if not output_path.exists():
output_path.mkdir()
df = pl.from_pandas(pd.DataFrame(data))
print(df.filter(pl.col("year").eq(8)))
df.write_parquet(output_path / f"subsample_s{subsample_rate}_o{offset}.pqt")
if __name__ == "__main__":
main()
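As a quick usage sketch (the subsample rate and offset values are illustrative, and the hard-coded output path in the script above is environment specific), the command can be exercised from Python with click's test runner:
from click.testing import CliRunner

# Illustrative values: keep 1 sample out of every 10, starting at offset 0.
runner = CliRunner()
result = runner.invoke(main, ["10", "0"])
print(result.exit_code, result.output)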
Low-Resolution Real Geography
11.5° x 11.5° horizontal resolution (384 grid columns)
100 million total samples (744 GB)
1.9 MB per input file, 1.1 MB per output file
Therefore, it is appropriate to evaluate the model trained on the Kaggle Dataset with this dataset.
Extension modules: https://thomasnyberg.com/cpp_extension_modules.html
https://pypi.org/project/ct-python/
# coding: utf-8
import numpy as np
from scipy.sparse import lil_matrix, csc_matrix
from scipy.sparse.linalg import lsqr
from PIL import Image
import math

def gen_col_index(NX, NY, angle, step):
    L = int(max(NX, NY) * 1.42)
    nx = math.cos(angle * math.pi / 180)
    ny = math.sin(angle * math.pi / 180)
    x0 = NX / 2 + nx * step
    y0 = NY / 2 + ny * step
    c = -nx * x0 - ny * y0
    cols = []
    if abs(nx) > abs(ny):
        for y in range(NY):
            x1 = int(-(ny * y + c) / nx + 0.5)
            for x in range(x1 - 5, x1 + 6):
                if x < 0 or x >= NX:
                    continue
                dist = abs(nx * x + ny * y + c)
                if dist < 5:
                    wt = math.exp(-dist**2 / 3.0)
                    k = x + y * NX
                    cols.append([k, wt])
    else:
        for x in range(NX):
            y1 = int(-(nx * x + c) / ny + 0.5)
            for y in range(y1 - 5, y1 + 6):
                if y < 0 or y >= NY:
                    continue
                dist = abs(nx * x + ny * y + c)
                if dist < 5:
                    wt = math.exp(-dist**2 / 3.0)
                    k = x + y * NX
                    cols.append([k, wt])
    return cols

def gen_matrix(NX, NY, data):
    N = data.shape[0]
    M = NX * NY
    W = lil_matrix((N, M))
    F = np.zeros((N,))
    for j, x in enumerate(data):
        angle = x[0]
        step = x[1]
        val = x[2]
        cols = gen_col_index(NX, NY, angle, step)
        for k, wt in cols:
            W[j, k] = wt
        F[j] = val
    return W, F

###
data = np.load('scanned_data.npy')
NX = 128
NY = 128
W, F = gen_matrix(NX, NY, data)
W = csc_matrix(W)
res = lsqr(W, F, show=True, iter_lim=100)
X = res[0]
image = np.zeros((NX, NY))
for x in range(NX):
    for y in range(NY):
        val = X[x + y * NX]
        image[x, y] = val
pil_img = Image.fromarray(image.astype(np.uint8))
pil_img.show()
pil_img.save('reconstructed_image.png')
# coding: utf-8
import numpy as np
from PIL import Image
import math

def calc_signal(img, angle, step):
    NX = img.shape[0]
    NY = img.shape[1]
    x_c = NX / 2
    y_c = NY / 2
    nx = math.cos(angle * math.pi / 180)
    ny = math.sin(angle * math.pi / 180)
    x0 = x_c + nx * step
    y0 = y_c + ny * step
    c = -nx * x0 - ny * y0
    sig = 0
    for x in range(NX):
        for y in range(NY):
            dist = abs(nx * x + ny * y + c)
            if dist < 5:
                wt = math.exp(-dist**2 / 3.0)
                sig = sig + wt * img[x, y]
    return sig

def scan_image(img):
    data = []
    L = int(np.max(img.shape) / 2 * math.sqrt(2) + 1)
    for angle in np.arange(0, 180, 5):  # step angle of 5 degrees
        print("angle=", angle)
        for step in range(-L, L, 1):
            sig = calc_signal(img, angle, step)
            if sig > 0:
                data.append([angle, step, sig])
    return np.array(data)

#
img = np.array(Image.open('lena.png'))
data = scan_image(img)
np.save('scanned_data.npy', data)
pip install ct-python
pip install git+https://github.com/configtree/ct-python-sdk
import ct_python
python setup.py install --user
import ct_python
configuration.api_key_prefix['Authorization'] = 'Bearer'
# create an instance of the API class
api_instance = ct_python.CTApi(ct_python.ApiClient(configuration))
body = ct_python.Version()  # Version |
id = 'id_example'  # str |
organization_slug = 'organization_slug_example'  # str |
try:
    api_response = api_instance.partial_update_version(body, id, organization_slug)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CTApi->partial_update_version: %s " % e)

# Configure API key authorization: Bearer
configuration = ct_python.Configuration()
configuration.api_key['Authorization'] = 'YOUR_API_KEY'
configuration.api_key_prefix['Authorization'] = 'Bearer'
# create an instance of the API class
api_instance = ct_python.CTApi(ct_python.ApiClient(configuration))
body = ct_python.TokenRefresh()  # TokenRefresh |
try:
    api_response = api_instance.refresh_token(body)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CTApi->refresh_token: %s " % e)

# Configure API key authorization: Bearer
configuration = ct_python.Configuration()
configuration.api_key['Authorization'] = 'YOUR_API_KEY'
configuration.api_key_prefix['Authorization'] = 'Bearer'
# create an instance of the API class
api_instance = ct_python.CTApi(ct_python.ApiClient(configuration))
body = ct_python.Application()  # Application |
id = 'id_example'  # str |
organization_slug = 'organization_slug_example'  # str |
try:
    api_response = api_instance.update_application(body, id, organization_slug)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CTApi->update_application: %s " % e)

# Configure API key authorization: Bearer
configuration = ct_python.Configuration()
configuration.api_key['Authorization'] = 'YOUR_API_KEY'
configuration.api_key_prefix['Authorization'] = 'Bearer'
# create an instance of the API class
api_instance = ct_python.CTApi(ct_python.ApiClient(configuration))
body = ct_python.Configuration()  # Configuration |
id = 'id_example'  # str |
organization_slug = 'organization_slug_example'  # str |
try:
    api_response = api_instance.update_configurat...
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
API for www.legislation.gov.uk - launched by The National Archives on 29/07/2010 - giving access to the statute book at various levels, for various times, as reusable HTML fragments, XML and RDF. The API is RESTful and uses content negotiation, so full access to the data can be achieved using HTTP requests. Alternatively, just append data.xml or data.rdf to any legislation page on the website to return the underlying data. The full API is also available from http://legislation.data.gov.uk.
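For example, the underlying data for a single item can be fetched either by appending data.xml to the page URL or via content negotiation. A short sketch follows; the item URL is just an example, and the exact Accept type honoured by the service is an assumption here:
import requests

# Example legislation page; any item on www.legislation.gov.uk works the same way.
page = "https://www.legislation.gov.uk/ukpga/2010/15"

# Option 1: append data.xml (or data.rdf) to the page URL.
xml_text = requests.get(page + "/data.xml", timeout=30).text

# Option 2: content negotiation on the same resource (assumed Accept header).
xml_negotiated = requests.get(page, headers={"Accept": "application/xml"}, timeout=30).text

print(xml_text[:200])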
https://creativecommons.org/publicdomain/zero/1.0/
https://www.kaggle.com/code/tmyok1984/mayo-tile-generation-using-pyvips
import gc

import numpy as np
import pyvips


def make_tiles(image_path, tile_size=1024, max_tiles=64, avg_thr=230, std_thr=15, clip_edge=0.05):
    image = pyvips.Image.new_from_file(image_path, access='sequential')

    # cropping
    offset_x = int(image.width * clip_edge)
    offset_y = int(image.height * clip_edge)
    w = int(image.width * (1 - clip_edge * 2))
    h = int(image.height * (1 - clip_edge * 2))
    image = image.crop(offset_x, offset_y, w, h)

    # padding
    pad_w = (tile_size - image.width % tile_size) % tile_size
    pad_h = (tile_size - image.height % tile_size) % tile_size
    image = image.embed(
        pad_w // 2, pad_h // 2,
        image.width + pad_w, image.height + pad_h,
        extend="mirror")

    # Get the scanning positions of the image
    x_pos_list = []
    y_pos_list = []
    for y in range(0, image.height, tile_size):
        for x in range(0, image.width, tile_size):
            x_pos_list.append(x)
            y_pos_list.append(y)

    # Get the cropping positions of the image
    selected_x_pos_list = []
    selected_y_pos_list = []
    avg_list = []
    for x, y in zip(x_pos_list, y_pos_list):
        tile = image.crop(x, y, tile_size, tile_size)
        avg = tile.avg()
        std = tile.deviate()
        if avg < avg_thr and std > std_thr:
            selected_x_pos_list.append(x)
            selected_y_pos_list.append(y)
            avg_list.append(avg)

    # Sort by ascending order of average brightness
    sorted_idx = np.argsort(np.array(avg_list))
    selected_x_pos_array = np.array(selected_x_pos_list)[sorted_idx][:max_tiles]
    selected_y_pos_array = np.array(selected_y_pos_list)[sorted_idx][:max_tiles]

    # crop
    images = []
    for x, y in zip(selected_x_pos_array, selected_y_pos_array):
        tile = image.crop(x, y, tile_size, tile_size)
        img = tile.numpy()
        images.append(img)
    if len(images) > 0:
        images = np.stack(images)

    del image
    gc.collect()

    return images
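A possible invocation, assuming a pyvips-readable whole-slide image (the file name and settings below are placeholders):
# Illustrative usage only; substitute a real image path.
tiles = make_tiles("example_slide.tiff", tile_size=1024, max_tiles=64)
print(len(tiles), "tiles selected")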
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The data was collected on 2024-04-05 and contains 3492 problems.
It was cleaned via the following script:
import json
import csv
from io import TextIOWrapper
def clean(data: dict):
    questions = data['data']['problemsetQuestionList']['questions']
    for q in questions:
        yield {
            'id': q['frontendQuestionId'],
            'difficulty': q['difficulty'],
            'title': q['title'],
            'titleCn': q['titleCn'],
            'titleSlug': q['titleSlug'],
            'paidOnly': q['paidOnly'],
            'acRate': round(q['acRate'], 3),
            'topicTags': [t['name'] for t in q['topicTags']],
        }

def out_jsonl(f: TextIOWrapper):
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            for q in clean(data):
                f.write(json.dumps(q, ensure_ascii=False))
                f.write('\n')

def out_json(f: TextIOWrapper):
    l = []
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            for q in clean(data):
                l.append(q)
    json.dump(l, f, ensure_ascii=False)

def out_csv(f: TextIOWrapper):
    writer = csv.DictWriter(f, fieldnames=[
        'id', 'difficulty', 'title', 'titleCn', 'titleSlug', 'paidOnly', 'acRate', 'topicTags'
    ])
    writer.writeheader()
    for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
            data = json.load(f2)
            writer.writerows(clean(data))

with open('data.jsonl', 'w', encoding='u8') as f:
    out_jsonl(f)
with open('data.json', 'w', encoding='u8') as f:
    out_json(f)
with open('data.csv', 'w', encoding='u8', newline='') as f:
    out_csv(f)
According to our latest research, the global Data Enrichment Platform market size reached USD 2.47 billion in 2024, reflecting robust adoption across multiple industries. The market is projected to grow at a CAGR of 14.2% from 2025 to 2033, with the total market value expected to reach USD 7.72 billion by 2033. This remarkable growth is fueled by the increasing demand for high-quality, actionable data to drive decision-making, enhance customer engagement, and support digital transformation initiatives across sectors.
A primary growth driver for the Data Enrichment Platform market is the exponential rise in data volumes generated by businesses worldwide. Organizations are increasingly recognizing the importance of transforming raw data into valuable insights, which is only possible through advanced data enrichment solutions. These platforms enable companies to append, cleanse, and validate their datasets, ensuring high data accuracy and relevancy. The proliferation of digital channels, IoT devices, and cloud-based applications has further intensified the need for real-time data enrichment, as enterprises strive to personalize customer experiences and optimize operational efficiency. Additionally, the rapid adoption of artificial intelligence and machine learning technologies within data enrichment platforms has significantly improved the speed and accuracy of data processing, making these solutions indispensable for modern enterprises.
Another significant factor propelling market growth is the rising focus on regulatory compliance and risk mitigation. With stringent data privacy regulations such as GDPR, CCPA, and others coming into effect, organizations must ensure that their data repositories are accurate, up-to-date, and compliant. Data enrichment platforms help businesses identify outdated or incorrect information, reduce compliance risks, and maintain robust audit trails. This capability is especially crucial for sectors such as BFSI, healthcare, and government, where data integrity and compliance are paramount. The integration of enrichment solutions with existing CRM, ERP, and marketing automation systems has further expanded their applications, making it easier for organizations to maintain clean and compliant datasets across all functions.
The evolving landscape of customer engagement and marketing strategies is also fueling demand for data enrichment platforms. Businesses are increasingly leveraging enriched data to gain a 360-degree view of their customers, segment audiences more effectively, and deliver hyper-personalized content. Enhanced data quality empowers sales and marketing teams to target prospects with precision, improve lead scoring, and drive higher conversion rates. Moreover, in highly competitive sectors like retail and e-commerce, enriched data supports dynamic pricing, inventory management, and customer retention initiatives. As digital transformation accelerates across industries, the ability to derive actionable insights from enriched data is becoming a key differentiator for businesses seeking to gain a competitive edge.
From a regional perspective, North America continues to dominate the Data Enrichment Platform market, accounting for the largest share in 2024. This leadership is attributed to the presence of major technology providers, high adoption rates of advanced analytics solutions, and a mature digital infrastructure. Europe follows closely, driven by stringent data privacy regulations and a strong focus on data-driven decision-making. The Asia Pacific region is emerging as a high-growth market, supported by rapid digitalization, expanding e-commerce sectors, and increasing investments in cloud and AI technologies. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions ramp up their digital transformation efforts.
Merchant Data Enrichment is becoming a pivotal aspect of the data enrichment landscape, especially as businesses seek to enhance their understanding of transaction data. By leveraging merchant data enrichment, organizations can gain deeper insights into consumer spending patterns and merchant behaviors, which are critical for tailoring marketing strategies and improving customer engagement. This process involves appending
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Code and data to reproduce all results and graphs reported in Tannenbaum et al. (2022). This folder contains data files (.dta files) and a Stata do-file (code.do) that stitches together the different data files, executes all analyses, and produces all figures reported in the paper. The do-file uses a number of user-written packages, listed below; most of these can be installed using the ssc install command in Stata. Also, users will need to change the current directory path (at the start of the do-file) before executing the code.
List of user-written packages (descriptions):
- revrs (reverse-codes variables)
- ereplace (extends the egen command to permit replacing)
- grstyle (changes the settings for the overall look of graphs)
- spmap (used for graphing spatial data)
- qqvalue (used for obtaining Benjamini-Hochberg corrected p-values)
- parmby (creates a dataset by calling an estimation command for each by-group)
- domin (used to perform dominance analyses)
- coefplot (used for creating coefficient plots)
- grc1leg (combines graphs with a single common legend)
- xframeappend (appends data frames to the end of the current data frame)
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is Sergey's Home Credit public notebook code repo.
The calculation was obtained using the snippet below:
import numpy as np
import polars as pl
from tqdm import tqdm

# train_disk_usage is an existing pandas DataFrame with one row per file and a "path" column.
shapes, nan_total_count = [], []
for fp in tqdm(train_disk_usage.path):
    df = pl.read_csv(fp)
    shapes.append(df.shape)
    nan_total_count.append(df.null_count().to_pandas().sum().sum())
    del df

train_disk_usage[['height', 'width']] = shapes
train_disk_usage['null_count'] = nan_total_count
train_disk_usage['isna_%'] = train_disk_usage.null_count / np.prod(shapes, 1)
train_disk_usage.to_csv('data/train_disk_usage.csv', index=False)
Vivli is an independent, non-profit organization that has developed a global data-sharing and analytics platform to serve all elements of the international research community. Our mission is to promote, coordinate, and facilitate scientific sharing and reuse of clinical research data through the creation and implementation of a sustainable global data-sharing enterprise. The Vivli platform includes an independent data repository, in-depth search engine and a cloud-based, secure analytics platform.
from datasets import Dataset, DatasetDict
from collections import defaultdict
import re
import random
import json

data = defaultdict(list)

paths = {
    "test_online": "./test_online.jsonl",
    "train_online": "./train_online.jsonl",
    "test_rejection": "./test_rejection.jsonl",
    "train_rejection": "./train_rejection.jsonl",
}

for name, path in paths.items():
    with open(path, "r") as f:
        for line in f:
            data[name].append(json.loads(line))
def split_data(text): …
See the full description on the dataset page: https://huggingface.co/datasets/kh4dien/hh-rlhf-helpful-only.
https://search.gesis.org/research_data/datasearch-httpsdataverse-unc-eduoai--hdl1902-2911631
Part 1 of the course will offer an introduction to SPSS and teach how to work with data saved in SPSS format. Part 2 will demonstrate how to work with SPSS syntax, how to create your own SPSS data files, and how to convert data in other formats to SPSS. Part 3 will teach how to append and merge SPSS files, demonstrate basic analytical procedures, and show how to work with SPSS graphics.