43 datasets found
  1. Append Data

    • kaggle.com
    zip
    Updated Apr 6, 2024
    Cite
    Zahra Zolghadr (2024). Append Data [Dataset]. https://www.kaggle.com/datasets/zahrazolghadr/append-data
    Explore at:
    Available download formats: zip (52816 bytes)
    Dataset updated
    Apr 6, 2024
    Authors
    Zahra Zolghadr
    Description

    Dataset

    This dataset was created by Zahra Zolghadr


  2. TrueData First Party ID Append Data

    • datarade.ai
    Updated Jun 17, 2021
    Cite
    TrueData (2021). TrueData First Party ID Append Data [Dataset]. https://datarade.ai/data-products/truedata-first-party-id-append-data-truedata
    Explore at:
    Dataset updated
    Jun 17, 2021
    Dataset authored and provided by
    TrueData
    Area covered
    United States of America
    Description

    Safely upload client/brand first-party CRM/loyalty data. TrueData will append relevant digital identifiers (HEM, MAID, UID 2.0, CTV IDs) and distribute directly or via LiveRamp to any destination. Activate across Desktop, Mobile App/Web, CTV, DOOH, and Audio. Ingest raw data to build derivative internal products.

  3. Firmographic Data Append, B2B, USA, CCPA Compliant

    • datarade.ai
    .json, .csv
    Updated Jan 6, 2022
    + more versions
    Cite
    Versium (2022). Firmographic Data Append, B2B, USA, CCPA Compliant [Dataset]. https://datarade.ai/data-products/firmographic-append-versium-reach-business-direct-versium
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Jan 6, 2022
    Dataset authored and provided by
    Versium
    Area covered
    United States
    Description

    With Versium REACH's Firmographic Append tool in the Business to Business Direct product suite, you can append valuable firmographic data to your customer and prospect contact lists. With only a few attributes, you can tap into Versium's industry-leading identity resolution engine and proprietary database to append rich firmographic data. To append data you only need any one of the following: Email; Business Domain; Business Name, Address, City, State; or Business Name and Phone. (An illustrative input sketch follows.)
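
    As a rough illustration of those matching keys, a minimal input record set might look like the sketch below (field names are hypothetical; consult the Versium REACH documentation for the exact schema):

    # Each record supplies one of the accepted key combinations for a firmographic append.
    contacts = [
      {"email": "jane.doe@example.com"},                          # Email
      {"business_domain": "example.com"},                         # Business Domain
      {"business_name": "Example Corp", "address": "123 Main St",
       "city": "Seattle", "state": "WA"},                         # Name + Address + City + State
      {"business_name": "Example Corp", "phone": "+1-555-0100"},  # Name + Phone
    ]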

  4. Voter Data Append, USA, CCPA Compliant, Political Interest Data

    • datarade.ai
    .json, .csv
    Updated Dec 5, 2021
    + more versions
    Cite
    Versium (2021). Voter Data Append, USA, CCPA Compliant, Political Interest Data [Dataset]. https://datarade.ai/data-products/versium-reach-political-interest-data-append-usa-gdpr-an-versium
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Dec 5, 2021
    Dataset authored and provided by
    Versium
    Area covered
    United States
    Description

    With Versium REACH Demographic Append you have access to many different attributes for enriching your data, grouped into four output types: Basic; Household, Financial and Auto; Lifestyle and Interests; and Political and Donor.

    The attributes available for each output type are:

    Basic: Senior in Household, Young Adult in Household, Small Office or Home Office, Online Purchasing Indicator, Language, Marital Status, Working Woman in Household, Single Parent, Online Education, Occupation, Gender, DOB (MM/YY), Age Range, Religion, Ethnic Group, Presence of Children, Education Level, Number of Children

    Household, Financial and Auto: Household Income, Dwelling Type, Credit Card Holder Bank, Upscale Card Holder, Estimated Net Worth, Length of Residence, Credit Rating, Home Own or Rent, Home Value, Home Year Built, Number of Credit Lines, Auto Year, Auto Make, Auto Model, Home Purchase Date, Refinance Date, Refinance Amount, Loan to Value, Refinance Loan Type, Home Purchase Price, Mortgage Purchase Amount, Mortgage Purchase Loan Type, Mortgage Purchase Date, 2nd Most Recent Mortgage Amount, 2nd Most Recent Mortgage Loan Type, 2nd Most Recent Mortgage Date, 2nd Most Recent Mortgage Interest Rate Type, Refinance Rate Type, Mortgage Purchase Interest Rate Type, Home Pool

    Lifestyle and Interests: Mail Order Buyer, Pets, Magazines, Reading, Current Affairs and Politics, Dieting and Weight Loss, Travel, Music, Consumer Electronics, Arts, Antiques, Home Improvement, Gardening, Cooking, Exercise, Sports, Outdoors, Womens Apparel, Mens Apparel, Investing, Health and Beauty, Decorating and Furnishing

    Political and Donor: Donor Environmental, Donor Animal Welfare, Donor Arts and Culture, Donor Childrens Causes, Donor Environmental or Wildlife, Donor Health, Donor International Aid, Donor Political, Donor Conservative Politics, Donor Liberal Politics, Donor Religious, Donor Veterans, Donor Unspecified, Donor Community, Party Affiliation

  5. GRM append here

    • hub.arcgis.com
    Updated Dec 1, 2021
    Cite
    Tippecanoe County Assessor Hub Community (2021). GRM append here [Dataset]. https://hub.arcgis.com/datasets/8f108ad456e048718839baf71a599609
    Explore at:
    Dataset updated
    Dec 1, 2021
    Dataset authored and provided by
    Tippecanoe County Assessor Hub Community
    Area covered
    Description

    This table feeds multiple apps for Tippecanoe County. Columns are the bare minimum details for generating a GRM. When new GRM data points are collected they should be appended here.

  6. Demographic Data Append (Age, Gender, Marital Status, etc) Append API, USA, CCPA Compliant

    • datarade.ai
    .json, .csv
    Updated Mar 16, 2023
    + more versions
    Cite
    Versium (2023). Demographic Data Append (Age, Gender, Marital Status, etc) Append API, USA, CCPA Compliant [Dataset]. https://datarade.ai/data-products/versium-reach-consumer-basic-demographic-age-gender-mari-versium
    Explore at:
    Available download formats: .json, .csv
    Dataset updated
    Mar 16, 2023
    Dataset authored and provided by
    Versium
    Area covered
    United States
    Description

    With Versium REACH Demographic Append you have access to many different attributes for enriching your data, grouped into four output types: Basic; Household, Financial and Auto; Lifestyle and Interests; and Political and Donor.

    The attributes available for each output type are:

    Basic: Senior in Household, Young Adult in Household, Small Office or Home Office, Online Purchasing Indicator, Language, Marital Status, Working Woman in Household, Single Parent, Online Education, Occupation, Gender, DOB (MM/YY), Age Range, Religion, Ethnic Group, Presence of Children, Education Level, Number of Children

    Household, Financial and Auto: Household Income, Dwelling Type, Credit Card Holder Bank, Upscale Card Holder, Estimated Net Worth, Length of Residence, Credit Rating, Home Own or Rent, Home Value, Home Year Built, Number of Credit Lines, Auto Year, Auto Make, Auto Model, Home Purchase Date, Refinance Date, Refinance Amount, Loan to Value, Refinance Loan Type, Home Purchase Price, Mortgage Purchase Amount, Mortgage Purchase Loan Type, Mortgage Purchase Date, 2nd Most Recent Mortgage Amount, 2nd Most Recent Mortgage Loan Type, 2nd Most Recent Mortgage Date, 2nd Most Recent Mortgage Interest Rate Type, Refinance Rate Type, Mortgage Purchase Interest Rate Type, Home Pool

    Lifestyle and Interests: Mail Order Buyer, Pets, Magazines, Reading, Current Affairs and Politics, Dieting and Weight Loss, Travel, Music, Consumer Electronics, Arts, Antiques, Home Improvement, Gardening, Cooking, Exercise, Sports, Outdoors, Womens Apparel, Mens Apparel, Investing, Health and Beauty, Decorating and Furnishing

    Political and Donor: Donor Environmental, Donor Animal Welfare, Donor Arts and Culture, Donor Childrens Causes, Donor Environmental or Wildlife, Donor Health, Donor International Aid, Donor Political, Donor Conservative Politics, Donor Liberal Politics, Donor Religious, Donor Veterans, Donor Unspecified, Donor Community, Party Affiliation

  7. Augmented_health_Heart_Rate

    • kaggle.com
    Updated Feb 13, 2022
    Cite
    Shaukat hussain (2022). Augmented_health_Heart_Rate [Dataset]. https://www.kaggle.com/shaukathussain/augmented-health-heart-rate/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 13, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shaukat hussain
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Motivation

    The motivation behind creating this dataset was a project on an IoT health-monitoring device.

    Content

    The columns include heart rate, sysBP, diaBP, height, weight, BMI, etc.; these parameters are necessary for predicting heart condition.

    Acknowledgements

    The height/weight tables with heart rate are taken from this website

    https://www.mymathtables.com/chart/health-wellness/height-weight-table-for-all-ages.html

    Methodology

    The following code was used to generate the data, according to research from different resources on the web:

    import numpy as np
    import pandas as pd

    # Draw 500,000 random candidate records.
    age = np.random.randint(1, 70, 500000)
    sex = np.random.randint(0, 2, 500000)
    SysBP = np.random.randint(105, 147, 500000)
    DiaBP = np.random.randint(73, 120, 500000)
    HR = np.random.randint(78, 200, 500000)
    weightKg = np.random.randint(2, 120, 500000)
    heightCm = np.random.randint(48, 185, 500000)
    BMI = weightKg / heightCm / heightCm * 10000

    cols = ['age', 'sex', 'SysBP', 'DiaBP', 'HR', 'weightKg', 'heightCm', 'BMI', 'indication']
    data = []
    # Discard implausible records, then label the rest: 0 when the vitals fall
    # inside the age-specific ranges below, otherwise 1.
    for age, sex, SysBP, DiaBP, HR, weightKg, heightCm, BMI in zip(age, sex, SysBP, DiaBP, HR, weightKg, heightCm, BMI):
      if BMI > 40 or BMI < 10:
        continue
      elif age < 20:
        continue
      elif weightKg < 45:
        continue
      elif (1 <= age <= 10) & (17 < BMI < 31) & (104 < SysBP < 121) & (73 < DiaBP < 81) & (99 < HR <= 200) & (3 < weightKg <= 36) & (48 < heightCm <= 139):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
      elif (10 < age <= 20) & (17 < BMI < 31) & (104 < SysBP < 121) & (73 < DiaBP <= 81) & (99 < HR <= 200) & (36 < weightKg < 60) & (139 < heightCm < 170):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
      elif (20 < age <= 30) & (17 < BMI < 31) & (108 < SysBP <= 134) & (75 <= DiaBP <= 84) & (94 < HR <= 190) & (28 < weightKg < 80) & (137 <= heightCm <= 180):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
      elif (30 < age <= 40) & (17 < BMI < 31) & (110 < SysBP <= 135) & (81 <= DiaBP <= 86) & (93 <= HR <= 180) & (50 < weightKg < 90) & (137 <= heightCm <= 213):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
      elif (40 < age <= 50) & (17 < BMI < 31) & (112 < SysBP <= 140) & (79 <= DiaBP <= 89) & (90 <= HR <= 170) & (50 < weightKg < 90) & (137 <= heightCm <= 213):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
      elif (50 < age <= 90) & (17 < BMI < 31) & (116 < SysBP <= 147) & (81 <= DiaBP <= 91) & (85 <= HR <= 160) & (50 < weightKg < 90) & (137 <= heightCm <= 213):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
      elif (20 <= age < 90) & (17 < BMI < 31):
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 0])))
      else:
        data.append(dict(zip(cols, [age, sex, SysBP, DiaBP, HR, weightKg, heightCm, np.round(BMI), 1])))

    df1 = pd.DataFrame(data)
    df1.to_csv("Health_heart_experimental.csv")

  8. Simulation data and code for "Optimal Rejection-Free Path Sampling"

    • data-staging.niaid.nih.gov
    • zenodo.org
    Updated Mar 25, 2025
    Cite
    Lazzeri, Gianmarco (2025). Simulation data and code for "Optimal Rejection-Free Path Sampling" [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14922167
    Explore at:
    Dataset updated
    Mar 25, 2025
    Dataset provided by
    Goethe University Frankfurt
    Authors
    Lazzeri, Gianmarco
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the main data of the paper "Optimal Rejection-Free Path Sampling," and the source code for generating/appending the independent RFPS-AIMMD and AIMMD runs.

    Due to size constraints, the data has been split into separate repositories. The following repositories contain the trajectory files generated by the runs:

    - all the WQ runs: 10.5281/zenodo.14830317
    - chignolin, fps0: 10.5281/zenodo.14826023
    - chignolin, fps1: 10.5281/zenodo.14830200
    - chignolin, fps2: 10.5281/zenodo.14830224
    - chignolin, tps0: 10.5281/zenodo.14830251
    - chignolin, tps1: 10.5281/zenodo.14830270
    - chignolin, tps2: 10.5281/zenodo.14830280

    The trajectory files are not required for running the main analysis, as all necessary information for machine learning and path reweighting is contained in the "PatEnsemble" object files stored in this repository. However, these trajectories are essential for projecting the path ensemble estimate onto an arbitrary set of collective variables.

    To reconstruct the full dataset, please merge all the data folders you find in the supplemental repositories.
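
    For example (one possible way; the supplemental repository path below is hypothetical), the folders can be merged with a command along the lines of:

    rsync -a supplemental_repo/data/ ./data/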

    Data structure and content

    analysis (code for analyzing the data and generating the figures of the paper)
    |- figures.ipynb (Jupyter notebook for the analysis)
    |- figures (the figures created by the Jupyter notebook)
    |- ...

    data (all the AIMMD and reference runs, plus general info about the simulated systems)
    |- chignolin
    |  |- *.py (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm; see the "src" folder below)
    |  |- run.gro (full system positions in the native conformation)
    |  |- mol.pdb (only the peptide positions in the native conformation)
    |  |- topol.top (the system's topology for the GROMACS MD engine)
    |  |- charmmm22star.ff (force field parameter files)
    |  |- run.mdp (GROMACS MD parameters when appending a simulation)
    |  |- randomvelocities.mdp (GROMACS MD parameters when initializing a simulation with random velocities)
    |  |- signature.npy, r0.npy (parameters for defining the fraction of native contacts involved in the folded/unfolded states definition; used by the params.py function "states_function")
    |  |- dmax.npy, dmin.npy (parameters for defining the feature representation of the AIMMD NN model; used by the params.py function "descriptors_function")
    |  |- equilibrium (reference long equilibrium trajectory files; only the peptide positions are saved!)
    |  |  |- run0.xtc, ..., run3.xtc
    |  |- validation
    |  |  |- validation.xtc (the validation SPs all together in an XTC file)
    |  |  |- validation.npy (for each SP, collects the cumulative shooting results after 10 two-way shooting simulations)
    |  |- fps0 (the first AIMMD-RFPS independent run)
    |  |  |- equilibriumA (the free simulations around A, already processed in PathEnsemble files)
    |  |  |  |- traj000001.h5
    |  |  |  |- traj000001.tpr (for running the simulation; in that case, please retrieve all the trajectory files in the right supplemental repository first)
    |  |  |  |- traj000001.cpt (for appending the simulation; in that case, please retrieve all the trajectory files in the right supplemental repository first)
    |  |  |  |- traj000002.h5 (in case of re-initialization)
    |  |  |  |- ...
    |  |  |- equilibriumB (the free simulations around B, ...)
    |  |  |  |- ...
    |  |  |- shots0
    |  |  |  |- chain.h5 (the path sampling chain)
    |  |  |  |- pool.h5 (the selection pool, containing the frames from which shooting points are currently selected)
    |  |  |- params.py (file containing the states and descriptors definitions, the NN fit function, and the AIMMD run hyperparameters; it can be modified to allow for RFPS-AIMMD or original-algorithm AIMMD runs)
    |  |  |- initial.trr (the initial transition for path sampling)
    |  |  |- manager.log (reports info about the run)
    |  |  |- network.h5 (NN weights of the model at different path sampling steps)
    |  |- fps1, fps2 (the other RFPS-AIMMD runs)
    |  |- tps0 (the first AIMMD-TPS, or "standard" AIMMD, run)
    |  |  |- ...
    |  |  |- shots0
    |  |  |  |- ...
    |  |  |  |- chain_weights.npy (weights of the trials in TPS; only the trials with non-zero weight have been accepted)
    |  |- tps1, tps2 (the other AIMMD runs, with TPS for the shooting simulations)
    |- wq (Wolfe-Quapp 2D system)
    |  |- *.py (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm)
    |  |- run.gro (dummy gro file produced for compatibility reasons)
    |  |- integrator.py (custom MD engine)
    |  |- equilibrium (reference long simulation)
    |  |  |- transition000001.xtc (extracted from the reference long simulation)
    |  |  |- transition000002.xtc
    |  |  |- ...
    |  |  |- transitions.h5 (PathEnsemble file with all the transitions)
    |  |- reference
    |  |  |- grid_X.npy, grid_Y.npy (X, Y grid for 2D plots)
    |  |  |- grid_V.npy (PES projected on the grid)
    |  |  |- grid_committor_relaxation.npy (true committor on the grid solved with the relaxation method on the backward Kolmogorov equation; the code for doing this is in utils.py)
    |  |  |- grid_boltzmann_distribution.npy (Boltzmann distribution on the grid)
    |  |  |- pe.h5 (equilibrium distribution processed as a PathEnsemble file)
    |  |  |- tpe.h5 (TPE distribution processed as a PathEnsemble file)
    |  |  |- ...
    |  |- uniform_tps (reference TPS run with uniform SP selection)
    |  |  |- chain.h5 (PathEnsemble file containing all the accepted paths with their correct weight)
    |  |- fps0, ..., fps9 (the independent AIMMD-RFPS runs)
    |  |  |- ...
    |  |- tps0, ..., tps9 (the independent AIMMD-TPS, or "standard" AIMMD, runs)

    src (code for generating/appending AIMMD runs on a Workstation or HPC cluster via Slurm)
    |- generate.py (on a Workstation: initializes the processes; on an HPC cluster: creates the sh file for submitting a job)
    |- slurm_options.py (to customize and use in case of running on HPC)
    |- manager.py (controls SP selection; reweights the paths)
    |- shooter.py (performs path sampling simulations)
    |- equilibrium.py (performs free simulations)
    |- pathensemble.py (code of the PathEnsemble class)
    |- utils.py (auxiliary functions for data production and analysis)

    Running/appending AIMMD runs

    • To initialize a new RFPS-AIMMD (or AIMMD) run for the systems of this paper:
    1. Create a "run directory" folder (same depth as "fps0")

    2. Copy "initial.trr" and "params.py" from another AIMMD run folder. It is possible to change "params.py" to customize the run.

    3. (On a Workstation) call:

    python generate.py

    where nsteps is the final number of path sampling steps for the run, n the number of independent path sampling chains, nA the number of independent free simulators around A, and nB that of free simulators around B.

    4. (On an HPC cluster) call:

    python generate.py -s slurm_options.py
    sbatch ._job.sh

    • To append to an existing RFPS-AIMMD or AIMMD run
    1. Merge the supplemental repository with the trajectory files into this one.

    2. Just call again (on a Workstation)

    python generate.py

    or (on a HPC cluster)

    sbatch ._job.sh

    after updating the "nsteps" parameter.

    • To run enhanced sampling for a new system: please keep the data structure as close as possible to the original. Using different file names can cause incompatibilities; we are working to make this easier.

    Reproducing the analysis

    Run the analysis/figures.ipynb notebook. Some groups of cells have to be run multiple times after changing the parameters in the preamble.

  9. leap-val-data-f64

    • kaggle.com
    zip
    Updated Jun 13, 2024
    Cite
    Bilzard (2024). leap-val-data-f64 [Dataset]. https://www.kaggle.com/tatamikenn/leap-val-data-f64
    Explore at:
    Available download formats: zip (8800477931 bytes)
    Dataset updated
    Jun 13, 2024
    Authors
    Bilzard
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Validation Data for Stacking - 8th Year Validation Set (1/6 Subsample)

    The sample_id is created sequentially from the 1st year (hence, it is different from the sample_id in the Kaggle Dataset). Note that while the original data follows a naming convention with 'train_...', this dataset simply uses integer IDs.

    The data is divided into 12 chunks, covering 8th-year February through 9th-year January.

    Period

    • 0008-02 to 0009-01

    Sub-sampling Method

    1. Sub-sample to 1/6 of all samples, ignoring leap day (2/29) in leap years (offset=0).

    Source code:

    from pathlib import Path
    
    import click
    import pandas as pd
    import polars as pl
    
    
    @click.command()
    @click.argument("subsample-rate")
    @click.argument("offset")
    def main(subsample_rate, offset):
      NUM_YEARS = 8
      MONTH_DAY = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
      NUM_SAMPLES = (8 * sum(MONTH_DAY) * 72) // subsample_rate
      assert (
        0 <= offset < subsample_rate
      ), f"assertion failed: 0 <= offset < subsample_rate, got {offset} and {subsample_rate}."
    
      idx = 0
      file_id = 0
      data = []
      try:
        for year in range(NUM_YEARS):
          for month in range(1, 13):
            for day in range(MONTH_DAY[month % 12]):
              for term in range(72):
                if file_id == NUM_SAMPLES:
                  raise Exception
                if idx % subsample_rate == offset:
                  data.append(
                    dict(
                      sample_id=file_id,
                      year=year + 1,
                      real_year=year + (month // 12) + 1,
                      month=month % 12 + 1,
                      day=day + 1,
                      min_of_day=term * 1200,
                    )
                  )
                  file_id += 1
                idx += 1
      except Exception:
        print("error")
        pass
    
      output_path = Path(
        "/ml-docker/working/kaggle-leap-private/data/hugging_face_download"
      )
      if not output_path.exists():
        output_path.mkdir()
    
      df = pl.from_pandas(pd.DataFrame(data))
      print(df.filter(pl.col("year").eq(8)))
      df.write_parquet(output_path / f"subsample_s{subsample_rate}_o{offset}.pqt")
    
    
    if __name__ == "__main__":
      main()
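
    Assuming the script above is saved as subsample.py (the filename is not given here), the 1/6, offset-0 subsample described above would be produced with:

    python subsample.py 6 0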
    

    Positioning of This Data

    Low-Resolution Real Geography
    
      11.5° x 11.5° horizontal resolution (384 grid columns)
      100 million total samples (744 GB)
      1.9 MB per input file, 1.1 MB per output file
    
    1. Low resolution (774 GB) -> I refer to this as the 1/1 full set.
    2. Kaggle Dataset -> A 1/7 subsample of 1, containing data from 1st to 7th year (excluding the 8th year).
    3. leap-val-data-f64 (this dataset) -> A 1/6 subsample of 1, containing only the 8th year data.

    Therefore, it is appropriate to evaluate the model trained on the Kaggle Dataset with this dataset.

  10. CT data

    • kaggle.com
    zip
    Updated Jul 3, 2023
    Cite
    NG NM WT (2023). CT data [Dataset]. https://www.kaggle.com/datasets/ngnmwt/ct-data/code
    Explore at:
    Available download formats: zip (22039091 bytes)
    Dataset updated
    Jul 3, 2023
    Authors
    NG NM WT
    Area covered
    Connecticut
    Description

    Extension modules: https://thomasnyberg.com/cpp_extension_modules.html

    https://pypi.org/project/ct-python/

    # coding: utf-8
    import numpy as np
    from scipy.sparse import lil_matrix, csc_matrix
    from scipy.sparse.linalg import lsqr
    from PIL import Image
    import math

    def gen_col_index(NX, NY, angle, step):
      L = int(max(NX, NY) * 1.42)
      nx = math.cos(angle * math.pi / 180)
      ny = math.sin(angle * math.pi / 180)
      x0 = NX / 2 + nx * step
      y0 = NY / 2 + ny * step
      c = -nx * x0 - ny * y0
      cols = []
      if abs(nx) > abs(ny):
        for y in range(NY):
          x1 = int(-(ny * y + c) / nx + 0.5)
          for x in range(x1 - 5, x1 + 6):
            if x < 0 or x >= NX:
              continue
            dist = abs(nx * x + ny * y + c)
            if dist < 5:
              wt = math.exp(-dist**2 / 3.0)
              k = x + y * NX
              cols.append([k, wt])
      else:
        for x in range(NX):
          y1 = int(-(nx * x + c) / ny + 0.5)
          for y in range(y1 - 5, y1 + 6):
            if y < 0 or y >= NY:
              continue
            dist = abs(nx * x + ny * y + c)
            if dist < 5:
              wt = math.exp(-dist**2 / 3.0)
              k = x + y * NX
              cols.append([k, wt])
      return cols

    def gen_matrix(NX, NY, data):
      N = data.shape[0]
      M = NX * NY
      W = lil_matrix((N, M))
      F = np.zeros((N,))
      for j, x in enumerate(data):
        angle = x[0]
        step = x[1]
        val = x[2]
        cols = gen_col_index(NX, NY, angle, step)
        for k, wt in cols:
          W[j, k] = wt
        F[j] = val
      return W, F

    ###
    data = np.load('scanned_data.npy')
    NX = 128
    NY = 128
    W, F = gen_matrix(NX, NY, data)
    W = csc_matrix(W)
    res = lsqr(W, F, show=True, iter_lim=100)
    X = res[0]
    image = np.zeros((NX, NY))
    for x in range(NX):
      for y in range(NY):
        val = X[x + y * NX]
        image[x, y] = val
    pil_img = Image.fromarray(image.astype(np.uint8))
    pil_img.show()
    pil_img.save('reconstructed_image.png')
    
    # coding: utf-8
    import numpy as np
    from PIL import Image
    import math

    def calc_signal(img, angle, step):
      NX = img.shape[0]
      NY = img.shape[1]
      x_c = NX / 2
      y_c = NY / 2
      nx = math.cos(angle * math.pi / 180)
      ny = math.sin(angle * math.pi / 180)
      x0 = x_c + nx * step
      y0 = y_c + ny * step
      c = -nx * x0 - ny * y0
      sig = 0
      for x in range(NX):
        for y in range(NY):
          dist = abs(nx * x + ny * y + c)
          if dist < 5:
            wt = math.exp(-dist * 2 / 3.0)
            sig = sig + wt * img[x, y]
      return sig

    def scan_image(img):
      data = []
      L = int(np.max(img.shape) / 2 * math.sqrt(2) + 1)
      for angle in np.arange(0, 180, 5):  # step angle of 5 degrees
        print("angle=", angle)
        for step in range(-L, L, 1):
          sig = calc_signal(img, angle, step)
          if sig > 0:
            data.append([angle, step, sig])
      return np.array(data)

    #
    img = np.array(Image.open('lena.png'))
    data = scan_image(img)
    np.save('scanned_data.npy', data)
    
    

    pip install ct-python

    pip install git+https://github.com/configtree/ct-python-sdk

    import ct_python

    python setup.py install --user

    import ct_python

    configuration.api_key_prefix['Authorization'] = 'Bearer'
    # create an instance of the API class
    api_instance = ct_python.CTApi(ct_python.ApiClient(configuration))
    body = ct_python.Version()  # Version |
    id = 'id_example'  # str |
    organization_slug = 'organization_slug_example'  # str |
    try:
      api_response = api_instance.partial_update_version(body, id, organization_slug)
      pprint(api_response)
    except ApiException as e:
      print("Exception when calling CTApi->partial_update_version: %s " % e)

    # Configure API key authorization: Bearer
    configuration = ct_python.Configuration()
    configuration.api_key['Authorization'] = 'YOUR_API_KEY'
    configuration.api_key_prefix['Authorization'] = 'Bearer'
    # create an instance of the API class
    api_instance = ct_python.CTApi(ct_python.ApiClient(configuration))
    body = ct_python.TokenRefresh()  # TokenRefresh |
    try:
      api_response = api_instance.refresh_token(body)
      pprint(api_response)
    except ApiException as e:
      print("Exception when calling CTApi->refresh_token: %s " % e)

    # Configure API key authorization: Bearer
    configuration = ct_python.Configuration()
    configuration.api_key['Authorization'] = 'YOUR_API_KEY'
    configuration.api_key_prefix['Authorization'] = 'Bearer'
    # create an instance of the API class
    api_instance = ct_python.CTApi(ct_python.ApiClient(configuration))
    body = ct_python.Application()  # Application |
    id = 'id_example'  # str |
    organization_slug = 'organization_slug_example'  # str |
    try:
      api_response = api_instance.update_application(body, id, organization_slug)
      pprint(api_response)
    except ApiException as e:
      print("Exception when calling CTApi->update_application: %s " % e)

    # Configure API key authorization: Bearer
    configuration = ct_python.Configuration()
    configuration.api_key['Authorization'] = 'YOUR_API_KEY'
    configuration.api_key_prefix['Authorization'] = 'Bearer'
    # create an instance of the API class
    api_instance = ct_python.CTApi(ct_python.ApiClient(configuration))
    body = ct_python.Configuration()  # Configuration |
    id = 'id_example'  # str |
    organization_slug = 'organization_slug_example'  # str |
    try:
      api_response = api_instance.update_configurat...

  11. Legislation API - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jul 29, 2010
    + more versions
    Cite
    ckan.publishing.service.gov.uk (2010). Legislation API - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/legislation-api
    Explore at:
    Dataset updated
    Jul 29, 2010
    Dataset provided by
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    API for www.legislation.gov.uk, launched by The National Archives on 29/07/2010, giving access to the statute book at various levels and at various points in time, as reusable HTML fragments, XML and RDF. The API is RESTful and uses content negotiation, so full access to the data can be achieved using HTTP requests. Alternatively, just append data.xml or data.rdf to any legislation page URL on the website to return the underlying data. The full API is also available from http://legislation.data.gov.uk.
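
    As a quick illustration of the "append data.xml" pattern described above, the sketch below (assuming the Python requests library; the item URL is just an example) fetches the XML representation of a legislation page:

    import requests

    # Example item page; any legislation.gov.uk page should work the same way.
    page_url = "https://www.legislation.gov.uk/ukpga/1998/29"

    # Appending /data.xml (or /data.rdf) to the page URL returns the underlying data.
    response = requests.get(page_url + "/data.xml", timeout=30)
    response.raise_for_status()
    print(response.headers.get("Content-Type"))
    print(response.text[:500])  # first 500 characters of the XML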

  12. Mayo Tile Images

    • kaggle.com
    zip
    Updated Sep 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    tmyok (2022). Mayo Tile Images [Dataset]. https://www.kaggle.com/datasets/tmyok1984/mayo-jpg-dataset-1024
    Explore at:
    Available download formats: zip (40792362631 bytes)
    Dataset updated
    Sep 4, 2022
    Authors
    tmyok
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Tiles were generated with pyvips; see the notebook at https://www.kaggle.com/code/tmyok1984/mayo-tile-generation-using-pyvips

    import gc

    import numpy as np
    import pyvips

    def make_tiles(image_path, tile_size=1024, max_tiles=64, avg_thr=230, std_thr=15, clip_edge=0.05):
    
      image = pyvips.Image.new_from_file(image_path, access='sequential')
    
      # cropping
      offset_x = int(image.width * clip_edge)
      offset_y = int(image.height * clip_edge)
      w = int(image.width * (1-clip_edge*2))
      h = int(image.height * (1-clip_edge*2))
      image = image.crop(offset_x, offset_y, w, h)
    
      # padding
      pad_w = (tile_size - image.width%tile_size)%tile_size
      pad_h = (tile_size - image.height%tile_size)%tile_size
      image = image.embed(
        pad_w//2, pad_h//2,
        image.width+pad_w, image.height+pad_h,
        extend="mirror")
    
      # Get the scanning position of the image
      x_pos_list = []
      y_pos_list = []
      for y in range(0, image.height, tile_size):
        for x in range(0, image.width, tile_size):
          x_pos_list.append(x)
          y_pos_list.append(y)
    
      # Get the cropping position of the image
      selected_x_pos_list = []
      selected_y_pos_list = []
      avg_list = []
      for x, y in zip(x_pos_list, y_pos_list):
        tile = image.crop(x, y, tile_size, tile_size)
        avg = tile.avg()
        std = tile.deviate()
        if avg < avg_thr and std > std_thr:
          selected_x_pos_list.append(x)
          selected_y_pos_list.append(y)
          avg_list.append(avg)
    
      # Sort by ascending order of average brightness
      sorted_idx = np.argsort(np.array(avg_list))
      selected_x_pos_array = np.array(selected_x_pos_list)[sorted_idx][:max_tiles]
      selected_y_pos_array = np.array(selected_y_pos_list)[sorted_idx][:max_tiles]
    
      # crop
      images = []
      for x, y in zip(selected_x_pos_array, selected_y_pos_array):
        tile = image.crop(x, y, tile_size, tile_size)
        img = tile.numpy()
        images.append(img)
    
      if len(images) > 0:
        images = np.stack(images)
    
      del image
      gc.collect()
    
      return images
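
    A minimal usage sketch for the function above (the slide path is hypothetical):

    tiles = make_tiles("example_slide.tiff", tile_size=1024, max_tiles=64)
    print(len(tiles), "tiles selected")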
    
  13. LeetCode CN Problems

    • kaggle.com
    zip
    Updated Apr 5, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    imba-tjd (2024). LeetCode CN Problems [Dataset]. https://www.kaggle.com/datasets/imbatjd/leetcode-cn-problems/code
    Explore at:
    Available download formats: zip (530963 bytes)
    Dataset updated
    Apr 5, 2024
    Authors
    imba-tjd
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The data was collected on 2024-04-05 and contains 3492 problems.

    Cleaned via the following script.

    import json
    import csv
    from io import TextIOWrapper
    
    
    def clean(data: dict):
      questions = data['data']['problemsetQuestionList']['questions']
      for q in questions:
        yield {
          'id': q['frontendQuestionId'],
          'difficulty': q['difficulty'],
          'title': q['title'],
          'titleCn': q['titleCn'],
          'titleSlug': q['titleSlug'],
          'paidOnly': q['paidOnly'],
          'acRate': round(q['acRate'], 3),
          'topicTags': [t['name'] for t in q['topicTags']],
        }
    
    
    def out_jsonl(f: TextIOWrapper):
      for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
          data = json.load(f2)
    
        for q in clean(data):
          f.write(json.dumps(q, ensure_ascii=False))
          f.write('\n')
    
    
    def out_json(f: TextIOWrapper):
      l = []
      for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
          data = json.load(f2)
    
        for q in clean(data):
          l.append(q)
    
      json.dump(l, f, ensure_ascii=False)
    
    
    def out_csv(f: TextIOWrapper):
      writer = csv.DictWriter(f, fieldnames=[
        'id', 'difficulty', 'title', 'titleCn', 'titleSlug', 'paidOnly', 'acRate', 'topicTags'
      ])
      writer.writeheader()
    
      for id in range(0, 35):
        with open(f'data/{id}.json', encoding='u8') as f2:
          data = json.load(f2)
    
        writer.writerows(clean(data))
    
    
    with open('data.jsonl', 'w', encoding='u8') as f:
      out_jsonl(f)
    
    with open('data.json', 'w', encoding='u8') as f:
      out_json(f)
    
    with open('data.csv', 'w', encoding='u8', newline='') as f:
      out_csv(f)
    
  14. Data Enrichment Platform Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Sep 1, 2025
    Cite
    Growth Market Reports (2025). Data Enrichment Platform Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/data-enrichment-platform-market
    Explore at:
    Available download formats: csv, pptx, pdf
    Dataset updated
    Sep 1, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Data Enrichment Platform Market Outlook




    According to our latest research, the global Data Enrichment Platform market size reached USD 2.47 billion in 2024, reflecting robust adoption across multiple industries. The market is projected to grow at a CAGR of 14.2% from 2025 to 2033, with the total market value expected to reach USD 7.72 billion by 2033. This remarkable growth is fueled by the increasing demand for high-quality, actionable data to drive decision-making, enhance customer engagement, and support digital transformation initiatives across sectors.




    A primary growth driver for the Data Enrichment Platform market is the exponential rise in data volumes generated by businesses worldwide. Organizations are increasingly recognizing the importance of transforming raw data into valuable insights, which is only possible through advanced data enrichment solutions. These platforms enable companies to append, cleanse, and validate their datasets, ensuring high data accuracy and relevancy. The proliferation of digital channels, IoT devices, and cloud-based applications has further intensified the need for real-time data enrichment, as enterprises strive to personalize customer experiences and optimize operational efficiency. Additionally, the rapid adoption of artificial intelligence and machine learning technologies within data enrichment platforms has significantly improved the speed and accuracy of data processing, making these solutions indispensable for modern enterprises.




    Another significant factor propelling market growth is the rising focus on regulatory compliance and risk mitigation. With stringent data privacy regulations such as GDPR, CCPA, and others coming into effect, organizations must ensure that their data repositories are accurate, up-to-date, and compliant. Data enrichment platforms help businesses identify outdated or incorrect information, reduce compliance risks, and maintain robust audit trails. This capability is especially crucial for sectors such as BFSI, healthcare, and government, where data integrity and compliance are paramount. The integration of enrichment solutions with existing CRM, ERP, and marketing automation systems has further expanded their applications, making it easier for organizations to maintain clean and compliant datasets across all functions.




    The evolving landscape of customer engagement and marketing strategies is also fueling demand for data enrichment platforms. Businesses are increasingly leveraging enriched data to gain a 360-degree view of their customers, segment audiences more effectively, and deliver hyper-personalized content. Enhanced data quality empowers sales and marketing teams to target prospects with precision, improve lead scoring, and drive higher conversion rates. Moreover, in highly competitive sectors like retail and e-commerce, enriched data supports dynamic pricing, inventory management, and customer retention initiatives. As digital transformation accelerates across industries, the ability to derive actionable insights from enriched data is becoming a key differentiator for businesses seeking to gain a competitive edge.




    From a regional perspective, North America continues to dominate the Data Enrichment Platform market, accounting for the largest share in 2024. This leadership is attributed to the presence of major technology providers, high adoption rates of advanced analytics solutions, and a mature digital infrastructure. Europe follows closely, driven by stringent data privacy regulations and a strong focus on data-driven decision-making. The Asia Pacific region is emerging as a high-growth market, supported by rapid digitalization, expanding e-commerce sectors, and increasing investments in cloud and AI technologies. Latin America and the Middle East & Africa are also witnessing steady growth, albeit from a smaller base, as organizations in these regions ramp up their digital transformation efforts.



    Merchant Data Enrichment is becoming a pivotal aspect of the data enrichment landscape, especially as businesses seek to enhance their understanding of transaction data. By leveraging merchant data enrichment, organizations can gain deeper insights into consumer spending patterns and merchant behaviors, which are critical for tailoring marketing strategies and improving customer engagement. This process involves appending

  15. Replication Data for: What do cross-country surveys tell us about social capital?

    • dataverse.harvard.edu
    • dataone.org
    Updated Aug 1, 2022
    Cite
    David Tannenbaum; Alain Cohn; Christian L. Zünd; Michel A. Maréchal (2022). Replication Data for: What do cross-country surveys tell us about social capital? [Dataset]. http://doi.org/10.7910/DVN/NDDWHJ
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    David Tannenbaum; Alain Cohn; Christian L. Zünd; Michel A. Maréchal
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Code and data to reproduce all results and graphs reported in Tannenbaum et al. (2022). This folder contains data files (.dta files) and a Stata do-file (code.do) that stitches together the different data files, executes all analyses, and produces all figures reported in the paper. The do-file uses a number of user-written packages, listed below; most can be installed using the ssc install command in Stata. Also, users will need to change the current directory path (at the start of the do-file) before executing the code.

    List of user-written packages (descriptions):

    - revrs (reverse-codes variables)
    - ereplace (extends the egen command to permit replacing)
    - grstyle (changes the settings for the overall look of graphs)
    - spmap (used for graphing spatial data)
    - qqvalue (used for obtaining Benjamini-Hochberg corrected p-values)
    - parmby (creates a dataset by calling an estimation command for each by-group)
    - domin (used to perform dominance analyses)
    - coefplot (used for creating coefficient plots)
    - grc1leg (combines graphs with a single common legend)
    - xframeappend (appends data frames to the end of the current data frame)

  16. 2024-home-credit-public-repo

    • kaggle.com
    zip
    Updated Feb 6, 2024
    Cite
    Sergey Saharovskiy (2024). 2024-home-credit-public-repo [Dataset]. https://www.kaggle.com/datasets/sergiosaharovskiy/2024-home-credit-public-repo
    Explore at:
    Available download formats: zip (1982331 bytes)
    Dataset updated
    Feb 6, 2024
    Authors
    Sergey Saharovskiy
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Welcome

    This is Sergey's Home Credit public notebook code repo.

    Train files Null count

    The calculation was obtained by using the below snippet:

    import numpy as np
    import polars as pl
    from tqdm import tqdm

    # train_disk_usage is assumed to be a pandas DataFrame (built earlier in the
    # notebook) with a 'path' column listing the train CSV files.
    shapes, nan_total_count = [], []
    for fp in tqdm(train_disk_usage.path):
      df = pl.read_csv(fp) 
      shapes.append(df.shape)
      nan_total_count.append(df.null_count().to_pandas().sum().sum())
      del df
      
    train_disk_usage[['height', 'width']] = shapes
    train_disk_usage['null_count'] = nan_total_count
    train_disk_usage['isna_%'] = train_disk_usage.null_count / np.prod(shapes, 1) 
    train_disk_usage.to_csv('data/train_disk_usage.csv', index=False)
    
  17. Enhancing the Analytic Capacity of NCT00364351 using a Statistical Linkage Method to Append Socioeconomic and Health Care Access Variables from the Medical Expenditure Panel Survey

    • search.vivli.org
    Updated 2020
    + more versions
    Cite
    Project Data Sphere (2020). Enhancing the Analytic Capacity of NCT00364351 using a Statistical Linkage Method to Append Socioeconomic and Health Care Access Variables from the Medical Expenditure Panel Survey [Dataset]. http://doi.org/10.25934/00005378
    Explore at:
    Dataset updated
    2020
    Dataset provided by
    datacite
    Vivli
    Authors
    Project Data Sphere
    Description

    Vivli is an independent, non-profit organization that has developed a global data-sharing and analytics platform to serve all elements of the international research community. Our mission is to promote, coordinate, and facilitate scientific sharing and reuse of clinical research data through the creation and implementation of a sustainable global data-sharing enterprise. The Vivli platform includes an independent data repository, in-depth search engine and a cloud-based, secure analytics platform.

  18. Enhancing the Analytic Capacity of NCT00048230 using a Statistical Linkage Method to Append Socioeconomic and Health Care Access Variables from the Medical Expenditure Panel Survey

    • search.vivli.org
    Updated 2020
    Cite
    Project Data Sphere (2020). Enhancing the Analytic Capacity of NCT00048230 using a Statistical Linkage Method to Append Socioeconomic and Health Care Access Variables from the Medical Expenditure Panel Survey [Dataset]. http://doi.org/10.25934/00005376
    Explore at:
    Dataset updated
    2020
    Dataset provided by
    datacite
    Vivli
    Authors
    Project Data Sphere
    Description

    Vivli is an independent, non-profit organization that has developed a global data-sharing and analytics platform to serve all elements of the international research community. Our mission is to promote, coordinate, and facilitate scientific sharing and reuse of clinical research data through the creation and implementation of a sustainable global data-sharing enterprise. The Vivli platform includes an independent data repository, in-depth search engine and a cloud-based, secure analytics platform.

  19. hh-rlhf-helpful-only

    • huggingface.co
    Updated Jan 16, 2025
    Cite
    Caden Juang (2025). hh-rlhf-helpful-only [Dataset]. https://huggingface.co/datasets/kh4dien/hh-rlhf-helpful-only
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 16, 2025
    Authors
    Caden Juang
    Description

    from datasets import Dataset, DatasetDict
    from collections import defaultdict
    import re
    import random
    import json

    data = defaultdict(list)

    paths = {
      "test_online": "./test_online.jsonl",
      "train_online": "./train_online.jsonl",
      "test_rejection": "./test_rejection.jsonl",
      "train_rejection": "./train_rejection.jsonl",
    }

    for name, path in paths.items():
      with open(path, "r") as f:
        for line in f:
          data[name].append(json.loads(line))

    def split_data(text): …

    See the full description on the dataset page: https://huggingface.co/datasets/kh4dien/hh-rlhf-helpful-only.

  20. Statistical Computing: SPSS

    • search.gesis.org
    • dataverse.unc.edu
    • +1more
    Updated Oct 29, 2021
    Cite
    Zimmer, Catherine (2021). Statistical Computing: SPSS [Dataset]. https://search.gesis.org/research_data/datasearch-httpsdataverse-unc-eduoai--hdl1902-2911631
    Explore at:
    Dataset updated
    Oct 29, 2021
    Dataset provided by
    UNC Dataverse
    GESIS search
    Authors
    Zimmer, Catherine
    License

    https://search.gesis.org/research_data/datasearch-httpsdataverse-unc-eduoai--hdl1902-2911631

    Description

    Part 1 of the course will offer an introduction to SPSS and teach how to work with data saved in SPSS format. Part 2 will demonstrate how to work with SPSS syntax, how to create your own SPSS data files, and how to convert data in other formats to SPSS. Part 3 will teach how to append and merge SPSS files, demonstrate basic analytical procedures, and show how to work with SPSS graphics.
