This solution provides an automated, serverless way to redact sensitive data from PDF files using Google Cloud services such as Data Loss Prevention (DLP), Cloud Workflows, and Cloud Run.

The image below describes the solution architecture of the PDF redaction process.

The workflow consists of the following steps:

1. The user uploads a PDF file to a GCS bucket.
2. A workflow orchestrates the PDF file redaction, which consists of the following steps:
   - Split the PDF into single pages, convert the pages into images, and store them in a working bucket.
   - Redact each image using the DLP Image Redact API.
   - Reassemble the PDF file from the list of redacted images and store it on GCS (output bucket).
   - Write redacted quotes (findings) to BigQuery.

The terraform folder contains the code needed to deploy the PDF Redaction application. The components are:

- Cloud Run services for each component, with their service accounts and permissions:
  - pdf-spliter - Splits the PDF into single-page image files.
  - dlp-runner - Runs each page file through DLP to redact sensitive information.
  - pdf-merger - Reassembles the pages into a single PDF.
  - findings-writer - Writes findings into BigQuery.
- Input Bucket - bucket where the original file is stored.
- Working Bucket - bucket in which all temporary files are stored throughout the different workflow stages.
- Output Bucket - bucket where the redacted file is stored.
- DLP template where InfoTypes and rules are specified.
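To make the order of the stages concrete, here is a minimal in-memory sketch of the split, redact, merge, and write-findings flow described above. It is not the deployed code: pages are plain strings instead of images, a simple substring match stands in for the DLP Image Redact API, and every name in it (Finding, split_pdf, redact_page, merge_pages, redact_pdf) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One redacted quote, as findings-writer might record it (hypothetical shape)."""
    page: int
    quote: str


def split_pdf(pdf_pages: list[str]) -> list[str]:
    """Stand-in for pdf-spliter: return one item per page (strings, not images)."""
    return list(pdf_pages)


def redact_page(page_number: int, page: str,
                sensitive_terms: list[str]) -> tuple[str, list[Finding]]:
    """Stand-in for dlp-runner: substring matching replaces the real DLP call."""
    findings = []
    for term in sensitive_terms:
        if term in page:
            findings.append(Finding(page_number, term))
            page = page.replace(term, "[REDACTED]")
    return page, findings


def merge_pages(pages: list[str]) -> str:
    """Stand-in for pdf-merger: join pages with a form feed as a page break."""
    return "\f".join(pages)


def redact_pdf(pdf_pages: list[str],
               sensitive_terms: list[str]) -> tuple[str, list[Finding]]:
    """Run the stages in workflow order and return (merged output, findings)."""
    working = split_pdf(pdf_pages)  # would be written to the working bucket
    redacted, findings = [], []
    for number, page in enumerate(working, start=1):
        clean, found = redact_page(number, page, sensitive_terms)
        redacted.append(clean)
        findings.extend(found)  # would be written to BigQuery
    return merge_pages(redacted), findings  # merged file goes to the output bucket
```

Calling redact_pdf(["Call 555-0100 now", "Nothing here"], ["555-0100"]) returns the two pages joined into one document with the number replaced by [REDACTED], plus a single finding for page 1. In the real solution each stand-in function corresponds to a separate Cloud Run service, and the loop is driven by the workflow rather than by Python.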