AWS Managed Streaming for Apache Kafka (MSK)
Describes how to use Phirestream with AWS Managed Streaming for Apache Kafka (MSK).
Phirestream can redact sensitive information, such as personally identifiable information (PII) and protected health information (PHI), from streaming text in Amazon Managed Streaming for Apache Kafka (MSK) clusters. This guide assumes you already have an Apache Kafka cluster running in Amazon MSK. Refer to the AWS documentation for instructions on creating an Amazon MSK cluster.

Phirestream AWS Architecture

Phirestream works as a proxy in front of Apache Kafka and Amazon MSK. Phirestream exposes a REST interface that accepts messages, redacts the sensitive information in the data, and then produces the message to the Kafka brokers.
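To make the proxy flow concrete, the following minimal Python sketch builds the kind of request a client might send to Phirestream. It rests on assumptions that are not confirmed by this guide: the base URL `http://localhost:8080`, the `/topics/{topic}` path, the `application/vnd.kafka.json.v2+json` content type, and the `records` envelope are all modeled on the common Kafka REST proxy convention. Check the Quick Start for the actual endpoint and payload format.

```python
import json
import urllib.request

# Assumption: Phirestream listens here; replace with your own host and port.
PHIRESTREAM_URL = "http://localhost:8080"


def build_produce_request(topic, messages, base_url=PHIRESTREAM_URL):
    """Build (but do not send) a POST request carrying messages for one topic."""
    # Assumed Kafka-REST-style envelope: one record per message.
    payload = {"records": [{"value": message} for message in messages]}
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request; requires a running Phirestream instance."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

A client would call `send(build_produce_request("notes", ["Patient John Smith was seen today."]))`. The client only ever talks to Phirestream, which then produces the redacted message to the brokers.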

AWS MSK Cluster Configuration

An example MSK cluster configuration is shown below:
auto.create.topics.enable=true
default.replication.factor=2
min.insync.replicas=2
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
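When adapting the example configuration, it can be useful to check how the replication settings interact. The sketch below is an illustration, not part of Phirestream or the AWS tooling: the function names `parse_properties` and `durability_warnings` are hypothetical, and the checks reflect general Kafka behavior rather than any MSK-specific rule.

```python
# A subset of the example configuration, one property per line.
EXAMPLE_CONFIG = """
auto.create.topics.enable=true
default.replication.factor=2
min.insync.replicas=2
num.partitions=1
unclean.leader.election.enable=true
"""


def parse_properties(text):
    """Parse key=value lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def durability_warnings(props):
    """Return warnings about replication-related settings."""
    warnings = []
    replication = int(props.get("default.replication.factor", "1"))
    min_isr = int(props.get("min.insync.replicas", "1"))
    if min_isr > replication:
        warnings.append("min.insync.replicas exceeds default.replication.factor")
    elif min_isr == replication and replication > 1:
        # Every replica must be in sync, so one broker outage blocks acks=all writes.
        warnings.append("acks=all producers cannot tolerate a broker outage")
    if props.get("unclean.leader.election.enable") == "true":
        warnings.append("unclean leader election may lose acknowledged messages")
    return warnings
```

Running `durability_warnings(parse_properties(EXAMPLE_CONFIG))` flags both the equal replication settings and unclean leader election. Whether those trade-offs are acceptable depends on your availability and durability requirements.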

AWS MSK Security Group

The following are example security group rules to allow communication with the brokers using TLS. Customize these rules per your VPC and subnet settings. See the AWS MSK documentation for other ports.
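| Type | Protocol | Port | Source | Description |
| --- | --- | --- | --- | --- |
| Custom TCP | TCP | 9094 | 10.0.0.0/16 | Brokers and consumers TLS |
| Custom TCP | TCP | 2181 | 10.0.0.0/16 | ZooKeeper |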
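The example rules above can also be expressed as data, which makes it easy to check whether a given client address would be admitted. This is a minimal sketch using only the Python standard library. The dictionaries follow the `IpPermissions` shape accepted by the EC2 `authorize_security_group_ingress` call in boto3, but nothing here calls AWS, and the helper names `to_ip_permissions` and `is_allowed` are hypothetical.

```python
import ipaddress

VPC_CIDR = "10.0.0.0/16"  # from the example rules; replace with your VPC range

# (port, description) pairs taken from the example rules.
RULES = [(9094, "Brokers and consumers TLS"), (2181, "ZooKeeper")]


def to_ip_permissions(rules, cidr=VPC_CIDR):
    """Convert (port, description) pairs into EC2-style IpPermissions dicts."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        }
        for port, description in rules
    ]


def is_allowed(client_ip, port, permissions):
    """Return True if any rule admits TCP traffic from client_ip to port."""
    address = ipaddress.ip_address(client_ip)
    for permission in permissions:
        if not permission["FromPort"] <= port <= permission["ToPort"]:
            continue
        ranges = permission["IpRanges"]
        if any(address in ipaddress.ip_network(r["CidrIp"]) for r in ranges):
            return True
    return False
```

For example, `is_allowed("10.0.3.17", 9094, to_ip_permissions(RULES))` is true because the address falls inside 10.0.0.0/16, while an address outside the VPC range or a port not listed in the rules is rejected.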

Phirestream Settings

Edit the /opt/phirestream/config/application.properties file to set the addresses of the MSK cluster:
kafka.security.protocol=SSL
kafka.bootstrap.servers=[msk-broker-addresses]
As an example:
kafka.security.protocol=SSL
kafka.bootstrap.servers=b-3.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com:9094,b-1.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com:9094,b-2.phirestream.xqole6.c16.kafka.us-east-1.amazonaws.com:9094
Restart Phirestream for the change to take effect.
sudo systemctl restart phirestream
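If Phirestream cannot reach the cluster after the restart, a quick check of the broker list and of TLS connectivity from the Phirestream host can narrow down the cause. The following sketch is an assumption-laden illustration, not a Phirestream feature: `parse_bootstrap_servers` and `tls_reachable` are hypothetical helper names, and the TLS check relies on the host's default trust store.

```python
import socket
import ssl


def parse_bootstrap_servers(value):
    """Split a kafka.bootstrap.servers value into (host, port) tuples."""
    brokers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, separator, port = entry.rpartition(":")
        if not separator or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        brokers.append((host, int(port)))
    return brokers


def tls_reachable(host, port, timeout=5.0):
    """Return True if a TLS handshake with host:port succeeds."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts, and TLS errors.
        return False
```

Calling `tls_reachable(host, port)` for each pair returned by `parse_bootstrap_servers` shows which brokers accept TLS connections on port 9094. A failure usually points at the security group rules or at a mistyped broker address.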
Phirestream is now ready to receive your text via its Kafka-compliant REST API. The redacted text will be written to the MSK cluster on the appropriate topic. See the Quick Start for text redaction examples and refer to the AWS MSK documentation for consuming the redacted text.
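As a rough illustration of the consuming side, the sketch below reads redacted text from a topic over TLS. It assumes the third-party kafka-python package (`pip install kafka-python`), which this guide does not mention; the group ID `redacted-text-reader` and the helper names are hypothetical. Treat it as a starting point and follow the AWS MSK documentation for authoritative client setup.

```python
def consumer_settings(bootstrap_servers, group_id="redacted-text-reader"):
    """Build keyword arguments for kafka-python's KafkaConsumer over TLS."""
    return {
        "bootstrap_servers": [s.strip() for s in bootstrap_servers.split(",")],
        "security_protocol": "SSL",  # matches kafka.security.protocol=SSL
        "group_id": group_id,
        "auto_offset_reset": "earliest",
        # Decode message bytes as UTF-8 text; pass through empty values.
        "value_deserializer": lambda raw: raw.decode("utf-8") if raw is not None else None,
    }


def print_redacted(topic, bootstrap_servers):
    """Print each message on the topic; runs until interrupted."""
    from kafka import KafkaConsumer  # third-party: kafka-python

    consumer = KafkaConsumer(topic, **consumer_settings(bootstrap_servers))
    for message in consumer:
        print(message.value)
```

Pass the same broker list used for `kafka.bootstrap.servers` as `bootstrap_servers`, along with the topic the text was sent to.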