Python pyaudio UDP streaming

January 06, 2023

Recently I worked on a project that duplicates RTP audio data and redirects it to another application. For that project I needed a sample program that could play back the resulting audio, so I decided to write one in Python.

I will build a server that sends audio and a client that receives and plays it. The client uses pyaudio, a package widely used for audio processing in Python.

The audio file used for testing is an uncompressed PCM file without a header: 8000 Hz, 16-bit, mono.
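If no such file is at hand, a headerless PCM test signal in the same format can be generated with the standard library. This is only a sketch: `make_tone_pcm` and its parameters are hypothetical names, not part of the original project, and the samples are written little-endian on the assumption that the playback machine uses that byte order.

```python
import math
import struct


def make_tone_pcm(freq_hz=440.0, seconds=1.0, rate=8000, amplitude=0.5):
    """Return a sine tone as raw 16-bit signed mono PCM bytes (no header)."""
    n_samples = int(rate * seconds)
    peak = int(32767 * amplitude)
    samples = (
        int(peak * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(n_samples)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer
    # (assumed byte order).
    return b"".join(struct.pack("<h", sample) for sample in samples)
```

Writing the returned bytes to a file opened in "wb" mode gives a file that can stand in for the test file used below.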

server sending voice data

The server program used for testing is very simple. It opens the PCM file and transmits it in units of 320 bytes, for the following reason.

An 8000 Hz, 16-bit, mono PCM stream carries 16000 bytes per second (8000 samples per second × 2 bytes per sample).

In the VoIP world, RTP is commonly sent at 50 packets per second (ptime 20, that is, 20 ms of audio per packet). A 20 ms voice packet is therefore 16000 / 50 = 320 bytes.
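The arithmetic can be checked in a few lines. The constant names here are illustrative only and do not appear in the programs in this post.

```python
SAMPLE_RATE_HZ = 8000   # samples per second
BYTES_PER_SAMPLE = 2    # 16-bit samples
CHANNELS = 1            # mono
PTIME_MS = 20           # milliseconds of audio per packet

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
packets_per_second = 1000 // PTIME_MS
bytes_per_packet = bytes_per_second // packets_per_second

print(bytes_per_second, packets_per_second, bytes_per_packet)  # 16000 50 320
```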

import socket
import time


HOST = "127.0.0.1"
PORT = 5005

BROADCAST_SIZE = 320  # bytes per packet: 20 ms of 8000 Hz, 16-bit, mono audio

AUDIO_FILE = "fourseason.pcm"  # 8000 Hz, 16-bit, mono, headerless PCM

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

with open(AUDIO_FILE, "rb") as f:
    while True:
        s = time.time()
        buf = f.read(BROADCAST_SIZE)
        if len(buf) == 0:  # end of file
            break
        sock.sendto(buf, (HOST, PORT))
        # Time left in the 20 ms slot; printed for reference only
        sleep_tm = 0.02 - (time.time() - s)
        print("snd audio %d sleep[%f]" % (len(buf), sleep_tm))
        # time.sleep(0.8 * 0.02)
        # Fixed pause, shorter than the 20 ms of audio each packet carries
        time.sleep(0.01)
sock.close()

<audio_server.py>

One point to watch is the pause between packet transmissions. The code above pauses for 0.01 seconds after each packet (in practice slightly more). That is considerably shorter than the 0.02 seconds of audio that 320 bytes of PCM data represent, so the client receives data faster than it can play it in real time.
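If real-time pacing is wanted instead, one common approach is to sleep until a fixed deadline rather than for a fixed duration, so that the time spent reading and sending does not accumulate as drift. The following is a minimal sketch of that idea, not the code used in this post. `send_paced` and its parameters are hypothetical names, and the clock and sleep functions are injectable only to make the function easy to test.

```python
import time


def send_paced(chunks, send, interval=0.02,
               clock=time.monotonic, sleep=time.sleep):
    """Call send(chunk) for each chunk, spacing calls `interval` seconds apart."""
    next_deadline = clock()
    for chunk in chunks:
        send(chunk)
        # Advance the deadline by a fixed step instead of sleeping a fixed time,
        # so per-iteration overhead is absorbed rather than added.
        next_deadline += interval
        delay = next_deadline - clock()
        if delay > 0:
            sleep(delay)
```

In the server above this could be called as `send_paced(iter(lambda: f.read(BROADCAST_SIZE), b""), lambda b: sock.sendto(b, (HOST, PORT)))`, assuming the file and socket are already open.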

client receiving voice data

The client program consists of one part that receives and buffers voice packets and another that plays them with pyaudio. The two parts run in parallel using threading.

import pyaudio
import time
import sys
import socket
import threading

UDP_IP = "0.0.0.0"
UDP_PORT = 5005

g_exit = False
g_data = bytes()


p = pyaudio.PyAudio()
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind((UDP_IP, UDP_PORT))
sock.settimeout(10.0)
lock = threading.Lock()

s = time.time()

def callback(in_data, frame_count, time_info, status):
    global s, g_data
    buf_size = frame_count * 2  # 16-bit samples, so 2 bytes per frame
    data = None
    with lock:  # check and slice the shared buffer under the lock
        if len(g_data) > buf_size:
            data = g_data[:buf_size]
            g_data = g_data[buf_size:]
        remain = len(g_data)
    if data is not None:
        e = time.time()
        print("play [%d] remain[%d] callback time:%f"%(len(data), remain, e - s))
        s = e
        return (data, pyaudio.paContinue)
    # Not enough buffered audio left: stop playback
    print("play end [%d]"%(remain))
    return (None, pyaudio.paComplete)
    #return (None, pyaudio.paOutputUnderflow)

def sock_recv():
    global g_data
    while not g_exit:
        try:
            data, _ = sock.recvfrom(2048) 
            if len(data):
                lock.acquire()
                g_data += data
                lock.release()
                print("rcv audio %d"%(len(data)))

        except socket.timeout:
            print("time out")
            break
        except KeyboardInterrupt:
            print("Ctrl+C")
            break

def buffering(sec):
    while True:
        if len(g_data) > 320 * 50 * sec: # 320X50 =>1sec audio
            break
        print("udp rcv[%d] wait..."%(len(g_data)))
        time.sleep(0.01)    


t = threading.Thread(target=sock_recv)
t.start()

# Buffer some audio before creating the pyaudio stream
buffering(1)

stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=8000,
                output=True,
                stream_callback=callback)

stream.start_stream()
t.join()

print("Play End")

stream.stop_stream()
stream.close()
p.terminate()

<nonblock_player.py>

The client program has the following characteristics.

Playback uses pyaudio's non-blocking (callback) mode. This mode suits playback that must proceed regardless of the rate at which the server transmits data.
The code that receives voice packets runs in its own thread so that packets are received reliably.
The client starts pyaudio only after buffering 1 second of audio data.
If no packet arrives within the socket timeout (10 seconds, as set by sock.settimeout(10.0)), the client decides that playback is over and terminates the program.
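The shared buffer between the receiving thread and the playback callback could also be wrapped in a small class. The sketch below is one possible variation, not the code from this post: `AudioBuffer` and its methods are hypothetical names, and padding with zero bytes (silence for signed 16-bit PCM) on underrun is a design choice that differs from the client above, which stops playback when the buffer runs dry.

```python
import threading


class AudioBuffer:
    """Thread-safe byte queue shared by a receiver thread and a playback callback."""

    def __init__(self):
        self._buf = bytearray()
        self._lock = threading.Lock()

    def push(self, data):
        """Append received bytes to the end of the buffer."""
        with self._lock:
            self._buf += data

    def pop(self, size):
        """Return exactly `size` bytes, padding with silence on underrun."""
        with self._lock:
            chunk = bytes(self._buf[:size])
            del self._buf[:size]
        # Zero bytes are silence for signed 16-bit PCM.
        return chunk + b"\x00" * (size - len(chunk))

    def __len__(self):
        with self._lock:
            return len(self._buf)
```

With such a class, the receiving thread would call `push(data)` for each packet, and a callback could return `(buf.pop(frame_count * 2), pyaudio.paContinue)`, so that a short gap in the packet flow produces silence instead of ending the stream.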

The source code can be downloaded from https://github.com/raspberry-pi-maker/C-Python-Cooking.
