Video Streaming Update 2020 – Robotic Vision

Since my old video streaming tutorial gets a lot of views, I wanted to add some new info and an introduction. If you are building a robot and need to stream video, this is for you!


Why android to desktop


The Android device I use is an old, old v21 phone. I can buy them for $5 apiece or just ask around for an old phone in someone's junk drawer. I wanted something cheap and available. Lots of sensors too. Compare that to a naked Raspberry Pi at $35 with no camera and much weaker software.


If you are using neural nets then you need some processing power. Sending video is a good option to avoid weighing down your bot.

Streaming from android

The older posts that you linked from used the basic camera api. DO NOT use the basic camera api.

The second thing I found is bigflake. It's useful but it's complicated, and I don't think it's that fast, although it was difficult to benchmark.

Every app that does video on Android seems to be using the webrtc framework: google video, snapchat, signal etc. However, webrtc has a very limiting api and is designed for video calls. Luckily I found this library on github. It opens up the api and allows you to extract video frame by frame. On my home network, my remote computer's start video command has a 740 ms turnaround before receiving its first nal from the remote camera.

video rec


These two txt files are the classes that extract video from my Android. I draw the frames on a texture, but you don't have to if you want a blank screen. It takes 16 ms to encode and 5 ms to packetize and send using the packetizer from the older tutorial.



FFmpegFrameGrabber is used to grab frames and display them.

My first hang-up was installing. You need to install the latest release as a library.


final FrameGrabber grabber;


grabber = new FFmpegFrameGrabber(inputStream, 0);

Although I still cannot benchmark it properly, I have got it working in "real time". I had to download the snapshot, change a variable by hand, build it, and then paste that jar over the javacv.jar that came in the release files you downloaded above.

On line 926, change it from 0 to 1 to eliminate the latency.

// Enable multithreading when available

Here's my Stack Overflow question.

The older section goes into detail and the packetization is still required. The other parts are worth a read too.

Android data analysis and error handling

The featured picture for this post is actually a cnc project where my end mill broke moments before completing. But if you’re an android developer you probably have had the little gear icon in your email inbox or more recently emails from fabric.

Scroll down to the bold title to skip my rambling musings.

These are amazing tools, but I ran into a problem that I couldn't handle with them. I was creating a new revised method/function within my app that generates a string for the user to read. I wanted the new method to produce the exact same result as the old method when handling old type situations, plus it needed to handle some new situations as well. After building it and debugging, I found that it produced the exact same string 99.996% of the time. This variation appears to be unavoidable. Even better, I can use the old method 99% of the time, making the chance that a user encounters this error practically non-existent.

Not foolish enough to make assumptions, I deployed my new method invisible to the user and, using fabric, waited for my "events" to register a similar match rate. Once I had confidence that the match rate matched my testing, I would make the switch visible to the user.

Much to my surprise the match rate was way lower, around 50%. I went through and made sure that everything was right. I then added a custom attribute to show me both strings. This is where my problem arose. Fabric truncates strings, and custom attributes from the same event are split up in the online review area. The data was almost useless.

So I decided to make my own poor man's version. Here's how I did it.

On the android side I used the Volley Network. Once imported I made the standard volley pattern which can be accessed statically. Volley is awesome.

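import android.content.Context;

import com.android.volley.Request;
import com.android.volley.RequestQueue;
import com.android.volley.toolbox.Volley;

public class VolleyNetwork {

    private static VolleyNetwork mInstance;
    private RequestQueue mRequestQueue;
    private static Context context;

    private VolleyNetwork(Context contex) {
        context = contex;
        mRequestQueue = getRequestQueue();
    }

    public static synchronized VolleyNetwork getmInstance(Context context){
        if (mInstance == null){
            mInstance = new VolleyNetwork(context);
        }
        return mInstance;
    }

    public RequestQueue getRequestQueue() {
        if (mRequestQueue == null){
            // use the application context so an Activity is never leaked
            mRequestQueue = Volley.newRequestQueue(context.getApplicationContext());
        }
        return mRequestQueue;
    }

    public <T> void addToRequestQue(Request<T> req) {
        // body missing from the post; this is the usual Volley singleton pattern
        getRequestQueue().add(req);
    }
}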


I was already using the volley class above to make reports to my server. Now I added a new class for my "home brewed version of fabric". As you can see, the static method report_MisMatch() has all the data needed to use volley and make a report to your server. I'm using a simple password because the worst a hacker could do with this info is post irrelevant strings to the server .txt file. The "key1" attribute is how my server knows what type of data is coming in. Is it a custom error report? Or is this for some other process completely?

import android.app.Activity;
import android.content.Context;
import android.content.SharedPreferences;
import android.util.Log;

import com.android.volley.Request;
import com.android.volley.Response;
import com.android.volley.VolleyError;
import com.android.volley.toolbox.StringRequest;

import java.util.HashMap;
import java.util.Map;

public class Reporting {

    private final static String TAG = "Reporting";
    public static String MY_PREF = "com.yourname.yourapp";

    final public static  String SERVERADD = "";
    final static private String typeOne = "passwordp1";

    //report clock in/out status by updating my row - no feedback
    public static void report_MisMatch(final Context context, final String myReportData ) {
        Log.d(TAG, "report_MisMatch: ");
        final String mTag = "report_MisMatch";  //if you frequently use volley network to post different things the origination of the response needs to be tracked

        StringRequest postRequest = new StringRequest(Request.Method.POST, SERVERADD,  new Response.Listener<String>() {
            @Override
            public void onResponse(String response) {
                Log.d(TAG, mTag + " onResponse: " + response);
                Log.d(TAG, "onResponse: sent");
            }
        }, new Response.ErrorListener() {
            @Override
            public void onErrorResponse(VolleyError error) {
                Log.d(TAG, "onErrorResponse: error");
            }
        }) {
            // the POST body: volley calls this when it builds the request
            @Override
            protected Map<String, String> getParams() {
                Map<String, String>  params = new HashMap<String, String>();
                params.put("key1", "somedata");
                params.put("report", myReportData);
                params.put("gentypeone", getLoginCredentials(context));

                return params;
            }
        };

        // the queueing line is missing from the post; assumed to go through the singleton above
        VolleyNetwork.getmInstance(context).addToRequestQue(postRequest);
    }

    //These will be retrieved for all functions
    private static String getLoginCredentials(Context context) {
        String[] data = new String[4];

        SharedPreferences myPref = context.getSharedPreferences(MY_PREF, Activity.MODE_PRIVATE);

        return new String(typeOne + "randomly_generated_passkey");
    }
}



On the server side my code was already written for another server feature. It's php because for literally no money you can set up a server/web address to provide back end app support. These types of low cost servers almost always have php/mysql. I'm thinking of transitioning to digital ocean but it seems like a lot of work and I'm lazy.







Stream Video From Android Part 8 – Tips, Tricks and Tests

Want a little bit more? You got it.

A bit of backstory on my experience.  I wrote these blog posts because it seemed there are reference manuals written by experts for experts with no bridge for the beginner to cross relating to these subjects. This article plus the resources I mentioned in the first post and in this page is everything I used. If you read this blog please leave a comment and say hi. I will get great satisfaction knowing it helped you out!

Even still, when you're bit shifting and copying buffers you may make a small error. With 150k byte nalus coming in every 32 ms you might get a little overwhelmed if you're just starting.

You might start feeling like this


FFmpeg is what a lot of people are using. It comes wrapped up in javacv so I touched on it a bit already. But the documentation sucks so let me show you how to use it real fast.

  1. Download the build for your machine. I'm using windows, so I download a compressed file and extract it to my program files folder. Then I must set my system path so I can use it from the command line. Search the internet for command line installs if you don't understand, as that's a whole other topic in itself.

  2. Navigate the command line to the folder with your video and use this command to send a udp stream.

ffmpeg -re -i jjj.mp4 -vcodec libx264 -acodec copy -f h264 "udp://"

If you have a udp socket open you can see hex output or play the video. If you use a saved video from your android device, this is a great way to make sure your receiving code is correct. For fun, replace h264 with mpegts and write it to hex. Or change the libx264 to copy.


Writing to hex

Creating useful data from your test streams


import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

import javafx.stage.FileChooser;

public class FFStream {

    private final String TAG = "FFStream";

    File file;
    DatagramSocket socket;
    volatile boolean shouldListen = false;   // read by the listener thread
    BufferedWriter writer;

    public FFStream() {
        // get the file to send debug info to
        FileChooser fileChooser = new FileChooser();

        file = fileChooser.showOpenDialog(null);

        if (file == null){
            return;
        }

        try {
            writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8));
        }catch (IOException ioe){
            System.err.println(TAG + " cons " + ioe.toString());
            return;
        }

        createSocket();   // assumed: the original call site is not shown in the post
    }

    private void createSocket() {
        System.out.println(TAG + " createsocket ");

        try {
            socket = new DatagramSocket(8550);
        }catch (IOException ioe){
            System.err.println(TAG + " createsocket " + ioe.toString());
            return;
        }

        shouldListen = true;

        Executors.newSingleThreadExecutor().execute(new Runnable() {
            public void run() {

                byte[] inbuffer ;
                String ip = "failed";
                try {
                    ip = InetAddress.getLocalHost().getHostAddress();
                }catch (UnknownHostException e){
                    System.err.println(TAG + " createsocket " + e.toString());
                }

                String port = String.valueOf(socket.getLocalPort());
                System.out.println(TAG + " connect info " + ip + ":" + port);

                while (shouldListen) {
                    inbuffer = new byte[1500];

                    DatagramPacket packet = new DatagramPacket(inbuffer, inbuffer.length);

                    try {
                        System.out.println(TAG + " waiting on data");
                        socket.receive(packet);     //blocking

                        byte[] data = new byte[packet.getLength()];

                        System.arraycopy(packet.getData(), packet.getOffset(),data,0,packet.getLength());

                        write(data);   // dump this datagram to the file as hex

                    } catch (IOException ioe) {
                        System.err.println(TAG + " createsocket " + ioe.toString());
                    }
                }
            }
        });
    }

    private void write(byte[] incoming) throws IOException {
        System.out.println(TAG + " write " + String.valueOf(incoming.length) );

        int count = 0;

        for (byte b : incoming) {
            writer.write(String.format("%02X", b));
            writer.write(" ");
            count++;

            if ((count % 16) == 0 ){
                writer.newLine();   // 16 bytes per row
            }
        }
        writer.flush();
    }

    public void setShouldListen(boolean shouldListen) {
        this.shouldListen = shouldListen;
    }
}

I used these methods to great advantage to check what was being written in different methods

    // print the first `length` bytes of arr as hex
    public static void debugHex(String call, byte[] arr, int length) {

        StringBuilder sb = new StringBuilder();
        int count = 0;
        for (byte b : arr) {
            sb.append(String.format("%02X", b));
            sb.append(" ");
            count++;
            if (length == count){
                break;
            }
        }

        System.out.println(TAG + call + sb.toString());
    }

    // print the last `length` bytes of arr as hex, final byte first
    public static void deBugHexTrailing(String call, byte[] arr, int length) {
        StringBuilder sb = new StringBuilder();

        for (int i = arr.length-1; i >= (arr.length - length) && i >= 0; i--) {

            sb.append(String.format("%02X", arr[i]));
            sb.append(" ");
        }

        System.out.println(TAG + call + sb.toString());
    }

I also used this to check on empty spaces in my nalus and caught a byte[] buffer that was padding data! Whoops!


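    // print the first, middle and last byte of a nalu region to spot padding
    public static void fillCompleteNalData(byte[] out, int entryPos, int exitPos) {
        int m = (exitPos - entryPos) /2;

        StringBuilder sb = new StringBuilder();

        // the start of this chain is missing from the post; the label is a placeholder
        sb.append(" fillCompleteNalData ")
                        .append(" entry ").append(String.valueOf(entryPos))
                        .append(" exit ").append(String.valueOf(exitPos))
                        .append(" bstart ").append(String.format("%02X", out[entryPos]))
                        .append(" bmid ").append(String.format("%02X", out[entryPos + m]))
                        // the post reads out[entryPos] again here; assuming exitPos is exclusive, the last byte is exitPos - 1
                        .append(" blast ").append(String.format("%02X", out[exitPos - 1]));

        System.out.println(TAG + sb.toString());
    }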



Comparing reassembled nalus

When I was done I also watched the debugger for my android and my pc to compare the length of the nalu I parsed at the encoder to the one I give to my receiving decoder. After using the above test code though they were an exact match.



Stream Video From Android Part 7 – Depacketize and Display

Getting those packets onto a screen.

There will be one more post after this talking about some extra classes and techniques I used to get this done. So if I gloss over something here make sure and check there to get your codes straight.

-> Get the videodecoder code here <-

-> Get the imagedecoder code here <-


In order to decode the video we need a decoder that can understand what the nalu data has inside it. I’m using javafx with the javacv library to create the imagedecoder class. It paints an image onto an imageview with each frame it gets.

Here’s how I call it. Notice I tested it with a saved and emailed video from my android device first to make sure it was working.

public void createImageDecoder() {
    System.out.println(TAG + " image decoder");

    FileChooser fileChooser = new FileChooser();

    File file = fileChooser.showOpenDialog(null);

    if (file == null){
        return;
    }

    ImageView imageView = mController.getImageView_Images();

    Executors.newSingleThreadExecutor().execute( mediaDecode = new Runnable() {
        public void run() {
            // (the call into the imagedecoder class is not shown in the post)
        }
    });
}



The thing is, the imagedecoder class needs pristine annexb style nalus to work. I had all kinds of trouble getting it to play. Finally I downloaded ffmpeg to my windows machine and sent a stream of a video I had saved on my computer already to test the player. It worked. But I was even more clever, I also recorded the bytes sent in that stream so I could compare to what I was sending in myself. Muhahaha!!

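Here's the first section of what ffmpeg streamed. Notice anything? It sends a nalu type 0x06 after the sps and pps. I also found out that it sent a nalu type 0x01 as well. I was still not sure what these were when I wrote this, moments after completing my stream. For the record, in the H.264 spec type 6 is SEI (supplemental enhancement information, here carrying the x264 settings string you can see in the dump) and type 1 is a non-IDR slice.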

00 00 00 01 67 64 00 16 AC D9 40 88 16 FB F0 11 
00 00 03 00 01 00 00 03 00 14 0F 16 2D 96 00 00 
00 01 68 EB E3 CB 22 C0 00 00 01 06 05 FF FF AA 
DC 45 E9 BD E6 D9 48 B7 96 2C D8 20 D9 23 EE EF 
78 32 36 34 20 2D 20 63 6F 72 65 20 31 35 35 20 
72 32 39 30 31 20 37 64 30 66 66 32 32 20 2D 20 
48 2E 32 36 34 2F 4D 50 45 47 2D 34 20 41 56 43 
20 63 6F 64 65 63 20 2D 20 43 6F 70 79 6C 65 66 
74 20 32 30 30 33 2D 32 30 31 38 20 2D 20 68 74 
74 70 3A 2F 2F 77 77 77 2E 76 69 64 65 6F 6C 61 
6E 2E 6F 72 67 2F 78 32 36 34 2E 68 74 6D 6C 20 
2D 20 6F 70 74 69 6F 6E 73 3A 20 63 61 62 61 63 
3D 31 20 72 65 66 3D 33 20 64 65 62 6C 6F 63 6B 
3D 31 3A 30 3A 30 20 61 6E 61 6C 79 73 65 3D 30 
78 33 3A 30 78 31 31 33 20 6D 65 3D 68 65 78 20 
73 75 62 6D 65 3D 37 20 70 73 79 3D 31 20 70 73 
79 5F 72 64 3D 31 2E 30 30 3A 30 2E 30 30 20 6D 
69 78 65 64 5F 72 65 66 3D 31 20 6D 65 5F 72 61 
6E 67 65 3D 31 36 20 63 68 72 6F 6D 61 5F 6D 65 
3D 31 20 74 72 65 6C 6C 69 73 3D 31 20 38 78 38

Here is the stream we will be sending. Notice it goes from sps and pps straight to the byte 0x65, which is nalu type 5, an idr slice. Also notice I had an error (bytes [1] and [2] are the same) in my sps pps that I was not aware of at the time. Very frustrating. After I fixed these errors this stream pattern plays!

00 00 00 01 67 80 80 1F E9 01 68 22 FD C0 36 85 
09 A8 00 00 00 01 68 06 06 E2 00 00 00 01 65 B8
40 0B E4 2F F9 FF 12 00 02 1A 
B8 48 F0 FF 36 5D 07 1E 52 C3 1F F3 FA A5 77 44 
70 91 04 48 6A 59 C9 AE D3 B9 AA 18 C2 15 82 B4 
30 92 2E C5 2D 26 C5 B0 A7 EE CD 9B 7E 99 D0 BE 
8A 3E AF 69 18 DC 40 5D 40 3F 77 5C 98 49 C6 6D 
4E ED 16 ED FB 7A 0A 04 AF D0 90 61 75 02 CE 3B 
04 D3 69 A3 19 8E A6 AD 20 9B 69 A7 6C 88 AC 6E 
5F F3 1A 2E 86 30 8D C0 15 74 C5 BC 5B 4E D7 F4 
62 02 A8 B2 DA DA 08 31 80 48 F5 F7 5E 39 CC A6 
5D E9 0B 62 DF B4 DE 1B 70 6E 8E 4D 40 B1 FC B6 
68 C9 80 BA 82 1F F8 D7 68 E6 B3 6B 5B 4D 53 14 
05 60 AB 9A 7D 5E D3 24 C3 41 75 16 4E 35 5F FE 
DA 76 DB 1F 18 36 11 CD 74 8C 62 DD 0B A8 74 4F 
00 82 F6 E9 27 A4 6D 8E 24 92 2F F2 F0 BA 83 58 
04 B6 9A 4E D6 DD AC 71 78 15 34 97 CF 50 C6 32 
3C 9B 8B 69 E9 A6 D9 B3 D7 13 22 A1 54 D7 A6 82 
0A 64 08 07 4D 3D 34 FA 76 FE 85 D4 6C 8F F4 D3
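If you want to check a dump like the ones above without reading hex by eye, here is a minimal sketch that walks an annex b byte array and lists the nalu type after each start code. The class and method names are mine, not from my project, and it only reads the type bits; it does not validate anything else.

```java
import java.util.ArrayList;
import java.util.List;

final class AnnexBScanner {

    /** Returns the nalu type found after every 00 00 01 (or 00 00 00 01) start code. */
    static List<Integer> nalTypes(byte[] stream) {
        List<Integer> types = new ArrayList<>();
        for (int i = 0; i + 3 < stream.length; i++) {
            // A four byte start code ends in the same 00 00 01 pattern,
            // so matching the three byte form finds both.
            if (stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1) {
                types.add(stream[i + 3] & 0x1F); // low five bits of the nalu header
                i += 3;                          // continue after the header byte
            }
        }
        return types;
    }
}
```

Fed the opening bytes of my stream it reports types 7 (sps), 8 (pps) and 5 (idr), in that order.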


In order to create this beautiful array of data we need our videodecoder class to sort through packets and serve up complete nalus in order. My example is still missing timing info and optimization, so the picture is choppy and has artifacts. But this is getting you in the door, which is a hell of a lot better than where you started!

You have my code but the overall strategy is pretty basic.

  1. incoming udp packets are written into my video decoder addpacket method
  2. Packets are sorted by type. All my packets were type 24 (spspps) or type 28 (nalu chunks)
  3. packets are reworked and sent to the video decoder

Reworking sps pps

This is easy: simply split them up, add your 0x00 0x00 0x00 0x01 start code to each, and send 'em through.
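As a rough sketch of that split, assuming the payload is laid out the way rfc 6184 describes a type 24 packet (one header byte, then a two byte size in front of each nalu) and that the 12 byte rtp header has already been stripped. StapASplitter is a name I made up for this example; it is not the class from my download.

```java
import java.util.ArrayList;
import java.util.List;

final class StapASplitter {

    private static final byte[] START_CODE = {0, 0, 0, 1};

    /** Splits a type 24 payload into annex b nalus, each with a start code in front. */
    static List<byte[]> split(byte[] payload) {
        List<byte[]> nalus = new ArrayList<>();
        int pos = 1; // skip the one byte type 24 header
        while (pos + 2 <= payload.length) {
            int size = ((payload[pos] & 0xFF) << 8) | (payload[pos + 1] & 0xFF);
            pos += 2;
            if (size == 0 || pos + size > payload.length) {
                break; // padding or a truncated packet: stop rather than guess
            }
            byte[] nalu = new byte[START_CODE.length + size];
            System.arraycopy(START_CODE, 0, nalu, 0, START_CODE.length);
            System.arraycopy(payload, pos, nalu, START_CODE.length, size);
            nalus.add(nalu);
            pos += size;
        }
        return nalus;
    }
}
```

The first array out is the sps and the second is the pps, both ready for the decoder.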

Reworking type 28 fua nalu chunks

To rebuild my nalus I kept a list

private Map<Integer, NaluBuffer> assemblyLine = new HashMap<>();

If a nalu has 100 pieces each piece shares the same timestamp. So I created a synchronized method to check if my list has already started building the nalu or if I need to start a new buffer. As below…

  // Unpack any split up nalu - This will get 99.999999 of nalus
    synchronized private void unpackType28(byte[] twentyEight) {
        //Debug.deBugHexTrailing("unpack 28 ", twentyEight, 20 );

        // mask every byte with 0xFF: java bytes are signed and would sign-extend when shifted
        int ts = ((twentyEight[4] & 0xFF) << 24 | (twentyEight[5] & 0xFF) << 16 | (twentyEight[6] & 0xFF) << 8 | (twentyEight[7] & 0xFF));   //each nalu has a unique timestamp
        //int seqN = ((twentyEight[2] & 0xFF) << 8 | (twentyEight[3] & 0xFF));   //each part of that nalu is numbered in order.
                                                                                 // numbers are from every packet ever. not this nalu. no zero or 1 start
        //check if already building this nalu
        if (assemblyLine.containsKey(ts)){
            // (the call that adds this piece to the existing NaluBuffer is not shown in the post)
        } else {
            //add a new nalu
            assemblyLine.put(ts, new NaluBuffer(ts, twentyEight));
        }
    }
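One thing worth spelling out about reading those header fields: java bytes are signed, so any byte of 0x80 or higher turns negative and smears 1s across the upper bits when you shift it. A small sketch of reading the timestamp and sequence number with every byte masked (RtpFields is a made-up name for this example; the offsets are the standard rtp header offsets):

```java
final class RtpFields {

    /** 32 bit timestamp stored big-endian in bytes 4..7 of the rtp header. */
    static int timestamp(byte[] packet) {
        return ((packet[4] & 0xFF) << 24)
             | ((packet[5] & 0xFF) << 16)
             | ((packet[6] & 0xFF) << 8)
             |  (packet[7] & 0xFF);
    }

    /** 16 bit sequence number stored big-endian in bytes 2..3 of the rtp header. */
    static int sequenceNumber(byte[] packet) {
        return ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF);
    }
}
```

Without the masks, a timestamp whose third byte is 0x80 would come out negative instead of as the value that was sent.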



As each piece is loaded into a buffer a few things happen. We record how long it's been waiting (nalus that aren't completed in under a second are worthless), we strip out the rtp headers etc, and we count sequence numbers to rebuild each piece one after another, checking if the nalu is complete each time. Once complete it's sent through to the video decoder.

My current code needs serious optimization. So you will notice major artifacts due to missing or late nalus, and timing? Forget about it. But it's pretty simple to understand and you can build those features yourself.
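To make that concrete, here is a minimal sketch of one reassembly buffer. It is not my NaluBuffer class; FuaAssembler and its methods are names invented for this example. It assumes the rtp header is already stripped, so each payload starts with the two fu-a bytes, and it ignores sequence number wraparound at 65535 to stay short.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.TreeMap;

final class FuaAssembler {

    private final Map<Integer, byte[]> pieces = new TreeMap<>(); // sorted by sequence number
    private final long createdAtNanos = System.nanoTime();
    private int startSeq = -1;
    private int endSeq = -1;
    private int nalHeader;

    /** payload = FU indicator, FU header, then the fragment data. */
    void add(int sequenceNumber, byte[] payload) {
        int indicator = payload[0] & 0xFF;
        int fuHeader = payload[1] & 0xFF;
        if ((fuHeader & 0x80) != 0) {              // S bit: first piece
            startSeq = sequenceNumber;
            // F and NRI come from the indicator, the nalu type from the FU header
            nalHeader = (indicator & 0xE0) | (fuHeader & 0x1F);
        }
        if ((fuHeader & 0x40) != 0) {              // E bit: last piece
            endSeq = sequenceNumber;
        }
        byte[] data = new byte[payload.length - 2];
        System.arraycopy(payload, 2, data, 0, data.length);
        pieces.put(sequenceNumber, data);
    }

    /** Complete when first and last are seen and no sequence number in between is missing. */
    boolean isComplete() {
        return startSeq >= 0 && endSeq >= startSeq
                && pieces.size() == endSeq - startSeq + 1;
    }

    /** True once the buffer has waited longer than maxAgeNanos. */
    boolean isStale(long maxAgeNanos) {
        return System.nanoTime() - createdAtNanos > maxAgeNanos;
    }

    /** Start code, rebuilt nalu header, then every piece in sequence order. */
    byte[] assemble() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0); out.write(0); out.write(0); out.write(1);
        out.write(nalHeader);
        for (byte[] piece : pieces.values()) {
            out.write(piece, 0, piece.length);
        }
        return out.toByteArray();
    }
}
```

Because the pieces sit in a sorted map, packets that arrive out of order still come out in the right order, and anything that goes stale can simply be dropped from the map.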

One More post to go!


Stream Video From Android Part 6 – Packetize RTP

I know, you deserve a nap but please hang in there.

The file I'm referencing is in the last post if you need it. Also MAJOR WARNING HERE!!! I did not test my rtp packet code against any other software. There may be errors because I simply wrote what seemed to make sense and then wrote a javafx program to open it on the other side. But this still should get you pretty dang close.

In the previous post we had our sps, pps and different nalus being fed into a packetizer to be sent over the internet. As you are well aware, in java we can use a TCP or UDP connection. Either is fine but I will focus on UDP for this post. Let's talk about that process.

RTP is a defined format that can send all kinds of data including video streams. A udp packet contains an rtp packet, which contains a piece of data. We choose our packet size based on the maximum transmission unit (mtu), which is the number of bytes we can send at a time. In our example we set our mtu to 1500 and we limit our payload to 1300 so we have space for the enclosing packet headers as well.

Let's look at the spspps packet. It's smaller and can be sent in a single rtp packet. All rtp packets must comply with the rfc guidelines. Search rfc 6184 to see what I'm talking about. It describes packetizing different data in different ways. Remember our buildspspps method? It needs to be organized according to these protocols. Below we build the payload that will be inserted into our rtp packet. This is done only once.
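To put numbers on that, here is a tiny sketch of the arithmetic, assuming each piece of a split nalu gives up 2 bytes of the 1300 byte payload to the fu-a header bytes covered later in this post. PacketMath is just a name for the example.

```java
final class PacketMath {

    /** How many rtp packets it takes to carry payloadBytes with the given payload limit. */
    static int fragmentCount(int payloadBytes, int maxPayload) {
        if (payloadBytes <= maxPayload) {
            return 1;                          // fits in a single packet, no splitting
        }
        int perFragment = maxPayload - 2;      // 2 bytes per piece go to the fu-a header
        return (payloadBytes + perFragment - 1) / perFragment; // round up
    }
}
```

With a 1300 byte limit, a 150000 byte nalu works out to 116 packets, which is why losing or reordering a few of them shows up as artifacts.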

// get from myvideo / build sps and pps data
private void buildSPSPPS() {
    //without this stream is worthless
    if (sps == null || pps == null){
        notEOF = false;
        Log.d(TAG, "buildSPSPPS: no sps or pps data");
        return;     // nothing to build, and sps.length below would throw
    }

    if (description == null){

        description = new byte[sps.length + pps.length + pref.length ];

        description[0] = 24;
        //rtp header type 24 = Single-time aggregation packet     5.7.1

        // Write NALU 1 size into the array (NALU 1 is the SPS).
        description[1] = (byte) (sps.length >> 8);
        description[2] = (byte) (sps.length & 0xFF);

        // Write NALU 2 size into the array (NALU 2 is the PPS).
        description[sps.length + 3] = (byte) (pps.length >> 8);
        description[sps.length + 4] = (byte) (pps.length & 0xFF);

        //write prefix
        //System.arraycopy(pref, 0, description, description.length-6, pref.length);

        // Write NALU 1 into the array, then write NALU 2 into the array.
        System.arraycopy(sps, 0, description, 3, sps.length);
        System.arraycopy(pps, 0, description, 5 + sps.length, pps.length);

        Debug.debugFull(" build spspps ", description);
    }
}


Then before every idr picture we send this data via an rtp packet.

private void packetizeDecsription() {
    buildRTPPacket(24, timeStamp, description, description.length);
}

It's important to note that we need to keep track of each packet sent so that our depacketizer can determine order and timing on the other side. That means we can have only a single buildRTPPacket() method and that it must be accessed from a single thread or synchronized. There are plenty of diagrams within the source code file, but as you can see I build a header and combine the payload with the header. All rtp packets have the same info and are sent through this method.

public void buildRTPPacket(int payloadType, int timeStamp, byte[] payload, int payloadLength) {
    //Log.d(TAG, "buildRTPPacket: " + String.valueOf(payloadType));
    //this is the actual packet being sent
    byte[] rtpPacket = new byte[HEADER_SIZE + payloadLength];
    sequenceNumber++;                                   //keep our packet stream linear. split nalus with same timestamp are ordered by this number

    rtpHeader = new byte[HEADER_SIZE];

    rtpHeader[0] = (byte) 0b10000000;                  //(byte) (VERSION << 6 | PADDING << 5 | EXTENSION << 4 | CSRC_COUNT);
    rtpHeader[1] = (byte) payloadType;                 //ignore marker bit //The first byte of a NAL unit co-serves as the RTP payload header   ->   5.6
    rtpHeader[2] = (byte) (sequenceNumber >> 8);       //sequence move bits 8-16 right into the 8 bit buffer
    rtpHeader[3] = (byte) (sequenceNumber & 0xff);     //sequence only keep the last 8 bits by masking
    rtpHeader[4] = (byte) (timeStamp >> 24);           //time stamp
    rtpHeader[5] = (byte) (timeStamp >> 16);           //time stamp (was >> 18, which scrambles this byte)
    rtpHeader[6] = (byte) (timeStamp >> 8);            //time stamp
    rtpHeader[7] = (byte) (timeStamp & 0xFF);          //time stamp
    rtpHeader[8] =  (byte) (SSRC >> 24);               //ssrc
    rtpHeader[9] =  (byte) (SSRC >> 16);               //ssrc
    rtpHeader[10] = (byte) (SSRC >> 8);                //ssrc
    rtpHeader[11] = (byte) (SSRC & 0xff);              //ssrc

    // here we load the header into the first 12 bytes and the payload after that
    System.arraycopy(rtpHeader, 0, rtpPacket,0,HEADER_SIZE);
    System.arraycopy(payload,0,rtpPacket,HEADER_SIZE, rtpPacket.length-HEADER_SIZE);


    //send as soon as its built  with true or false for console debugging
    send(rtpPacket, false);
}
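If the shifting gets tedious, the same 12 byte header can be written with a ByteBuffer, which is big-endian by default. This is only a sketch under my own assumptions: RtpHeader is a made-up name, and unlike my method above it keeps the marker bit and a 7 bit payload type in the second byte the way rfc 3550 lays it out.

```java
import java.nio.ByteBuffer;

final class RtpHeader {

    static final int HEADER_SIZE = 12;

    /** Builds the fixed 12 byte rtp header; all multi-byte fields are big-endian. */
    static byte[] build(int payloadType, boolean marker, int sequenceNumber, int timeStamp, int ssrc) {
        ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);   // big-endian by default
        header.put((byte) 0x80);                                // version 2, no padding, no extension, no csrc
        header.put((byte) ((marker ? 0x80 : 0) | (payloadType & 0x7F)));
        header.putShort((short) sequenceNumber);                // low 16 bits only
        header.putInt(timeStamp);
        header.putInt(ssrc);
        return header.array();
    }
}
```

Calling build(96, false, seq, ts, ssrc) uses 96 as the payload type, which is an assumption on my part: it is simply the first number in the dynamic range.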


Packetizing the nalu is easier than unpacking it. Most nalus, but possibly not all, are much much bigger than our mtu so they need to be broken down. I’m using the FU-A format because the libstream library I was studying used it as well. It requires a two part header on top of each nalu piece. Each nalu piece shares the same timestamp as you can see but they each have an incremental sequence number provided by the rtp packet. Without the correct sequence number and timestamp your nalus cannot be reassembled. Below I loop until my entire nalu is sent.

//called by build nalu to send asap after building it will packetize nalu split it up or whatever
// single nalu (section 5.6) or fu-a    see ->   section 5.4
private void packetizeNalu() {

    //Log.d(TAG, "packetizeNalu: " + String.valueOf(naluBuffer.length));

    //error checking
    int nalLengthChecker = (naluBuffer[3]&0xFF | (naluBuffer[2]&0xFF)<<8 | (naluBuffer[1]&0xFF)<<16 | (naluBuffer[0]&0xFF)<<24); //original nal length
    nalLengthChecker += 4; //here we add for the header 0-3... remember length includes nal header[4]type already
    int writtenLength = 0;
    int bytesAdded = 2; //here is our fua header

    byte[] buffer;

    //single nalu
    if (naluLength+1 <= MAX_SIZE ) {
        buffer = new byte[naluLength+1];
        buffer[0] = naluHeader[4];
        System.arraycopy(naluBuffer, 0, buffer, 1, naluLength);

        //note: naluBuffer, not the buffer built just above, is what gets sent here
        buildRTPPacket(type, timeStamp, naluBuffer, naluLength);

    } else {
        // nalu is split like fu-a type
        /*
             FU-indicator      FU header
           +---------------+ +---------------+ +---------------+
           |0|1|2|3|4|5|6|7| |0|1|2|3|4|5|6|7|
           +-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+     FU Payload
           |F|NRI| TypeofFU| |S|E|R|  Type   |
           +---------------+ +---------------+ +---------------+
             See rfc 6184 5.8 figure 15-ish
             See rfc 6184 5.3 "the value of NRI to 11"

                FU-A is type 28
        */
        byte[] fuaHeader = new byte[2];

        fuaHeader[0] = 0b01111100; //set indicator with "11" and type decimal 28 = FU-A -> 01111100
        fuaHeader[1] = (byte) (naluHeader[4] & 0x1F); //set header 3-7

        int tally = 0;
        int tocopy;
        boolean secondloop = false;

        while(tally < naluLength) {
            tocopy = (naluLength - tally);              //see whats left to write
            if (tocopy >= MAX_SIZE-2) {                 // we minus 2 to make space for both header bytes
                tocopy = MAX_SIZE-2;                    //fit into max allowable packet size
                buffer = new byte[MAX_SIZE];
            } else {
                buffer = new byte[tocopy+2];            //or shrink buffer to whats left plus header
            }

            if (secondloop) {       //turn ser to double zero on second loop
                fuaHeader[1] = (byte) (fuaHeader[1]^(1 << 7));
                secondloop = false;
                //String s1 = String.format("%8s", Integer.toBinaryString(fuaHeader[1] & 0xFF)).replace(' ', '0');
                //Log.d(TAG, "packetizeNalu: center header " + s1);
            }

            if (tally == 0) {                               //first nalu in multi part. set SER...see above
                fuaHeader[1] += 0x80;
                secondloop = true;
                //String s1 = String.format("%8s", Integer.toBinaryString(fuaHeader[1] & 0xFF)).replace(' ', '0');
                //Log.d(TAG, "packetizeNalu: adjusted header " + s1);
            }

            System.arraycopy(naluBuffer, tally, buffer, 2, tocopy); //copy to buffer skipping first 2 bytes
            tally += tocopy;

            if (tally >= naluLength) {                      //weve copied all the data, set ser to last on multipart
                fuaHeader[1] += 0x40;
                //String s1 = String.format("%8s", Integer.toBinaryString(fuaHeader[1] & 0xFF)).replace(' ', '0');
                //Log.d(TAG, "packetizeNalu: re-adjut header " + s1);
            }

            buffer[0] = fuaHeader[0];
            buffer[1] = fuaHeader[1];

            buildRTPPacket(fuaHeader[0], timeStamp, buffer, buffer.length);

            writtenLength += (buffer.length - bytesAdded); //here we count how many bytes were sent to packetizer to compare to our starting amount
        }

        if (writtenLength != nalLengthChecker){
            Log.e(TAG, "packetizeNalu: Mismatched Size orig: " + String.valueOf(nalLengthChecker) + " written " + String.valueOf(writtenLength), null );
        }
    }
}
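For comparison, here is a stripped-down, standalone sketch of the same fu-a split that can be run on a desktop without any of my class fields. It follows my reading of rfc 6184 rather than my code above: the input starts at the nalu header byte (no start code, no length prefix), that header byte is not copied into the pieces because the receiver rebuilds it from the two fu bytes, and the NRI bits are copied from the nalu instead of being forced to 11. FuaFragmenter is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

final class FuaFragmenter {

    /** Splits one nalu into fu-a payloads of at most maxPayload bytes each. */
    static List<byte[]> fragment(byte[] nalu, int maxPayload) {
        List<byte[]> packets = new ArrayList<>();
        int nalHeader = nalu[0] & 0xFF;
        byte indicator = (byte) ((nalHeader & 0xE0) | 28); // keep F and NRI, type 28 = FU-A
        int type = nalHeader & 0x1F;
        int chunk = maxPayload - 2;                        // room left after the two fu bytes
        int pos = 1;                                       // the header byte itself is not copied
        while (pos < nalu.length) {
            int len = Math.min(chunk, nalu.length - pos);
            int fuHeader = type;
            if (pos == 1) {
                fuHeader |= 0x80;                          // S bit on the first piece
            }
            if (pos + len == nalu.length) {
                fuHeader |= 0x40;                          // E bit on the last piece
            }
            byte[] packet = new byte[len + 2];
            packet[0] = indicator;
            packet[1] = (byte) fuHeader;
            System.arraycopy(nalu, pos, packet, 2, len);
            packets.add(packet);
            pos += len;
        }
        return packets;
    }
}
```

Only call this for nalus that do not fit in one packet. A nalu small enough to go out whole should be sent as a single nalu packet, since a piece with both S and E set is not allowed by the rfc.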




Then we send them to their destination. That's it! Your data is on its way!!!


Stream Video From Android Part 5 – Parse NALUs

 Getting to what you really read all this stuff for.

-> Get the transfer file referenced in this article <-

Why do we need to parse nalus? Because nalus can be 100k bytes and we can't send chunks that size over the internet in one piece. But you're clever, you'll just want to send them over TCP and let the java socket class do the work. Not so fast!

When we are sending data we should consider two things: packetizing that data efficiently, and timing how fast those nalus are coming out so we can play them at the right speed on the other side. Remember, normally a video file has all those boxes and headers that tell the player when to play each frame. But when we are streaming we simply have nalus pouring out of the buffer.

Remember that our android myvideoclass does two things. It records a short video to get the sps and pps. Then it restarts in stream mode and passes the streaming data to the transferH264 class. The transferH264 class does two things for me.
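On the timing side, rfc 6184 says the rtp timestamp for h264 runs on a 90 kHz clock. My code just measures System.nanoTime() between nalus, so a conversion is needed somewhere. A minimal sketch of that conversion (RtpClock is a name invented for this example):

```java
final class RtpClock {

    static final long CLOCK_RATE = 90_000L;   // ticks per second for h264 over rtp

    /** Converts a duration in nanoseconds into 90 kHz rtp clock ticks. */
    static long toTicks(long nanos) {
        return nanos * CLOCK_RATE / 1_000_000_000L;
    }
}
```

A 32 ms gap between frames comes out as 2880 ticks, so each new nalu's timestamp would be the previous one plus that amount. The multiplication is fine for gaps between frames but would overflow a long if you fed it more than about a day of nanoseconds.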

  1. It reads the incoming stream and sorts out the nalus
  2. It packetizes those nalus to be sent to wherever the hell


Remember, we passed the transferh264 object a pipe/inputstream and also a reference to a udp socket if you didn’t notice. This is all done on a separate thread. Here is our repeating loop. Notice we create our sps and start by searching for a nalu with picture data. Then we keep recording each nalu and timing them so we can rebuild them with correct timing on the other side.

public void run() {
    try {
        // (setup call not shown in the post)
    }catch (IOException ioe){
        Log.e(TAG, "run: ", ioe );
    }

    try {

        // find the mdat box?

        //build our description

        //find the first nalu

        while (notEOF) {
            // (read and packetize the current nalu; calls not shown in the post)

            duration = System.nanoTime() - start;

            start = System.nanoTime();
        }

    } catch (IOException e) {
        Log.e(TAG, "Exception transferring file", e);
    }
}

Now in our situation we know that we need to find an avcc style header, but I will show how to search for both avcc and annex b in a byte stream.

Here is the method I used in my actual code. If you wanted to search for annex b instead of this

naluHeader[0] = naluHeader[1];
naluHeader[1] = naluHeader[2];
naluHeader[2] = naluHeader[3];
naluHeader[3] = naluHeader[4];
naluHeader[4] = (byte);

type = naluHeader[4]&0x1F;

if (type == 5 || type == 1) {

    naluLength = (naluHeader[3]&0xFF | (naluHeader[2]&0xFF)<<8 | (naluHeader[1]&0xFF)<<16 | (naluHeader[0]&0xFF)<<24) - 1; //minus type for header!!!
    if (naluLength > 0 && naluLength < 200000) {
        //Log.d(TAG, "naluSearch: found length = " + String.valueOf(naluLength) + " of type: " + String.valueOf(type) + " try req: " + String.valueOf(reqLoops));
    }
}

try this

if (naluHeader[0] == 0x00 && naluHeader[1] == 0x00 && naluHeader[2] == 0x00 && naluHeader[3] == 0x01)
    //we found it
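If your data is already sitting in a byte array, the annex b search can be written without the sliding header. This is only a sketch with a made-up class name (AnnexBSearch), and it only looks for the 4-byte start code described in this tutorial.

```java
final class AnnexBSearch {
    /**
     * Returns the index of the first byte after a 00 00 00 01 start code
     * found at or after 'from', or -1 if there is none.
     * The byte at the returned index is the nalu header (type) byte.
     */
    static int indexAfterStartCode(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + 3 < data.length; i++) {
            if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 0 && data[i + 3] == 1) {
                return i + 4;
            }
        }
        return -1;
    }
}
```

Calling it again with the returned index as 'from' finds the next nalu, and the bytes between two hits are one complete nalu.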


private void syncWithNalu() throws IOException
    //Log.d(TAG, "syncWithNalu: started - we have no position! invalid data is length = " + String.valueOf(naluLength) + " type: " + String.valueOf(type));

    byte save = naluHeader[0];
    boolean firstPass = true;

    int reqLoops = 0;
    while (true){

        naluHeader[0] = naluHeader[1];
        naluHeader[1] = naluHeader[2];
        naluHeader[2] = naluHeader[3];
        naluHeader[3] = naluHeader[4];
        naluHeader[4] = (byte);

        type = naluHeader[4]&0x1F;

        if (type == 5 || type == 1)

            naluLength = (naluHeader[3]&0xFF | (naluHeader[2]&0xFF)<<8 | (naluHeader[1]&0xFF)<<16 | (naluHeader[0]&0xFF)<<24) - 1; //minus type for header!!!
            if (naluLength > 0 && naluLength < 200000)
                //Log.d(TAG, "naluSearch: found length = " + String.valueOf(naluLength) + " of type: " + String.valueOf(type) + " try req: " + String.valueOf(reqLoops));
            if (naluLength==0)

                Log.d(TAG, "naluSearch: null nalu");

        }else if (firstPass) {
            firstPass = false;

            int testtype = (naluHeader[2] &0xFF | (naluHeader[1]&0xFF)<<8 | (naluHeader[0]&0xFF)<<16 | (save &0xFF)<<24) - 1; //minus type for header!!!

            //DEBUG BAD NALUS HERE
            String tt = String.valueOf(testtype);

            byte[] b = new byte[512];

           //Debug.debugHex("syncwithnalu " + tt, test, test.length);

            b[0] = save;
            b[1] = naluHeader[0];
            b[2] = naluHeader[1];
            b[3] = naluHeader[2];
            b[4] = naluHeader[3];
            b[5] = naluHeader[4];
  , 6, b.length-6);
            Debug.debugHex("syncwithnalu " , b, 30);
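Stripped of the debugging, the sync idea is a 5-byte window that slides along the stream one byte at a time until it looks like a length followed by a slice type. Here is a self-contained sketch of that idea. The class name AvccSync and the method signature are assumptions for the example, and the 200000 limit is the same sanity check used above.

```java
import java.io.IOException;
import java.io.InputStream;

final class AvccSync {
    /**
     * Slides a 5-byte window over the stream until it looks like
     * [4-byte length][type byte] with type 1 or 5.
     * Returns the payload length (without the type byte), or -1 at end of stream.
     * On success the window holds the header and the stream is positioned
     * on the first payload byte.
     */
    static int sync(InputStream in, byte[] window) throws IOException {
        int next;
        while ((next = in.read()) != -1) {
            // scoot everything down one place and append the new byte
            System.arraycopy(window, 1, window, 0, 4);
            window[4] = (byte) next;

            int type = window[4] & 0x1F;
            if (type == 5 || type == 1) {
                int length = ((window[0] & 0xFF) << 24 | (window[1] & 0xFF) << 16
                        | (window[2] & 0xFF) << 8 | (window[3] & 0xFF)) - 1;
                if (length > 0 && length < 200000) {
                    return length;
                }
            }
        }
        return -1;
    }
}
```

Keep in mind this is a heuristic. Random payload bytes can look like a valid header, so the next header read is the real test of whether the sync was correct.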



Once we sync our nalu stream we know exactly how long our next nalu should be. So let’s fill it in with the buildNalu method, which simply copies the data into a buffer. We’ll get to packetization in the next section so don’t worry about that part yet.

//build next data which should be video payload
private void buildNalu() throws IOException

naluBuffer = new byte[naluLength+5];

//here we recombine our original header to our nalu data to be sent
naluBuffer[0] = naluHeader[0];
naluBuffer[1] = naluHeader[1];
naluBuffer[2] = naluHeader[2];
naluBuffer[3] = naluHeader[3];
naluBuffer[4] = naluHeader[4];, 5, naluLength);
naluLength = naluBuffer.length;

test[0] = naluBuffer[naluBuffer.length-4];
test[1] = naluBuffer[naluBuffer.length-3];
test[2] = naluBuffer[naluBuffer.length-2];
test[3] = naluBuffer[naluBuffer.length-1];
test[4] = naluBuffer[naluBuffer.length-4];
test[5] = naluBuffer[naluBuffer.length-3];
test[6] = naluBuffer[naluBuffer.length-2];
test[7] = naluBuffer[naluBuffer.length-1];

timeStampCalulations(); //here we calc the time between reading each nalu. each nalu must have different time stamp

// String s1 = String.format("%8s", Integer.toBinaryString(naluHeader[4] & 0xFF)).replace(' ', '0');
//Log.d(TAG, "packetizeNalu: expected raw " + s1);
//debugPackets("buildnalu ", naluBuffer);
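One thing to watch when filling the buffer from a pipe: a single read call is allowed to return fewer bytes than you asked for. The sketch below shows one way to guard against that by looping until the nalu is complete. The class name NaluBuilder and its parameters are made up for the example and are not my original method.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class NaluBuilder {
    /**
     * Rebuilds [5-byte header][payload] into one buffer.
     * header is the 4 length bytes plus the type byte,
     * payloadLength is the length without the type byte.
     */
    static byte[] build(InputStream in, byte[] header, int payloadLength) throws IOException {
        byte[] nalu = new byte[payloadLength + 5];
        System.arraycopy(header, 0, nalu, 0, 5);

        int offset = 5;
        while (offset < nalu.length) {
            // read may deliver only part of what we asked for, so keep going
            int n = in.read(nalu, offset, nalu.length - offset);
            if (n < 0) {
                throw new EOFException("stream ended inside a nalu");
            }
            offset += n;
        }
        return nalu;
    }
}
```

A half filled buffer sent to the decoder is exactly the kind of mistake that corrupts the stream, so it is worth failing loudly here.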



So we have our first nalu loaded and sent. Our next nalu should be right behind it, so there is no need to search. We test the header to make sure it’s right. If it is, we go ahead and let it load the data. If the nalu is bad we go back to the syncing method.

//read next header into header fields. expects to be dropped into correct position or it will perform a sync
private void readNextHeader() throws IOException
{, 0, 5);

    type = naluHeader[4]&0x1F;
   // String s1 = String.format("%8s", Integer.toBinaryString(naluHeader[4] & 0xFF)).replace(' ', '0');
   // Log.d(TAG, "packetizeNalu: handing type " + s1 + " " + String.valueOf(type));

    naluLength = (naluHeader[3]&0xFF | (naluHeader[2]&0xFF)<<8 | (naluHeader[1]&0xFF)<<16 | (naluHeader[0]&0xFF)<<24)- 1; //minus 1 for header!!!
    if (naluLength >= 200000 || naluLength < 0){
        //bad length means we lost our position, so go back to syncing
        syncWithNalu();
    } else {
        Log.d(TAG, "readNextHeader success    type " + String.valueOf(type) + "  length " + String.valueOf(naluLength));
    }

    // IDR is a stand-alone picture. sending sps and pps will make it readable even in a live stream format without session description protocol
    if (type == 5){


Notice how we check if the header is type 5. A type 5 slice is an IDR picture, which means it’s a full image that can be decoded on its own. It is followed by type 1 slices, for example, which tell the decoder what to change relative to the pictures already decoded. So before each type 5 slice we send the sps and pps so that the decoder receiving the images has all the data it needs to decode the pictures.
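The ordering rule can be shown on its own. This sketch assumes each nalu is a byte array that starts with its type byte (no length prefix), and the class name ParameterSetInjector is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class ParameterSetInjector {
    /** Returns the nalus in order, with sps and pps placed in front of every IDR (type 5) slice. */
    static List<byte[]> inject(List<byte[]> nalus, byte[] sps, byte[] pps) {
        List<byte[]> out = new ArrayList<>();
        for (byte[] nalu : nalus) {
            boolean isIdr = nalu.length > 0 && (nalu[0] & 0x1F) == 5;
            if (isIdr) {
                out.add(sps);
                out.add(pps);
            }
            out.add(nalu);
        }
        return out;
    }
}
```

Repeating the sps and pps costs a few dozen bytes per IDR picture, and it means a receiver that joins late can start decoding at the next IDR.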

Below is my debugger output of the first 10 bytes of each nalu without the 4 bytes of length. There were probably about 10 more nalus starting with 41 (type 1) and then the pattern repeated again and again. This is the pattern we are trying to feed to our packetizing method.

VideoDecoderaddPacket type: 24
Debug transfer 67 80 80 1E E9 01 68 22 FD C0 
Debug transfer 68 06 06 E2 
Debug transfer 65 B8 20 00 9F 80 78 00 12 8A 
Debug transfer 41 E2 20 09 F0 1E 40 7B 0C E0 
Debug transfer 41 E4 40 09 F0 29 30 D6 00 AE 
Debug transfer 41 E6 60 09 F1 48 31 80 99 40 
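To read that output, look at the first byte of each line. The low 5 bits are the nalu type and the two bits above them are nal_ref_idc. A small sketch (the class name NalHeader is made up) that splits the byte:

```java
final class NalHeader {
    /** The low 5 bits of the first nalu byte are the nalu type. */
    static int type(int headerByte) {
        return headerByte & 0x1F;
    }

    /** Bits 5 and 6 are nal_ref_idc (non-zero means other pictures may reference this one). */
    static int refIdc(int headerByte) {
        return (headerByte >> 5) & 0x03;
    }

    /** Names for the handful of types that show up in this stream. */
    static String name(int type) {
        switch (type) {
            case 1: return "non-IDR slice";
            case 5: return "IDR slice";
            case 7: return "SPS";
            case 8: return "PPS";
            default: return "other";
        }
    }
}
```

So 67 is type 7 (sps), 68 is type 8 (pps), 65 is type 5 (IDR slice) and 41 is type 1 (non-IDR slice).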

Now that we have our data, let’s chop it up and send it in the next part.


Stream Video From Android Part 4 – Parse Boxes and SPS or PPS

Alright, you have the taste of blood in your mouth and you like it. Let’s dive deeper and actually try parsing some data.

->source code here <-

In our android app we have saved the file and now we are passing it to our sdp maker. Ironically I do not actually use session description protocol. I just didn’t know it was not needed so I named this class incorrectly.

Our sdpmaker class uses a randomaccessfile to parse through the data by reading each byte and looking for box headers. Here is the start of the method where you can get the idea of how the whole class works.

public static byte[][] retreiveSPSPPS(File file) throws IOException, FileNotFoundException
    byte[] sps = new byte[0];                                 //we will find the bytes and read into these arrays then convert to the string values
    byte[] pps = new byte[0];
    byte[] prfix = new byte[6];
    byte[][] spspps;
    String[] spsppsString = new String[2];

    RandomAccessFile randomAccessFile;              //file type to allow searching file byte by byte
    long fileLength = 0;
    long position = 0;
    long moovPos = 0;

    byte[] holder = new byte[8];

                                                        //get the file we saved our little video too

       randomAccessFile = new RandomAccessFile(file, "r");
        fileLength = randomAccessFile.length();

                                                            // here we find the moov box within the mp4 file
    while(position < fileLength) {
        //read our current position and then advance to next position
        randomAccessFile.read(holder, 0, 8);
        position += 8;

        if (checkForBox(holder)) {
            String name = new String(holder, 4, 4);

            if (name.equals("moov")) {
                moovPos = position;
                Log.d(TAG, "retreiveSPSPPS: found moov box = " + name);


Check out this picture again. Notice how the boxes are nested? My code above isn’t optimized for any use case. But as you can see I start by finding the moov box and then search my way through each nested box to find the data I need.
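The nesting is easier to see in a sketch that works on a byte array instead of a file. Every box starts with a 4-byte big-endian size (which includes the 8 header bytes) followed by a 4-character type, so you can hop from box to box and then search inside the one you want. The class name Mp4BoxFinder is made up, and the special sizes 0 and 1 are deliberately not handled.

```java
import java.nio.charset.StandardCharsets;

final class Mp4BoxFinder {
    /**
     * Scans sibling boxes in data[start, end) and returns {payloadStart, boxEnd}
     * for the first box whose type equals name, or null if it is not there.
     */
    static int[] find(byte[] data, int start, int end, String name) {
        int pos = start;
        while (pos + 8 <= end) {
            long size = ((data[pos] & 0xFFL) << 24) | ((data[pos + 1] & 0xFF) << 16)
                    | ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
            String type = new String(data, pos + 4, 4, StandardCharsets.US_ASCII);

            // size 0 (box runs to end of file) and size 1 (64-bit size) are skipped in this sketch
            if (size < 8 || size > end - pos) {
                return null;
            }
            if (type.equals(name)) {
                return new int[] {pos + 8, pos + (int) size};
            }
            pos += (int) size; // jump over this box to its next sibling
        }
        return null;
    }
}
```

To go one level down, call find again with the returned payloadStart and boxEnd as the new range, for example first "moov" and then "trak" inside it.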

Here you can see where I am extracting my sps. Read through the full code for all the details.

if (read[0] == 'g') {
                                        //ascii 'g' = hex 67  <- we found the sps
    int length = bLength & 0xff;        //blength is the length of the sps

    remaining = new byte[length];
    randomAccessFile.read(remaining, 0, length-1); //minus 1 because we already read the g

                                            //scoot everything down and add our g at the beginning
    for (int i = length-1; i > 0 ; i--) {
        remaining[i] = remaining[i-1];
    }

    remaining[0] = read[0];
    sps = remaining;

    String s = bytesToHex(remaining);
    Log.d(TAG, "retreiveSPSPPS: found sps: " + s + " length used: " + String.valueOf(length));
}


Once this is done we need to save this data in our app. As long as we don’t change the media recorder settings when we stream, this sps and pps data will allow a decoder to decode the stream correctly.

On a side note I also saved a prefix byte[] but this turned out to be unnecessary. Stick with the sps and pps.


Stream Video From Android Part 3 – Understanding h264 in mp4

The last section was tough, and it only gets tougher.

As I said, the video data in an mp4 file is written first and the data needed to decode it is written afterwards. Like most file types, mp4 is constructed in parts. You frequently hear the term file header, which is a section that explains the file’s contents. With mp4 it’s full of boxes. These boxes might be at the beginning or they might be at the end. We don’t know and we have to find out. Below is a piece of software that allows you to open up the contents of an mp4 file.

Some key parts…

ftyp -> describes basic contents

mdat -> the actual video data

avcC -> the stuff we need to decode the data


Parsing a video file

We are examining the h264 codec. h264 is a compression format: an encoder takes images and encodes them to reduce file size. These images are then wrapped up in the above boxes and into an mp4 container.

Let’s think about this. Your camera takes a 2 MB picture. A video plays 30 frames per second, or 30 fps. A CD can hold 700 MB. So if a video was simply a series of pictures it would need 60 MB per second, and a CD would hold only about 12 seconds of video in total.

Instead the h264 codec compresses a single image then records a series of “changes” that happen to the image.  So 30 frames of a single second of video might be one actual complete image and 29 “changes” to that image. This is the pattern below repeated numerous times as video plays.


Of course there is a lot more to know but each of the above is called a slice. These slices are saved in that mdat box one after another. A slice is not the same as a frame but sometimes it can be.

NALU or network abstraction layer unit is what these slices are saved as. These nalus are saved one after another and are separated by headers. There are two main types of headers we will be dealing with. Below they are written in hex.

AnnexB   -> 0x00 0x00 0x00 0x01 0x65 The first four bytes are just a start code with no data. The last byte tells you the nalu type.

These headers are simply a string of zeros and a one, followed by the nalu type. The video codec makes sure there are no other instances where this pattern can be found in the data output.

Avcc ->  0x00 0x02 0x4A 0x8F 0x65 The first four bytes are the length and the last describes what type it is. Obviously the first four change with each nalu they describe.

If you make the accidental mistake of padding some data by copying a half filled buffer you will destroy your data’s readability by any decoder because you emulate the annex-b style start code. This goes for either type. Working with this data is unforgiving. (Sounds like the voice of experience here!)
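Since the two header styles carry the same nalus, you can turn one into the other. Here is a sketch of going from avcc to annex b on a buffer that holds only complete nalus. It assumes a 4-byte length prefix that counts the type byte, and the class name AvccToAnnexB is made up for the example.

```java
import java.io.ByteArrayOutputStream;

final class AvccToAnnexB {
    /** Replaces every 4-byte length prefix with the 00 00 00 01 start code. */
    static byte[] convert(byte[] avcc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos + 4 <= avcc.length) {
            int length = (avcc[pos] & 0xFF) << 24 | (avcc[pos + 1] & 0xFF) << 16
                    | (avcc[pos + 2] & 0xFF) << 8 | (avcc[pos + 3] & 0xFF);
            pos += 4;
            if (length < 0 || length > avcc.length - pos) {
                throw new IllegalArgumentException("bad nalu length " + length);
            }
            out.write(0);
            out.write(0);
            out.write(0);
            out.write(1);
            out.write(avcc, pos, length); // type byte + payload, copied untouched
            pos += length;
        }
        return out.toByteArray();
    }
}
```

Notice that the payload is copied exactly as is. Any extra padding byte would break either format, which is the point made above.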

Here are the different types.

0      Unspecified                                                     non-VCL
1      Coded slice of a non-IDR picture                                VCL
2      Coded slice data partition A                                    VCL
3      Coded slice data partition B                                    VCL
4      Coded slice data partition C                                    VCL
5      Coded slice of an IDR picture                                   VCL
6      Supplemental enhancement information (SEI)                      non-VCL
7      Sequence parameter set                                          non-VCL
8      Picture parameter set                                           non-VCL
9      Access unit delimiter                                           non-VCL
10     End of sequence                                                 non-VCL
11     End of stream                                                   non-VCL
12     Filler data                                                     non-VCL
13     Sequence parameter set extension                                non-VCL
14     Prefix NAL unit                                                 non-VCL
15     Subset sequence parameter set                                   non-VCL
16     Depth parameter set                                             non-VCL
17..18 Reserved                                                        non-VCL
19     Coded slice of an auxiliary coded picture without partitioning  non-VCL
20     Coded slice extension                                           non-VCL
21     Coded slice extension for depth view components                 non-VCL
22..23 Reserved                                                        non-VCL
24..31 Unspecified                                                     non-VCL

Based on this information expect to see files like this.

[size or start code][type][data payload]  repeated x infinity…might as well be

Parsing SPS & PPS

Data in each box can also be found if you know where to look. Check out our avcc box here. I have it labeled for you and you can see it in hex and ascii.

Here you can find the data necessary to parse your video file. According to this chart… source is stackoverflow

8   version ( always 0x01 )
8   avc profile ( sps[0][1] )
8   avc compatibility ( sps[0][2] )
8   avc level ( sps[0][3] )
6   reserved ( all bits on )
2   NALULengthSizeMinusOne
3   reserved ( all bits on )
5   number of SPS NALUs (usually 1)
repeated once per SPS:
  16     SPS size
  variable   SPS NALU data
8   number of PPS NALUs (usually 1)
repeated once per PPS
  16    PPS size
  variable PPS NALU data

Remember the 4 avcc header bytes that gave you the length? How many there are is described by NALULengthSizeMinusOne. The length could also be two bytes, for example. It’s minus one because you can only count to three with the two bits of space allowed, so binary 11 means 4 bytes and 01 means 2 bytes….a bit quirky.
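The chart translates almost line by line into code. This sketch assumes you already have the bytes of the avcC box payload starting at the version byte, keeps only the first sps and pps, and does no bounds checking. The class name AvcCRecord and its fields are made up for the example.

```java
import java.util.Arrays;

final class AvcCRecord {
    final int naluLengthSize; // how many bytes each avcc length prefix uses (1, 2 or 4)
    final byte[] sps;
    final byte[] pps;

    private AvcCRecord(int naluLengthSize, byte[] sps, byte[] pps) {
        this.naluLengthSize = naluLengthSize;
        this.sps = sps;
        this.pps = pps;
    }

    /** r starts at the version byte of the chart above. */
    static AvcCRecord parse(byte[] r) {
        int lengthSize = (r[4] & 0x03) + 1;  // low 2 bits = NALULengthSizeMinusOne
        int spsCount = r[5] & 0x1F;          // low 5 bits = number of sps
        int pos = 6;

        byte[] sps = null;
        for (int i = 0; i < spsCount; i++) {
            int size = (r[pos] & 0xFF) << 8 | (r[pos + 1] & 0xFF);
            pos += 2;
            if (i == 0) sps = Arrays.copyOfRange(r, pos, pos + size);
            pos += size;
        }

        int ppsCount = r[pos++] & 0xFF;
        byte[] pps = null;
        for (int i = 0; i < ppsCount; i++) {
            int size = (r[pos] & 0xFF) << 8 | (r[pos + 1] & 0xFF);
            pos += 2;
            if (i == 0) pps = Arrays.copyOfRange(r, pos, pos + size);
            pos += size;
        }
        return new AvcCRecord(lengthSize, sps, pps);
    }
}
```

With the usual reserved bits set, the fifth byte is FF, which gives a length size of 4. That matches the 4-byte lengths we saw in the stream.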

Now that we have an understanding of the basic makeup of an mp4 file, let’s parse it in the next section, where we go deeper.

Stream Video From Android Part 2 – Getting Camera Data


1. First we need a camera 2 object to record our video

-> Here is the code <-  (Its saved as a .txt just change the suffix if you want to cut and paste but I don’t recommend doing that)


Android’s camera 2 api is a beast in itself. Using the official example will take most of the work out. Here is the example I copied.

The above example is buggy (at time of writing) even though its provided by google.

Here’s the thing: when you record a video the camera starts sending the data to a file right away. Only after the recording is complete does the api write the data needed to play that file, in the form of file headers. We need to do 2 things with this api

  1. We need to record short videos and extract that header data before we even start streaming.
  2. When we are streaming we have to direct android’s media recorder to send the data out instead of writing it to a file.

Here is my code all folded up


The top fold is boilerplate stuff not super important


The third fold holds the normal camera methods called to operate the camera. Let’s focus in on the ones that are tough and critical to your success

The android camera method setUpMediaRecorder needs to tell android where to send your actual video data. Below you will see that I have created a bool flag that either saves the data or calls a method.

if (collectSDP){
    Log.d(TAG, "setUpMediaRecorder: collect");
    Log.d(TAG, "setUpMediaRecorder: dontcollect");

In order to get some critical data we need later we need to save a small video file. In the method getSDP you will see I go through the motions of recording a short video.  The video is then parsed with the SDPMaker class which is the next part of this tutorial.

You will also see that option 2 of the above code is getFD. This returns a file descriptor which allows you to connect an outputstream -> inputstream. This tricks your camera api into writing into a buffer which is immediately read by another class, which will package and send the data to wherever you are sending it. Notice all the abandoned code.

private FileDescriptor getStreamFd() {

    ParcelFileDescriptor[] pipe = null;

    try {
        pipe = ParcelFileDescriptor.createPipe();

        new TransferThread(new ParcelFileDescriptor.AutoCloseInputStream(pipe[0]),
                new Socket(), mRobotPoint).start();

         transferH264 = new TransferH264(new ParcelFileDescriptor.AutoCloseInputStream(pipe[0]),
                new Socket(), mRobotPoint, this, ssrc);


        transH264 = new TransH264(new ParcelFileDescriptor.AutoCloseInputStream(pipe[0]),
                new Socket(), mRobotPoint, this, ssrc);


       new TransferH264(new ParcelFileDescriptor.AutoCloseInputStream(pipe[0]),
                new Socket(), mRobotPoint, this).start();

    } catch (IOException e) {
        Log.e(getClass().getSimpleName(), "Exception opening pipe", e);
    }

    return (pipe[1].getFileDescriptor());
}
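If you want to play with the pipe idea away from android, plain Java has piped streams that behave in a similar way. To be clear, this is only a desktop analogy and not the ParcelFileDescriptor api. The class name PipeDemo is made up. One side writes bytes as if it were the recorder, and a second thread reads them as they arrive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

final class PipeDemo {
    /** Pushes 'recorded' through a pipe and returns what the reader thread received. */
    static byte[] pump(byte[] recorded) throws IOException, InterruptedException {
        PipedOutputStream writeSide = new PipedOutputStream();           // plays the role of pipe[1]
        PipedInputStream readSide = new PipedInputStream(writeSide);     // plays the role of pipe[0]
        ByteArrayOutputStream received = new ByteArrayOutputStream();

        Thread reader = new Thread(() -> {
            try {
                int b;
                while ((b = readSide.read()) != -1) { // blocks until data arrives
                    received.write(b);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        reader.start();

        writeSide.write(recorded); // the "recorder" writes as if to a file
        writeSide.close();         // closing tells the reader the stream is over
        reader.join();
        return received.toByteArray();
    }
}
```

The writer never knows it is not writing to a file, which is the same trick being played on the media recorder above.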


This is a multipart post…keep going


Stream Video From Android Part 1 – Why & How

2020 update  Where I go over better methods on android and give some quick ffmpeg tips



I’m not sure how you got here, but if you need to stream or dissect video this might be your lucky day. Whenever I write a new article my biggest joy is choosing an image to represent the experience a user should expect when tackling this project. I couldn’t make up my mind and both are appropriate.


Over the last several weeks I have been falling down the rabbit hole of streaming from an android device to another computer. Despite the ubiquity of streaming apps right now there are not a lot of easy solutions for streaming from android, and I found more unanswered questions than answers.

In this post I will tell you step by step how I streamed from android to a pc. But be warned, this is for average to ninja developers. If it’s your first android, desktop or web app you might not make it through this explanation.

Secondly, I am no expert in video or streaming services. I’m posting this article as fresh as I can from learning in order to maintain my beginner point of view. This article is a great stepping stone for someone like me trying to understand the basics. The code works on my device. It’s not production code, video artifacts remain, and I still have a lot of optimization and polishing to do. But I think it’s much easier to understand my rough code with all its notes etc. than to see a bunch of polished methods.



Part 2 – Getting Camera Data

Part 3- Understanding h264 mp4







There are several ways to come at streaming. Let’s consider them all and I’ll tell you why I did what I did.

— Using Someone Else’s Frame Work —

The first is relying on an outside streaming service. There are several services that will gladly provide a simple api and charge you to stream data from the phone to a server somewhere.

The second is a free library that will give you a framework. Webrtc is an example that can allow you to video chat and provides an api for establishing communications. Another is libstream, which was one of my greatest learning tools in this project.

Third are headless api’s such as ffmpeg and gstreamer. Ffmpeg is a little tough to get started with but it provided some great data for me to examine.

If you need to video chat or stream entertainment content these are great solutions.

–Doing The dirty Work Yourself —

There are two pitfalls with the above methods: they cost money, or they constrain you to what the api allows. I wanted to really understand how the video data was being routed. I wanted to route everything myself and control almost every detail.

My intention was to eventually use this android video and audio data for AI processing at a remote location. I didn’t want to let someone else’s API decide the bandwidth usage or whether FPS or image compression would be used to compensate for a reduced connection. I also wanted to control how and when connections are established. Truthfully I had no idea how the above things would be handled in an api. But I didn’t want to find out after I committed, and I decided I needed to learn a bit, so the journey was worth it for me.

Inside android there are two methods for getting video data: MediaRecorder and MediaCodec. I chose MediaRecorder for simplicity although I suspect MediaCodec has further advantages. If you follow this guide I suspect you can switch later when you are a bit smarter and have had some time to breathe.

In this example I’m using android studio and sending the data to a javafx app on windows desktop

Below is the basic outline of the steps:

This is the very very broad strokes.

Camera2 ->Basic video class or recording with camera2 and a few tweaks

Data pipe -> This gets your data out of the camera2 class

Packetizer -> Bundle that data and send it somewhere

Depacketizer -> Get that video back out and into a decoder

Why is it so hard?

Looking at the four steps above you may wonder, if you have already been trying a bit, why is it so tough? Think about it. Skype launched in 2003 and only added video calls a couple of years later. Google duo only came out in 2016 and apple video chat a few years earlier. Even more remarkable is that these video services share many of the same software libraries, and video streaming at any quality relies on the heavily patented h264 codec. So don’t stress out too much. It’s tough and there’s a lot to know.

This is a multi part post so please keep moving forward.