Building a Video Chat App, Part 3 - Displaying Video

Thursday, Nov 5, 2020 · 7 minute read · Tags: javascript, azure

On my Twitch channel we’re continuing to build our video chat application on Azure Communication Services (ACS).

Last time we learnt how to access the camera and microphone using the ACS SDK, and today we’ll look at displaying that camera feed on the screen.

Displaying Video

As we learnt in the last post, cameras are available via a MediaStream in the browser, which we get when the user grants us access to their cameras. With raw JavaScript this can be set as the srcObject property of a <video> element and the camera feed is displayed. But there’s some orchestration code to set up and events to handle, so thankfully ACS gives us an API to work with, LocalVideoStream and Renderer.
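
For reference, doing that by hand without ACS is only a few lines. Here’s a minimal sketch, assuming a <video autoplay id="preview"> element already exists in the page and that we’re inside an async function:

// Prompt for camera access; resolves to a MediaStream once granted
const stream = await navigator.mediaDevices.getUserMedia({ video: true });

// A MediaStream is attached via the srcObject property, not the src attribute
const video = document.querySelector<HTMLVideoElement>("#preview")!;
video.srcObject = stream;
await video.play();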

Creating a LocalVideoStream

The LocalVideoStream type requires a VideoDeviceInfo to be provided to it, and that type is what we get back from the DeviceManager (well, we get an array of them; you then pick the one you want).
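
As a quick refresher from last time, getting those cameras looks roughly like this (a sketch assuming an existing CallClient instance named callClient; the method has been named getCameraList in some preview builds of the SDK and getCameras in more recent ones, so check your version):

// The DeviceManager is our entry point to cameras and microphones
const deviceManager = await callClient.getDeviceManager();

// Returns a VideoDeviceInfo[] - one entry per camera the browser can see
const cameras = await deviceManager.getCameras();
const [defaultCamera] = cameras;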

We’ll start by creating a new React context which will contain all the information that a user has selected for the current call.

import { createContext, useContext, useEffect, useState } from "react";
import {
    AudioDeviceInfo,
    LocalVideoStream,
    VideoDeviceInfo
} from "@azure/communication-calling";
// useAuthenticationContext comes from the auth work in an earlier part of this series

export type UserCallSettingsContextType = {
    setCurrentCamera: (camera?: VideoDeviceInfo) => void;
    setCurrentMic: (mic?: AudioDeviceInfo) => void;
    setName: (name: string) => void;
    setCameraEnabled: (enabled: boolean) => void;
    setMicEnabled: (enabled: boolean) => void;
    currentCamera?: VideoDeviceInfo;
    currentMic?: AudioDeviceInfo;
    videoStream?: LocalVideoStream;
    name: string;
    cameraEnabled: boolean;
    micEnabled: boolean;
};

const nie = <T extends unknown>(_: T): void => {
    throw Error("Not Implemented");
};

const UserCallSettingsContext = createContext<UserCallSettingsContextType>({
    setCurrentCamera: nie,
    setCurrentMic: nie,
    setName: nie,
    setCameraEnabled: nie,
    setMicEnabled: nie,
    name: "",
    cameraEnabled: false,
    micEnabled: false
});

Note: I’ve created a stub function called nie that throws an exception, which I use as the default implementation for the setter functions in the context.

The context will provide a few other pieces of data that the user selects, such as their preferred mic and their name, but our real focus is the videoStream that it exposes.

Now let’s implement the context provider:

export const UserCallSettingsContextProvider = (props: {
    children: React.ReactNode;
}) => {
    const [currentCamera, setCurrentCamera] = useState<VideoDeviceInfo>();
    const [currentMic, setCurrentMic] = useState<AudioDeviceInfo>();
    const [videoStream, setVidStream] = useState<LocalVideoStream>();
    const { clientPrincipal } = useAuthenticationContext();
    const [name, setName] = useState("");
    const [cameraEnabled, setCameraEnabled] = useState(true);
    const [micEnabled, setMicEnabled] = useState(true);

    useEffect(() => {
        if (clientPrincipal && !name) {
            setName(clientPrincipal.userDetails);
        }
    }, [clientPrincipal, name]);

    useEffect(() => {
        // TODO - handle camera selection
    }, [currentCamera, videoStream]);

    return (
        <UserCallSettingsContext.Provider
            value={{
                setCurrentCamera,
                setCurrentMic,
                currentCamera,
                currentMic,
                videoStream,
                setName,
                name,
                setCameraEnabled,
                cameraEnabled,
                setMicEnabled,
                micEnabled
            }}
        >
            {props.children}
        </UserCallSettingsContext.Provider>
    );
};

export const useUserCallSettingsContext = () =>
    useContext(UserCallSettingsContext);
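
For the context to be usable, the provider needs to wrap whatever part of the component tree will consume it. As a minimal sketch (App is a placeholder here, and VideoStream is the component we’ll build shortly):

const App = () => (
    <UserCallSettingsContextProvider>
        <VideoStream />
    </UserCallSettingsContextProvider>
);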

When the currentCamera is changed (by user selection or otherwise) we’re going to want to update the LocalVideoStream, and that’s the missing useEffect implementation. First off, we’ll need to create one if it doesn’t exist, but since we can’t create it until there’s a selected camera, we’ll check for that:

useEffect(() => {
    if (currentCamera && !videoStream) {
        const lvs = new LocalVideoStream(currentCamera);
        setVidStream(lvs);
    }
}, [currentCamera, videoStream]);

Using the LocalVideoStream

We’ve got ourselves a video stream, but what do we do with it? We need to create a Renderer that will handle the DOM elements for us.

Let’s create a component that uses the context to access the LocalVideoStream:

const VideoStream = () => {
    const { videoStream } = useUserCallSettingsContext();

    return <div>Show video here</div>;
};

export default VideoStream;

The Renderer, which we’re going to create shortly, gives us a DOM element that we need to inject into the DOM that React is managing for us, and to do that we’ll need access to the parent element, which we’ll obtain using a ref.

const VideoStream = () => {
    const { videoStream } = useUserCallSettingsContext();
    const vidRef = useRef<HTMLDivElement>(null);

    return <div ref={vidRef}>Show video here</div>;
};

Since our videoStream might be null (camera is off or just unselected), we’ll only create the Renderer when needed:

const VideoStream = () => {
    const { videoStream } = useUserCallSettingsContext();
    const vidRef = useRef<HTMLDivElement>(null);
    const [renderer, setRenderer] = useState<Renderer>();

    useEffect(() => {
        if (videoStream && !renderer) {
            setRenderer(new Renderer(videoStream));
        }
    }, [videoStream, renderer]);

    return (
        <div ref={vidRef}>Show video here</div>
    );
};

With the Renderer created, the next thing to do is request a view from it, which displays the camera feed. We’ll do this in a separate useEffect for simplicity’s sake:

const VideoStream = () => {
    const { videoStream } = useUserCallSettingsContext();
    const vidRef = useRef<HTMLDivElement>(null);
    const [renderer, setRenderer] = useState<Renderer>();

    useEffect(() => {
        if (videoStream && !renderer) {
            setRenderer(new Renderer(videoStream));
        }
    }, [videoStream, renderer]);

    useEffect(() => {
        if (renderer) {
            renderer.createView().then((view) => {
                vidRef.current!.appendChild(view.target);
            });
        }

        return () => {
            if (renderer) {
                renderer.dispose();
            }
        };
    }, [renderer, vidRef]);

    return (
        <div ref={vidRef}></div>
    );
};

The createView method of the Renderer returns a Promise<RendererView>, which has information on the scaling mode and whether the video is mirrored (so you could apply your own mirror transform), as well as the target DOM element that we can append to the children of the element captured via the vidRef ref. You’ll notice that I’m using ! before appendChild; this is to satisfy the TypeScript compiler, as it doesn’t fully understand the useRef assignment. Yes, it’s true that vidRef.current could be null (its default value), but that would require the hooks and the Promise to execute synchronously, which isn’t possible, so we can override the type check with the ! non-null assertion.
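
As an aside, createView can also take an options object if you want to control those properties up front. Here’s a hypothetical variation; the option names match recent versions of @azure/communication-calling, so check what your SDK version expects:

renderer
    .createView({ scalingMode: "Crop", isMirrored: true })
    .then((view) => {
        vidRef.current!.appendChild(view.target);
    });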

Changing Camera Feeds

It’s possible that someone has multiple cameras on their machine and wants to switch between them, so how would you go about doing that?

The first thought might be that we create a new LocalVideoStream and Renderer, but it’s actually a lot simpler than that as the LocalVideoStream provides a switchSource method that will change the underlying camera source and in turn cascade that across to the Renderer.

We’ll update our context with that support:

useEffect(() => {
    if (currentCamera && !videoStream) {
        const lvs = new LocalVideoStream(currentCamera);
        setVidStream(lvs);
    } else if (
        currentCamera &&
        videoStream &&
        videoStream.getSource() !== currentCamera
    ) {
        videoStream.switchSource(currentCamera);
    }
}, [currentCamera, videoStream]);

This new conditional branch makes sure we have a camera, a video stream, and that the selected camera isn’t already the stream’s source (that last check is a side effect of React hooks, not something you’d necessarily need to do otherwise). That’s all we need for switching; we don’t need to touch our Renderer at all.
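
To see the switching in action, here’s a hypothetical CameraSelector component (not part of the sample codebase) that drives setCurrentCamera from a dropdown, assuming the available cameras are passed in as a prop:

const CameraSelector = (props: { cameras: VideoDeviceInfo[] }) => {
    const { currentCamera, setCurrentCamera } = useUserCallSettingsContext();

    return (
        <select
            value={currentCamera?.id}
            onChange={(e) =>
                // VideoDeviceInfo exposes an id we can match the selection against
                setCurrentCamera(props.cameras.find((c) => c.id === e.target.value))
            }
        >
            {props.cameras.map((camera) => (
                <option key={camera.id} value={camera.id}>
                    {camera.name}
                </option>
            ))}
        </select>
    );
};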

Conclusion

There we have it: we’re now displaying the camera feed, and you can see yourself. The use of the LocalVideoStream and Renderer from the ACS SDK makes it a lot simpler to handle the events and lifecycle of the objects we need to work with.

If you want to see the full code from the sample application we’re building, you’ll find it on my GitHub.

If you want to catch up on the whole episode, as well as look at how we integrate this into the overall React application, you can find the recording on YouTube, along with the full playlist.