How to develop a video editing tool similar to Silhouette or TikTok?
The development tasks are outlined above. The primary task of a short-video production app is to implement a highly interactive, real-time player that supports multiple editing capabilities during playback preview.
Initially, we investigated a variety of solutions. At first glance, the Android native player was clearly not enough, and we expected to look for a reference among the many open-source C players, or even to implement a player ourselves for maximum flexibility and control. Then we discovered the power of ExoPlayer. Although this player is widely used, we had not realized how much potential it has and how far it can be extended.
In fact, we still use ExoPlayer for editing preview in our self-developed video editing tool today. Why choose ExoPlayer? For the following reasons (in a word: high cost-effectiveness):
We do secondary development on top of ExoPlayer to implement the video editing features quickly and efficiently. The clip player handles real-time preview playback during the editing process and supports the following functions:
For each of the functions above, we went through the ExoPlayer API documentation one by one to find ways to extend the implementation.
Among them, video rotation, text stickers, beauty filters, and material transitions require calling setVideoSurface to take control of the video presentation layer: we customize a GLSurfaceView and use OpenGL to implement rotation, beauty filters, and sticker overlays. The surface that ExoPlayer plays into is bound to the rendering texture of the custom GLSurfaceView.
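The binding between the player and the GL texture can be sketched as follows. This is a minimal illustration, assuming an ExoPlayer 2.x instance and a helper `createExternalOesTexture()` (hypothetical name) that generates a `GL_TEXTURE_EXTERNAL_OES` texture on the GL thread:

```java
// Sketch: route ExoPlayer's video output into a texture owned by our
// custom GLSurfaceView, so OpenGL filters can process each frame.
// createExternalOesTexture() is an illustrative helper (glGenTextures +
// binding as GL_TEXTURE_EXTERNAL_OES on the GL thread).
int oesTextureId = createExternalOesTexture();
SurfaceTexture surfaceTexture = new SurfaceTexture(oesTextureId);

// Each new video frame triggers a re-render, where our shaders apply
// rotation, beauty filters, and sticker overlays before drawing to screen.
surfaceTexture.setOnFrameAvailableListener(st -> glSurfaceView.requestRender());

// Hand the backing Surface to ExoPlayer; it now decodes into our texture.
player.setVideoSurface(new Surface(surfaceTexture));
```

With this wiring, the decoded frame never reaches the screen directly; it is sampled as a texture inside our fragment shaders and only the filtered result is drawn.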
For clipped playback, use ClippingMediaSource to wrap the clipped material, passing the start and end times as described in the API documentation.
For splicing and playing multiple videos, ConcatenatingMediaSource can merge multiple materials and play them seamlessly; to treat the concatenation as a single material for editing, isAtomic is set to true.
For speed changes, use setPlaybackParameters to set the playback speed.
These three functions can be implemented directly with the APIs ExoPlayer provides, which is relatively easy: simply update the player's media sources and parameters immediately after each editing operation. Our product also has an undo interaction, so a copy of the pre-edit data must be kept; if the user undoes an operation, the player is restored from the original data.
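The three operations above can be sketched together. This assumes the ExoPlayer 2.x API (times in microseconds; `clipA`, `clipB` are previously built `MediaSource` instances):

```java
// Sketch: crop, splice, and change speed with stock ExoPlayer APIs.

// Crop: play only the segment [startUs, endUs) of a material.
MediaSource clipped = new ClippingMediaSource(sourceA, startUs, endUs);

// Splice: seamlessly concatenate several materials.
// isAtomic = true makes the concatenation behave as one single item.
ConcatenatingMediaSource timeline =
        new ConcatenatingMediaSource(/* isAtomic= */ true, clipped, clipB);

// Speed: 2x playback via PlaybackParameters.
player.setPlaybackParameters(new PlaybackParameters(2.0f));

player.prepare(timeline);
player.setPlayWhenReady(true);
```

After any edit, the media source tree is rebuilt from the (copied) edit data and `prepare` is called again, which is what makes undo straightforward.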
ExoPlayer itself does not support playing image materials. We inject a custom renderer to support images (JPG, PNG, GIF, etc.).
ImageRender inherits from BaseRenderer and implements custom rendering of images. The main job of a renderer is to render each decoded frame of the stream to the screen. For images, we define ImageMediaSource, SampleStreamImpl, and ImageMediaPeriod, which inherit from BaseMediaSource, SampleStream, and MediaPeriod respectively, to parse each frame of image data from the original material and pass it along. The image does not need to be actually decoded in this pipeline: the readData method of SampleStream simply writes the image URI into the decoder buffer.
The core of image playback is implementing the render interface:
In this method, we create an OpenGL environment and draw the bitmap to the screen.
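A minimal sketch of that renderer follows. The class and helper names besides `BaseRenderer` come from the description above or are illustrative (`decodeCurrentImage`, `glRenderer` are hypothetical helpers):

```java
// Sketch: a custom ExoPlayer renderer that treats each "sample" as an
// image URI and draws the decoded Bitmap with OpenGL.
public class ImageRender extends BaseRenderer {

    public ImageRender() {
        super(C.TRACK_TYPE_VIDEO);
    }

    @Override
    public void render(long positionUs, long elapsedRealtimeUs) {
        // SampleStreamImpl.readData has placed the image URI in the buffer;
        // no real video decoding happens for images.
        Bitmap bitmap = decodeCurrentImage();   // e.g. BitmapFactory.decodeFile(uri)

        // Upload the bitmap as a GL texture and draw a full-screen quad,
        // reusing the same GL environment as the video path.
        glRenderer.drawBitmap(bitmap);
    }
}
```

Because the image path shares the GL drawing code with the video path, stickers, filters, and transitions apply to image materials unchanged.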
Text and stickers support moving, rotating, scaling, and setting a timeline (start and end times).
For multiple text stickers, we finally flatten them into a single bitmap the same size as the rendering surface: each small bitmap (stickerItem.getBitmap), with its own coordinates, size, and start/end times, is drawn onto this bitmap canvas.
By blending this sticker canvas bitmap with the original video frame pixels, all text stickers can be drawn in one pass. Drawing stickers with OpenGL amounts to a watermark filter over the on-screen pixels, using GLSL's built-in mix function to blend the two textures. The following is the fragment shader used by the watermark filter.
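A representative watermark fragment shader in the GPUImage style is sketched below; the uniform names follow the GPUImage two-input filter convention and are assumptions, but the core is the `mix` call the text describes:

```glsl
precision mediump float;

varying highp vec2 textureCoordinate;
uniform sampler2D inputImageTexture;    // current video frame
uniform sampler2D inputImageTexture2;   // sticker canvas bitmap (screen-sized)

void main() {
    lowp vec4 base    = texture2D(inputImageTexture, textureCoordinate);
    lowp vec4 overlay = texture2D(inputImageTexture2, textureCoordinate);
    // Blend the sticker over the frame, weighted by the sticker's alpha:
    // transparent sticker pixels leave the video frame untouched.
    gl_FragColor = mix(base, overlay, overlay.a);
}
```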
Like text stickers, real-time beauty filters require a frame buffer object (FBO); each storage unit of the frame buffer corresponds to a screen pixel. The beauty algorithm itself is complex, and the AI team within the department provides it as an SDK. During drawing, the SDK is called as follows: it performs an image texture conversion through the FBO, with the screen orientation, camera orientation, and rendering size passed in as parameters.
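The call pattern can be sketched as follows. The SDK is internal, so `beautySdk.render(...)` and the helper names are purely illustrative; only the FBO round-trip reflects the mechanism described above:

```java
// Sketch: render the current frame into an FBO texture, let the beauty SDK
// process it, then draw the returned texture to the screen.

// 1. Redirect drawing into the off-screen FBO.
GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, fboId);
drawFrame(videoTextureId);                       // original frame -> fboTextureId
GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);

// 2. The SDK converts the texture; parameters match the text above.
//    (beautySdk and this signature are hypothetical.)
int beautifiedTextureId = beautySdk.render(
        fboTextureId, screenOrientation, cameraOrientation, width, height);

// 3. Draw the processed texture to the visible surface.
drawToScreen(beautifiedTextureId);
```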
At present, the product implements several transition effects: moving left/right, moving up/down, zooming in/out, and rotating clockwise/counterclockwise. The implementation is: for two materials joined by a transition, draw a transition filter during the last 1000 ms of the previous material. The transition filter renders the pixels of the two images according to a given rule; the algorithm is implemented in OpenGL with a GLSL shader. The fragment shader of the transition base class is as follows. Moving transitions (left/right and up/down), scaling transitions (zoom in/out), and rotation transitions differ only in how they use getFromColor and getToColor.
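A base-class shader in this style is sketched below, following the getFromColor/getToColor convention mentioned above (uniform names are assumptions; the default `transition` is a plain cross-fade that subclasses override):

```glsl
precision mediump float;

varying vec2 textureCoordinate;
uniform sampler2D inputImageTexture;    // last frames of the previous material
uniform sampler2D inputImageTexture2;   // first frame of the next material
uniform float progress;                 // 0.0 -> 1.0 over the ~1000 ms window

vec4 getFromColor(vec2 uv) { return texture2D(inputImageTexture, uv); }
vec4 getToColor(vec2 uv)   { return texture2D(inputImageTexture2, uv); }

// Each concrete transition (move, zoom, rotate) supplies its own rule here.
vec4 transition(vec2 uv) {
    return mix(getFromColor(uv), getToColor(uv), progress);
}

void main() {
    gl_FragColor = transition(textureCoordinate);
}
```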
Take the GLSL shader of the moving transition as an example:
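A moving transition in this style can be written as below; this follows the directional-slide pattern from the open-source GL-Transitions collection (the `direction` uniform is an assumption), with the from-image sliding out as the to-image slides in:

```glsl
// (1.0, 0.0) slides left/right; (0.0, 1.0) slides up/down.
uniform vec2 direction;

vec4 transition(vec2 uv) {
    // Shift the sampling position by the current progress.
    vec2 p = uv + progress * sign(direction);
    vec2 f = fract(p);
    // Inside the original [0,1] range, show the outgoing material;
    // once p leaves that range, the incoming material takes over.
    float inside = step(0.0, p.x) * step(p.x, 1.0)
                 * step(0.0, p.y) * step(p.y, 1.0);
    return mix(getToColor(f), getFromColor(f), inside);
}
```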
The concrete transition implementations refer to the GPUImageFilter library. Unlike the beauty filter and text stickers, the transition filter needs the first frame of the next material to be set in advance, before rendering.
During preview editing, the music does not need to be actually mixed into the video, so a second player can play the audio separately. We use Android's more primitive MediaPlayer to play the music on its own, which supports cropped playback and seeking of the music independently.
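A minimal sketch of the separate music player, using only standard MediaPlayer calls (`cropStartMs`/`cropEndMs` are illustrative variables):

```java
// Sketch: independent background-music playback with Android's MediaPlayer.
MediaPlayer musicPlayer = new MediaPlayer();
musicPlayer.setDataSource(musicPath);
musicPlayer.prepare();

// "Crop" the music by seeking to the chosen start position...
musicPlayer.seekTo(cropStartMs);
musicPlayer.start();

// ...and stopping from a periodic progress check once cropEndMs is reached.
// The video player's position drives re-seeks so the two stay in sync.
```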
The frame preview strip takes one video frame at fixed intervals to form a timeline. We use the ffmpegMediaMetadataRetriever library to extract frames. It is used as follows:
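A usage sketch is shown below; the library mirrors the platform MediaMetadataRetriever API (`intervalMs` is an illustrative variable for the thumbnail spacing):

```java
// Sketch: extract one frame every intervalMs with FFmpegMediaMetadataRetriever.
FFmpegMediaMetadataRetriever retriever = new FFmpegMediaMetadataRetriever();
retriever.setDataSource(videoPath);

long durationMs = Long.parseLong(retriever.extractMetadata(
        FFmpegMediaMetadataRetriever.METADATA_KEY_DURATION));

for (long t = 0; t < durationMs; t += intervalMs) {
    // getFrameAtTime expects microseconds.
    Bitmap frame = retriever.getFrameAtTime(
            t * 1000, FFmpegMediaMetadataRetriever.OPTION_CLOSEST_SYNC);
    // append frame to the timeline thumbnail strip...
}
retriever.release();
```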
The library uses FFmpeg internally for decoding. Its frame-grabbing interface is easy to use, but software decoding is inefficient and relatively slow. Since ExoPlayer uses hardware decoding by default, another ExoPlayer instance could play the material through quickly while screenshots are captured at regular intervals; however, this approach is too expensive, and two ExoPlayer instances are hard to manage.
Finally, we found that Glide, the commonly used image loading library, can also extract video frames, and is simpler and more convenient to use. It uses MediaMetadataRetriever internally to extract frames.
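With Glide the per-thumbnail request looks roughly like this (Glide 4.x API; `timeUs` and `thumbnailImageView` are illustrative):

```java
// Sketch: load the video frame at timeUs (microseconds) into an ImageView.
// Glide decodes it via MediaMetadataRetriever under the hood.
Glide.with(context)
     .asBitmap()
     .load(videoPath)
     .apply(RequestOptions.frameOf(timeUs).centerCrop())
     .into(thumbnailImageView);
```

Glide also handles caching and request lifecycle, which is why it replaced both the FFmpeg-based retriever and the second-player idea for the thumbnail strip.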
1. Adjust materials, splice, crop, change speed
/file/5f896ef25655da63cc2d3237.mp4
2. Transitions, text stickers, beauty filters
/file/5f896edad70f81a0e3c77dbe.mp4