Recognize Text With ML Kit in Jetpack Compose

Introduction

In the world of modern Android development, applications have become faster and smarter through the integration of machine learning (ML) capabilities. This integration not only enhances user experiences but also brings automation and intelligence to various tasks. One such example is text recognition, a technology that allows applications to convert images containing text into machine-readable text data. In this article, we will explore how to recognize text using ML Kit within Jetpack Compose, a powerful UI toolkit for building native Android applications.


Understanding ML Kit

When we face challenging tasks on Android devices, such as translating text from one language to another, detecting faces, or recognizing speech, we can make things simpler by using machine learning models. Machine learning is itself a subset of a broader field, artificial intelligence (AI).

To bring this machine-learning magic to mobile devices, we use something called ML Kit. Think of it as a special toolkit for phones: it brings Google's on-device machine-learning expertise to both Android and iOS apps, acting as a bridge between the app and the models that understand things like languages and faces.

Adding ML Kit to a mobile app is quite simple. ML Kit offers a handy way to tackle challenging tasks like detecting objects, understanding gestures, reading text, classifying sounds, recognizing speech, suggesting words, generating smart replies, and much more. We can also use frameworks like TensorFlow Lite to run our very own machine-learning models. But for now, let's stick with Google's ML Kit and create an app that recognizes text.

Text Recognition with ML Kit

Text recognition, a form of optical character recognition (OCR), is the process of extracting text from images. This technology has a variety of uses, including document scanning, text translation, data extraction from business cards, and more. The text recognition model in ML Kit handles a wide variety of languages and fonts, making it a useful tool for developers.

Integrating ML Kit Text Recognition with Jetpack Compose

To integrate ML Kit's text recognition capabilities with Jetpack Compose, follow these steps:

Step 1. Add Dependencies

Add the necessary dependencies to your app's build.gradle file.

    // ML Kit text recognition (bundled model via Google Play services)
    implementation 'com.google.android.gms:play-services-mlkit-text-recognition:19.0.0'
    // Coil for loading images in Compose
    implementation 'io.coil-kt:coil-compose:2.3.0'
    // ViewModel support for Compose
    implementation 'androidx.lifecycle:lifecycle-viewmodel-compose:2.6.1'

Step 2. Set Up the Photo Picker

To keep this application simple, we'll set up a photo picker. You could, of course, capture the photo with the camera instead.

For now, let's configure the photo picker using the following code.

    // Holds either the placeholder drawable or the selected image URI,
    // which is why it is typed as Any? (Coil's model parameter accepts both)
    var imageUri: Any? by remember { mutableStateOf(R.drawable.img) }

    // System photo picker launcher; no storage permission is required
    val photoPicker = rememberLauncherForActivityResult(
        contract = ActivityResultContracts.PickVisualMedia()
    ) { uri ->
        if (uri != null) {
            Log.d("PhotoPicker", "Selected URI: $uri")
            imageUri = uri
        } else {
            Log.d("PhotoPicker", "No media selected")
        }
    }


    Column(
        modifier = Modifier.fillMaxSize(),
        horizontalAlignment = Alignment.CenterHorizontally,
        verticalArrangement = Arrangement.Center
    ) {
        AsyncImage(
            modifier = Modifier
                .size(250.dp)
                .clickable {
                    photoPicker.launch(
                        PickVisualMediaRequest(
                            ActivityResultContracts.PickVisualMedia.ImageOnly
                        )
                    )
                },
            model = ImageRequest.Builder(LocalContext.current).data(imageUri)
                .crossfade(enable = true).build(),
            contentDescription = "Selected image",
            contentScale = ContentScale.Crop,
        )
        
        Spacer(modifier = Modifier.height(24.dp))

        // Coming Up Next 
    }

If you're looking for more details on this setup, I've already written an article on selecting images from the gallery without needing special permissions; feel free to check it out.

Step 3. Perform Text Recognition

To hold the screen's data, we'll create a data class called MainScreenState. An instance of this class will live inside our view model so that it survives configuration changes. The class looks like the following.

    data class MainScreenState(
        val extractedText: String = "Not detected yet..",
        val isButtonEnabled: Boolean = true
    )

Next, we'll create a view model responsible for housing all the text recognition logic. Its methods will be triggered from the composable screen.
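The snippets below reference a _state property that isn't shown in the article; here is a minimal sketch of how the view model could hold it, assuming the common Compose MutableState pattern (the class and property names are assumptions):

    class MainViewModel : ViewModel() {

        // Backing property: only the ViewModel can mutate the state
        private val _state = mutableStateOf(MainScreenState())
        val state: State<MainScreenState> = _state

        // onTranslateButtonClick() and onTextToBeTranslatedChange() live here
    }

With the state holder in place, the recognition method itself looks like the following.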

    fun onTranslateButtonClick(
        text: Any?, context: Context
    ) {
        // Convert the picked URI into an InputImage that ML Kit can process
        val image: InputImage = try {
            InputImage.fromFilePath(context, text as Uri)
        } catch (e: IOException) {
            e.printStackTrace()
            return
        }
        // Run the on-device text recognizer and listen for the result
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
            .process(image)
            .addOnSuccessListener { visionText ->
                // Push the recognized text into the screen state
                onTextToBeTranslatedChange(visionText.text)
            }
            .addOnFailureListener {
                Toast.makeText(
                    context, "Text recognition failed", Toast.LENGTH_SHORT
                ).show()
            }
    }

The recognizer's process() method takes an InputImage, so we first convert the image URI into an InputImage inside a try-catch block (InputImage.fromFilePath can throw an IOException). We then start the recognition and attach success and failure listeners: on success, the recognized text is pushed into the screen state; on failure, we show a Toast.
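As a side note, the recognizer client returned by TextRecognition.getClient() implements Closeable, so if you prefer to keep a single instance in the view model you could release it when the view model is cleared. A minimal sketch (holding the client as a property is my own variation, not something the article does):

    private val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    override fun onCleared() {
        // Frees the resources held by the underlying ML Kit model
        recognizer.close()
        super.onCleared()
    }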

Step 4. Display Recognized Text

To display the recognized text, we'll create a function that updates the state of the app, like the one below.

    fun onTextToBeTranslatedChange(text: String) {
        _state.value = state.value.copy(
            extractedText = text
        )
    }

To show this on screen, we'll read the state from the ViewModel and display it in a Row composable, like here.

    @Composable
    fun ImagePickerScreen(
        viewModel: MainViewModel = androidx.lifecycle.viewmodel.compose.viewModel()
    ) {
        val state = viewModel.state.value
        val context = LocalContext.current

        // photo picker code

        Column(
            modifier = Modifier.fillMaxSize(),
            horizontalAlignment = Alignment.CenterHorizontally,
            verticalArrangement = Arrangement.Center
        ) {
            // existing code

            val scrollState = rememberScrollState()
            Row(
                modifier = Modifier
                    .fillMaxWidth()
                    .verticalScroll(scrollState)
            ) {
                Text(
                    text = state.extractedText,
                    textAlign = TextAlign.Center,
                    modifier = Modifier.fillMaxWidth()
                )
            }
        }
    }
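The article doesn't show the click trigger itself; a straightforward option is a Button placed where the // Coming Up Next placeholder sits in Step 2, gated by the isButtonEnabled flag. This wiring is my own sketch, not code from the article:

    // Hypothetical trigger for the recognition, placed inside the Column
    Button(
        enabled = state.isButtonEnabled,
        onClick = { viewModel.onTranslateButtonClick(imageUri, context) }
    ) {
        Text(text = "Recognize Text")
    }

Tapping it hands the currently selected imageUri to the view model and, once recognition succeeds, the extracted text appears in the Row below the image.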

To try out the application, download the zip file attached to this article.


Conclusion

The functionality and user experience of your Android app can be greatly improved by integrating text recognition features utilizing ML Kit within Jetpack Compose applications. Developers can easily construct applications that extract and display text from photos using the potent combination of ML Kit's pre-trained models and Jetpack Compose's declarative UI approach, opening up opportunities for creative and intelligent solutions across multiple domains. Keep in mind that for more detailed instructions and best practices, consult the official documentation of both ML Kit and Jetpack Compose.

