Proposed Pull Request Change

title description author ms.author ms.service ms.topic ms.date ms.custom
Query Avro data by using Azure Data Lake Analytics Use message body properties to route device telemetry to Blob storage and query the Avro format data that's written to Blob storage. cwatson-cat cwatson azure-iot-hub how-to 08/13/2025 sfi-image-nochange
📄 Document Links
GitHub View on GitHub Microsoft Learn View on Microsoft Learn
Content Truncation Detected
The generated rewrite appears to be incomplete.
Original lines: -
Output lines: -
Ratio: -
Raw New Markdown
Generating updated version of doc...
Rendered New Markdown
Generating updated version of doc...
+0 -0
+0 -0
--- title: Query Avro data by using Azure Data Lake Analytics description: Use message body properties to route device telemetry to Blob storage and query the Avro format data that's written to Blob storage. author: cwatson-cat ms.author: cwatson ms.service: azure-iot-hub ms.topic: how-to ms.date: 08/13/2025 ms.custom: sfi-image-nochange --- # Query Avro data by using Azure Data Lake Analytics This article discusses how to query Avro data to efficiently route messages from Azure IoT Hub to Azure services. [Message routing](iot-hub-devguide-messages-d2c.md) allows you to filter data using rich queries based on message properties, message body, device twin tags, and device twin properties. To learn more about the querying capabilities in Message Routing, see [IoT Hub message routing query syntax](iot-hub-devguide-routing-query-syntax.md). The challenge is that when Azure IoT Hub routes messages to Azure Blob storage, by default IoT Hub writes the content in Avro format, which has both a message body property and a message property. The Avro format isn't used for any other endpoints. Although the Avro format is great for data and message preservation, it's a challenge to use it to query data. In comparison, JSON or CSV format is easier for querying data. IoT Hub now supports writing data to Blob storage in JSON and AVRO. For more information, see [Azure Storage as a routing endpoint](iot-hub-devguide-endpoints.md#azure-storage-as-a-routing-endpoint). To address nonrelational big-data needs and formats and overcome this challenge, you can use many of the big-data patterns for both transforming and scaling data. One of the patterns, "pay per query," is Azure Data Lake Analytics, which is the focus of this article. Although you can easily execute the query in Hadoop or other solutions, Data Lake Analytics is often better suited for this "pay per query" approach. There's an "extractor" for Avro in U-SQL. For more information, see [U-SQL Avro example](https://github.com/Azure/usql/tree/master/Examples/AvroExamples). ## Query and export Avro data to a CSV file In this section, you query Avro data and export it to a CSV file in Azure Blob storage, although you could easily place the data in other repositories or data stores. 1. Set up Azure IoT Hub to route data to an Azure Blob storage endpoint by using a property in the message body to select messages. :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-1a.png" alt-text="Screen capture showing the Custom endpoints tab from the Message Routing working pane for an IoT hub in the Azure portal, highlighting the Blob storage section."::: :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-1b.png" alt-text="Screen capture showing the routes tab from the Message Routing working pane for an IoT hub in the Azure portal, highlighting the routing query and endpoint for a route."::: For more information on settings up routes and custom endpoints, see [Create and delete routes and endpoints by using the Azure portal](how-to-routing-portal.md). 2. Ensure that your device has the encoding, content type, and needed data in either the properties or the message body, as referenced in the product documentation. When you view these attributes in Device Explorer, as shown here, you can verify that they're set correctly. :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-2.png" alt-text="Screenshot of the Data tab from Device Explorer, highlighting the content-type and content-encoding attributes of a message."::: 3. Set up an Azure Data Lake Store instance and a Data Lake Analytics instance. Azure IoT Hub doesn't route to a Data Lake Store instance, but a Data Lake Analytics instance requires one. :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-3.png" alt-text="Screen capture of the working pane for a resource group in the Azure portal, highlighting a Data Lake Analytics instance and a Data Lake Store instance."::: 4. In Data Lake Analytics, configure Azure Blob storage as an additional store, the same Blob storage that Azure IoT Hub routes data to. :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-4.png" alt-text="Screenshot of the Data sources pane from Data Lake Analytics, highlighting an Azure Storage instance as an additional data source."::: 5. As discussed in the [U-SQL Avro example](https://github.com/Azure/usql/tree/master/Examples/AvroExamples), you need four DLL files. Upload these files to a location in your Data Lake Store instance. :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-5.png" alt-text="Screenshot of the Data explorer pane from Data Lake Store, highlighting four uploaded DLL files."::: 6. In Visual Studio, create a U-SQL project. :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-6.png" alt-text="Screenshot of the New Project dialog from Visual Studio, highlighting the U-SQL Project template."::: 7. Paste the content of the following script into the newly created file. Modify the three highlighted sections: your Data Lake Analytics account, the associated DLL file paths, and the correct path for your storage account. :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-7a.png" alt-text="Screen capture of the U-SQL editor in the working pane of Visual Studio for a U-SQL script, highlighting the three sections to be modified."::: The actual U-SQL script for simple output to a CSV file: ```sql DROP ASSEMBLY IF EXISTS [Avro]; CREATE ASSEMBLY [Avro] FROM @"/Assemblies/Avro/Avro.dll"; DROP ASSEMBLY IF EXISTS [Microsoft.Analytics.Samples.Formats]; CREATE ASSEMBLY [Microsoft.Analytics.Samples.Formats] FROM @"/Assemblies/Avro/Microsoft.Analytics.Samples.Formats.dll"; DROP ASSEMBLY IF EXISTS [Newtonsoft.Json]; CREATE ASSEMBLY [Newtonsoft.Json] FROM @"/Assemblies/Avro/Newtonsoft.Json.dll"; DROP ASSEMBLY IF EXISTS [log4net]; CREATE ASSEMBLY [log4net] FROM @"/Assemblies/Avro/log4net.dll"; REFERENCE ASSEMBLY [Newtonsoft.Json]; REFERENCE ASSEMBLY [log4net]; REFERENCE ASSEMBLY [Avro]; REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; // Blob container storage account filenames, with any path DECLARE @input_file string = @"wasb://hottubrawdata@kevinsayazstorage/kevinsayIoT/{*}/{*}/{*}/{*}/{*}/{*}"; DECLARE @output_file string = @"/output/output.csv"; @rs = EXTRACT EnqueuedTimeUtc string, Body byte[] FROM @input_file USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@" { ""type"":""record"", ""name"":""Message"", ""namespace"":""Microsoft.Azure.Devices"", ""fields"": [{ ""name"":""EnqueuedTimeUtc"", ""type"":""string"" }, { ""name"":""Properties"", ""type"": { ""type"":""map"", ""values"":""string"" } }, { ""name"":""SystemProperties"", ""type"": { ""type"":""map"", ""values"":""string"" } }, { ""name"":""Body"", ""type"":[""null"",""bytes""] }] }" ); @cnt = SELECT EnqueuedTimeUtc AS time, Encoding.UTF8.GetString(Body) AS jsonmessage FROM @rs; OUTPUT @cnt TO @output_file USING Outputters.Text(); ``` It took Data Lake Analytics five minutes to run the following script, which was limited to 10 analytic units and processed 177 files. The result is shown in the CSV-file output that's displayed in the following image: :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-7b.png" alt-text="Screen capture of the Job View tab in the working pane of Visual Studio for the submitted U-SQL script, showing the job graph."::: :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-7c.png" alt-text="Screen capture of the File Preview tab in the working pane of Visual Studio for the submitted U-SQL script, showing the output converted into rows in a comma-separated values (.csv) file."::: To parse the JSON, continue to step 8. 8. Most IoT messages are in JSON file format. By adding the following lines, you can parse the message into a JSON file, which lets you add the WHERE clauses and output only the needed data. ```sql @jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message FROM @rs; /* @cnt = SELECT EnqueuedTimeUtc AS time, Encoding.UTF8.GetString(Body) AS jsonmessage FROM @rs; OUTPUT @cnt TO @output_file USING Outputters.Text(); */ @cnt = SELECT message["message"] AS iotmessage, message["event"] AS msgevent, message["object"] AS msgobject, message["status"] AS msgstatus, message["host"] AS msghost FROM @jsonify; OUTPUT @cnt TO @output_file USING Outputters.Text(); ``` The output displays a column for each item in the `SELECT` command. :::image type="content" source="./media/iot-hub-query-avro-data/query-avro-data-8.png" alt-text="Screen capture of the File Preview tab in the working pane of Visual Studio for the submitted U-SQL script, showing the queried output of a JSON file converted into a comma-separated values (.csv) file."::: ## Next steps In this tutorial, you learned how to query Avro data to efficiently route messages from Azure IoT Hub to Azure services. * To learn more about message routing in IoT Hub, see [Use IoT Hub message routing to send device-to-cloud messages to Azure services](iot-hub-devguide-messages-d2c.md). * To learn more about routing query syntax, see [IoT Hub message routing query syntax](iot-hub-devguide-routing-query-syntax.md).
Success! Branch created successfully. Create Pull Request on GitHub
Error: