This page contains Windows bias

About This Page

This page is part of the Azure documentation. It contains code examples and configuration instructions for working with Azure services.

Bias Analysis

Bias Types:
⚠️ windows_first
⚠️ windows_tools
⚠️ powershell_heavy
⚠️ missing_linux_example
Summary:
The documentation exhibits a strong Windows bias. All file copy examples use Windows tools (Robocopy, File Explorer), and the Data Box Disk Split Copy and Validation tools are only available for Windows. There are no Linux-specific examples or equivalent Linux tools mentioned for copying or validating data. Linux users are only briefly referenced, with no guidance or command-line examples provided for Linux environments.
Recommendations:
  • Provide equivalent Linux command-line examples for copying data to the Data Box Disk, such as using cp, rsync, or smbclient (see the sketch after this list).
  • List Linux-compatible tools for checksum validation (e.g., sha256sum, md5sum) and provide example commands for validating data integrity.
  • Explicitly mention any limitations or differences for Linux users at the start of relevant sections, and offer alternative workflows where Windows-only tools are referenced.
  • Include troubleshooting tips and best practices for Linux environments, such as handling long paths, file permissions, and mounting NTFS disks.
  • Where screenshots or walkthroughs are provided, include at least one example from a Linux desktop or terminal.
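
Example (Linux): as a concrete illustration of the first two recommendations, here is a minimal sketch of what such guidance could look like. The mount point `/mnt/databox`, source path `/data/source`, and container folder `mycontainer` are hypothetical placeholders, not values from the documentation:

```bash
# Copy data into the disk's BlockBlob folder with rsync.
# -a preserves permissions and timestamps; an interrupted run can be re-run safely.
rsync -av --progress /data/source/ /mnt/databox/BlockBlob/mycontainer/

# Generate SHA-256 checksums for the source tree...
(cd /data/source && find . -type f -print0 | xargs -0 sha256sum) > /tmp/checksums.txt

# ...then verify the copied data against them.
(cd /mnt/databox/BlockBlob/mycontainer && sha256sum -c /tmp/checksums.txt)
```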

Scan History

| Date | Scan ID | Status | Bias Status |
|------|---------|--------|-------------|
| 2025-08-17 00:01 | #83 | in_progress | ✅ Clean |
| 2025-07-13 21:37 | #48 | completed | ❌ Biased |
| 2025-07-09 13:09 | #3 | cancelled | ✅ Clean |
| 2025-07-08 04:23 | #2 | cancelled | ❌ Biased |

Flagged Code Snippets

To optimize performance, use the following Robocopy parameters when copying the data.

| Platform | Mostly small files < 512 KB | Mostly medium files 512 KB-1 MB | Mostly large files > 1 MB |
|---------------|-----------------------------|----------------------------------|---------------------------|
| Data Box Disk | 4 Robocopy sessions*<br>16 threads per session | 2 Robocopy sessions*<br>16 threads per session | 2 Robocopy sessions*<br>16 threads per session |

*Each Robocopy session can have a maximum of 7,000 directories and 150 million files.

For more information on the Robocopy command, read the [Robocopy and a few examples](https://social.technet.microsoft.com/wiki/contents/articles/1073.robocopy-and-a-few-examples.aspx) article.

1. Open the target folder, then view and verify the copied files. If you encounter errors during the copy process, download the log files for troubleshooting. The Robocopy command's output specifies the location of the log files.

### Split and copy data to disks

The Data Box Split Copy tool helps split and copy data across two or more Azure Data Box Disks. The tool is available for use only on a Windows computer. This optional procedure is helpful when you have a large dataset that needs to be split and copied across several disks.

>[!IMPORTANT]
> The Data Box Split Copy tool can also validate your data. If you use the Data Box Split Copy tool to copy data, you can skip the [validation step](#validate-data).
>
> Access tier assignment is not supported when copying data using the Data Box Split Copy tool. If your use case requires access tier assignment, follow the steps in the [Copy data to disks](#copy-data-to-disks) section to copy your data to the appropriate access tier using the Robocopy utility.
>
> The Data Box Split Copy tool is not supported with managed disks.

1. On your Windows computer, ensure that the Data Box Split Copy tool is downloaded and extracted to a local folder. This tool is included in the Data Box Disk toolset for Windows.
1. Open File Explorer. Make a note of the data source drive and the drive letters assigned to the Data Box Disks.

   :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-1-sml.png" alt-text="Screenshot of the data source drive and drive letters assigned to Data Box Disk." lightbox="media/data-box-disk-deploy-copy-data/split-copy-1.png":::

1. Identify the source data to copy. For instance, in this case:
   - The following block blob data was identified.

     :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-2-sml.png" alt-text="Screenshot of block blob data identified for the copy process." lightbox="media/data-box-disk-deploy-copy-data/split-copy-2.png":::

   - The following page blob data was identified.

     :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-3-sml.png" alt-text="Screenshot of page blob data identified for the copy process." lightbox="media/data-box-disk-deploy-copy-data/split-copy-3.png":::

1. Navigate to the folder where the software is extracted and locate the `SampleConfig.json` file. This read-only sample file can be modified and saved under a new name.

   :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-4-sml.png" alt-text="Screenshot showing the location of the sample configuration file." lightbox="media/data-box-disk-deploy-copy-data/split-copy-4.png":::

1. Modify the `SampleConfig.json` file.
   - Provide a job name. A folder with this name is created on the Data Box Disk. The name is also used to create a container in the Azure storage account associated with these disks. The job name must follow the [Azure container naming conventions](/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata).
   - Supply a source path, making note of the path format used in `SampleConfig.json`.
   - Enter the drive letters corresponding to the target disks. Data is taken from the source path and copied across multiple disks.
   - Provide a path for the log files. By default, log files are sent to the directory where the `.exe` file is located.
   - To validate the file format, use `JSONlint`.

     :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-5.png" alt-text="Screenshot showing the contents of the sample configuration file.":::

   - Save the file as `ConfigFile.json`.

     :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-6-sml.png" alt-text="Screenshot showing the location of the replacement configuration file." lightbox="media/data-box-disk-deploy-copy-data/split-copy-6.png":::

1. Open a Command Prompt window with elevated privileges and run `DataBoxDiskSplitCopy.exe` using the following command.
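
   The flagged snippet omits the command itself at this point. Judging from the resume variant shown in a later step, the initial invocation is presumably:

   `DataBoxDiskSplitCopy.exe PrepImport /config:ConfigFile.json`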
1. When prompted, press any key to continue running the tool.

   :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-8-sml.png" alt-text="Screenshot showing the command prompt window executing the Split Copy tool." lightbox="media/data-box-disk-deploy-copy-data/split-copy-8.png":::

1. After the dataset is split and copied, the Split Copy tool presents a summary of the copy session, as shown in the following sample output.

   :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-9-sml.png" alt-text="Screenshot showing the summary presented after successful execution of the Split Copy tool." lightbox="media/data-box-disk-deploy-copy-data/split-copy-9.png":::

1. Verify that the data is split properly across the target disks.

   :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-10-sml.png" alt-text="Screenshot indicating resulting data split properly across the first of two target disks." lightbox="media/data-box-disk-deploy-copy-data/split-copy-10.png":::

   :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-11-sml.png" alt-text="Screenshot indicating resulting data split properly across the second of two target disks." lightbox="media/data-box-disk-deploy-copy-data/split-copy-11.png":::

   Examine the `H:` drive contents and ensure that two subfolders are created that correspond to block blob and page blob format data.

   :::image type="content" source="media/data-box-disk-deploy-copy-data/split-copy-12-sml.png" alt-text="Screenshot showing two subfolders created which correspond to block blob and page blob format data." lightbox="media/data-box-disk-deploy-copy-data/split-copy-12.png":::

1. If the copy session fails, use the following command to recover and resume:

   `DataBoxDiskSplitCopy.exe PrepImport /config:ConfigFile.json /ResumeSession`

   If you encounter errors while using the Split Copy tool, follow the steps in the [troubleshoot Split Copy tool errors](data-box-disk-troubleshoot-data-copy.md) article.

>[!IMPORTANT]
> The Data Box Split Copy tool also validates your data. If you use the Data Box Split Copy tool to copy data, you can skip the [validation step](#validate-data).
> The Split Copy tool is not supported with managed disks.

## Validate data

If you didn't use the Data Box Split Copy tool to copy data, you need to validate your data. Verify the data by performing the following steps on each of your Data Box Disks. If you encounter errors during validation, follow the steps in the [troubleshoot validation errors](data-box-disk-troubleshoot.md) article.

1. Run `DataBoxDiskValidation.cmd` for checksum validation in the *DataBoxDiskImport* folder of your drive. This tool is available only for the Windows environment; Linux users need to validate that the source data copied to the disk meets the [Azure Data Box prerequisites](./data-box-disk-limits.md).

   :::image type="content" source="media/data-box-disk-deploy-copy-data/validation-tool-output-sml.png" alt-text="Screenshot showing Data Box Disk validation tool output." lightbox="media/data-box-disk-deploy-copy-data/validation-tool-output.png":::

1. Choose the appropriate validation option when prompted. **We recommend that you always validate the files and generate checksums by selecting option 2.** Exit the command window after the script completes. The time required for validation depends upon the size of your data. The tool notifies you of any errors encountered during validation and checksum generation, and provides a link to the error logs.

   :::image type="content" source="media/data-box-disk-deploy-copy-data/checksum-output-sml.png" alt-text="Screenshot showing a failed execution attempt and indicating the location of the corresponding log file." lightbox="media/data-box-disk-deploy-copy-data/checksum-output.png":::

   > [!TIP]
   > - Reset the tool between two runs.
   > - The checksum process can take longer for a large dataset containing many files that take up relatively little storage capacity.

If you validate files but skip checksum creation, independently verify data integrity on the Data Box Disk before deleting any copies. This verification ideally includes generating checksums.

## Next steps

In this tutorial, you learned how to complete the following tasks with Azure Data Box Disk:

> [!div class="checklist"]
> * Copy data to Data Box Disk
> * Verify data integrity

Advance to the next tutorial to learn how to return the Data Box Disk and verify the data upload to Azure.

> [!div class="nextstepaction"]
> [Ship your Azure Data Box back to Microsoft](./data-box-disk-deploy-picked-up.md)

<!--::: zone-end-->

<!--::: zone target="chromeless"

### Copy data to disks

Take the following steps to connect and copy data from your computer to the Data Box Disk.

1. View the contents of the unlocked drive. The list of precreated folders and subfolders on the drive differs depending upon the options selected when placing the Data Box Disk order.
2. Copy the data to folders that correspond to the appropriate data format. For instance, copy unstructured data to the *BlockBlob* folder, VHD or VHDX data to the *PageBlob* folder, and files to the *AzureFile* folder. If the data format doesn't match the appropriate folder (storage type), the data upload to Azure fails at a later step.
   - Make sure that all the containers, blobs, and files conform to [Azure naming conventions](data-box-disk-limits.md#azure-block-blob-page-blob-and-file-naming-conventions) and [Azure object size limits](data-box-disk-limits.md#azure-object-size-limits). If these rules or limits aren't followed, the data upload to Azure fails.
   - If your order has Managed Disks as one of the storage destinations, see the naming conventions for [managed disks](data-box-disk-limits.md#managed-disk-naming-conventions).
   - A container is created in the Azure storage account for each subfolder within the *BlockBlob* and *PageBlob* folders. All files placed directly within the *BlockBlob* and *PageBlob* folders are copied to the default *$root* container within the Azure storage account. Any files within the *$root* container are always uploaded as block blobs.
   - Create a subfolder within the *AzureFile* folder. This subfolder maps to a file share in the cloud. Copy files to the subfolder. Files copied directly to the *AzureFile* folder fail and are uploaded as block blobs.
   - If files and folders exist in the root directory, move them to a different folder before data copy begins.
3. Use drag and drop with File Explorer, or any SMB-compatible file copy tool such as Robocopy, to copy your data. Multiple copy jobs can be initiated using the following command:
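
The snippet ends before the command itself. As an illustration only, a representative invocation consistent with the session and thread guidance in the table above might look like the following; the source, destination, and log paths are placeholders, not values from the documentation:

```
robocopy C:\Data\Source H:\BlockBlob\container1 /MT:16 /E /R:3 /W:60 /LOG+:C:\robocopy-session1.log
```

To run multiple sessions in parallel, each invocation can be launched in its own window with `start`, pointing at a different source subtree and log file.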