Proposed Pull Request Change

| title | description | ms.service | ms.topic | author | ms.author | ms.reviewer | ms.date |
|---|---|---|---|---|---|---|---|
| Unable to access Data Lake storage files in Azure HDInsight | Unable to access Data Lake storage files in Azure HDInsight | azure-hdinsight | troubleshooting | hareshg | hgowrisankar | nijelsf | 09/06/2024 |
---
title: Unable to access Data Lake storage files in Azure HDInsight
description: Unable to access Data Lake storage files in Azure HDInsight
ms.service: azure-hdinsight
ms.topic: troubleshooting
author: hareshg
ms.author: hgowrisankar
ms.reviewer: nijelsf
ms.date: 09/06/2024
---

# Unable to access Data Lake storage files in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

## Issue: ACL verification failed

You receive an error message similar to:

```
LISTSTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.).
```

### Cause

The user might have revoked the permissions of the service principal (SP) on files or folders.

### Resolution

1. Check that the SP has 'x' permissions to traverse along the path. For more information, see [Permissions](https://hdinsight.github.io/ClusterCRUD/ADLS/adls-create-permission-setup.html). Sample `dfs` command to check access to files and folders in the Data Lake storage account:

    ```
    hdfs dfs -ls /<path to check access>
    ```

1. Set up the permissions required to access the path, based on the read or write operation being performed. See here for permissions required for various file system operations.

---

## Issue: Service principal certificate expiry

You receive an error message similar to:

```
Token Refresh failed - Received invalid http response: 500
```

### Cause

The certificate provided for service principal access might have expired. To confirm:

1. SSH into the headnode. Check access to the storage account by using the following `dfs` command:

    ```
    hdfs dfs -ls /
    ```

1. Confirm that the error message is similar to the following output:

    ```
    {"stderr": "-ls: Token Refresh failed - Received invalid http response: 500, text = Response{protocol=http/1.1, code=500, message=Internal Server Error, url=http://gw0-abccluster.24ajrd4341lebfgq5unsrzq0ue.fx.internal.cloudapp.net:909/api/oauthtoken}}...
    ```

1. Get one of the URLs from the `core-site.xml` property `fs.azure.datalake.token.provider.service.urls`.

1. Run the following curl command to retrieve the OAuth token:

    ```
    curl gw0-abccluster.24ajrd4341lebfgq5unsrzq0ue.fx.internal.cloudapp.net:909/api/oauthtoken
    ```

1. The output for a valid service principal should look something like this:

    ```
    {"AccessToken":"MIIGHQYJKoZIhvcNAQcDoIIGDjCCBgoCAQA…….","ExpiresOn":1500447750098}
    ```

1. If the service principal certificate has expired, the output looks something like this:

    ```
    Exception in OAuthTokenController.GetOAuthToken: 'System.InvalidOperationException: Error while getting the OAuth token from AAD for AppPrincipalId aaaaaaaa-bbbb-cccc-1111-222222222222, ResourceUri https://management.core.windows.net/, AADTenantId https://login.windows.net/80abc8bf-86f1-41af-91ab-2d7cd011db47, ClientCertificateThumbprint C49C25705D60569884EDC91986CEF8A01A495783 ---> Microsoft.IdentityModel.Clients.ActiveDirectory.AdalServiceException: AADSTS70002: Error validating credentials. AADSTS50012: Client assertion contains an invalid signature. **[Reason - The key used is expired.**, Thumbprint of key used by client: 'C49C25705D60569884EDC91986CEF8A01A495783', Found key 'Start=08/03/2016, End=08/03/2017, Thumbprint=C39C25705D60569884EDC91986CEF8A01A4956D1', Configured keys: [Key0:Start=08/03/2016, End=08/03/2017, Thumbprint=C39C25705D60569884EDC91986CEF8A01A4956D1;]]
    Trace ID: 0000aaaa-11bb-cccc-dd22-eeeeee333333
    Correlation ID: aaaa0000-bb11-2222-33cc-444444dddddd
    Timestamp: 2017-10-06 20:44:56Z ---> System.Net.WebException: The remote server returned an error: (401) Unauthorized.
       at System.Net.HttpWebRequest.GetResponse()
       at Microsoft.IdentityModel.Clients.ActiveDirectory.HttpWebRequestWrapper.<GetResponseSyncOrAsync>d__2.MoveNext()
    ```

1. You can recognize any other Microsoft Entra-related or certificate-related errors by requesting the OAuth token from the gateway URL in the same way.

1. If you get the following error when you attempt to access ADLS from the HDI cluster, check whether the certificate has expired by following the preceding steps.

    ```
    Error: java.lang.IllegalArgumentException: Token Refresh failed - Received invalid http response: 500, text = Response{protocol=http/1.1, code=500, message=Internal Server Error, url=http://clustername.hmssomerandomstringc.cx.internal.cloudapp.net:909/api/oauthtoken}
    ```

### Resolution

Create a new certificate or assign an existing certificate by using the following PowerShell script:

```powershell
# Replace the placeholder values below with your own before running.
$clusterName = 'CLUSTERNAME'
$resourceGroupName = 'RGNAME'
$subscriptionId = 'SUBSCRIPTIONID'
$appId = 'APPLICATIONID'
# Set to $true to generate a new self-signed certificate instead of reading a .pfx file.
$generateSelfSignedCert = $false
# Set to $true to register the certificate as a new key credential on the application.
$addNewCertKeyCredential = $true
$certFilePath = 'NEW_CERT_PFX_LOCAL_PATH'
$certPassword = Read-Host "Enter Certificate Password"

if($generateSelfSignedCert)
{
    Write-Host "Generating new SelfSigned certificate"

    $cert = New-SelfSignedCertificate -CertStoreLocation "cert:\CurrentUser\My" -Subject "CN=hdinsightAdlsCert" -KeySpec KeyExchange
    $certBytes = $cert.Export([System.Security.Cryptography.X509Certificates.X509ContentType]::Pkcs12, $certPassword);
    $certString = [System.Convert]::ToBase64String($certBytes)
}
else
{
    Write-Host "Reading the cert file from path $certFilePath"

    $cert = new-object System.Security.Cryptography.X509Certificates.X509Certificate2($certFilePath, $certPassword)
    $certString = [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($certFilePath))
}

Login-AzureRmAccount

if($addNewCertKeyCredential)
{
    Write-Host "Creating new KeyCredential for the app"

    # Only the public part of the certificate is uploaded to the application.
    $keyValue = [System.Convert]::ToBase64String($cert.GetRawCertData())
    New-AzureRmADAppCredential -ApplicationId $appId -CertValue $keyValue -EndDate $cert.NotAfter -StartDate $cert.NotBefore

    Write-Host "Waiting for 30 seconds for the permissions to get propagated"
    Start-Sleep -s 30
}

Select-AzureRmSubscription -SubscriptionId $subscriptionId

Write-Host "Updating the certificate on HDInsight cluster."

# Push the Base64-encoded .pfx and its password to the cluster.
Invoke-AzureRmResourceAction `
    -ResourceGroupName $resourceGroupName `
    -ResourceType 'Microsoft.HDInsight/clusters' `
    -ResourceName $clusterName `
    -ApiVersion '2015-03-01-preview' `
    -Action 'updateclusteridentitycertificate' `
    -Parameters @{ ApplicationId = $appId.ToString(); Certificate = $certString; CertificatePassword = $certPassword.ToString() } `
    -Force
```

To assign an existing certificate, create the certificate and have the .pfx file and password ready. Associate the certificate with the service principal that the cluster was created with, and have the AppId ready. Run the PowerShell script after you substitute the parameters with the actual values.

## Next steps

[!INCLUDE [troubleshooting next steps](../includes/hdinsight-troubleshooting-next-steps.md)]
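
As a supplement to the certificate expiry diagnosis in this article, the manual comparison of the `curl` output against the sample responses could be scripted. The following is only a minimal sketch: `classify_token_response` is a hypothetical helper name, not part of HDInsight, and it assumes that a healthy response contains the `"AccessToken"` field and that an expired certificate produces the text `The key used is expired`, as in the sample outputs shown in this article. Real responses might differ.

```shell
# Hypothetical helper (not part of HDInsight): classify the body returned by
# the gateway's /api/oauthtoken endpoint, based only on the sample outputs
# shown in this article.
classify_token_response() {
  case "$1" in
    # A valid service principal returns JSON that contains an AccessToken field.
    *'"AccessToken"'*) echo "valid" ;;
    # An expired certificate surfaces this reason text in the exception.
    *'The key used is expired'*) echo "expired" ;;
    # Anything else needs manual inspection.
    *) echo "unknown" ;;
  esac
}

# Example usage on the headnode (substitute a URL taken from
# fs.azure.datalake.token.provider.service.urls in core-site.xml):
#   classify_token_response "$(curl -s <gateway URL>/api/oauthtoken)"
```

A result of `unknown` doesn't mean the certificate is healthy. Inspect the raw response for other Microsoft Entra or certificate errors.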