refactor: flatten Microsoft skills from nested to flat directory structure

Rewrote sync_microsoft_skills.py (v4) to use each SKILL.md's frontmatter 'name' field as the flat directory name under skills/, replacing the nested skills/official/microsoft/<lang>/<category>/<service>/ hierarchy. This fixes CI failures caused by the indexing, validation, and catalog scripts expecting skills/<id>/SKILL.md (depth 1). Changes: - Rewrite scripts/sync_microsoft_skills.py for flat output with collision detection - Update scripts/tests/inspect_microsoft_repo.py for flat name mapping - Update scripts/tests/test_comprehensive_coverage.py for name uniqueness checks - Delete skills/official/ nested directory - Add 129 Microsoft skills as flat directories (e.g. skills/azure-mgmt-botservice-dotnet/) - Move attribution files to docs/ (LICENSE-MICROSOFT, microsoft-skills-attribution.json) - Rebuild skills_index.json, CATALOG.md, README.md (845 total skills)
2026-02-12 00:07:15 +05:00
parent e06454dafd
commit e7ae616385
142 changed files with 5683 additions and 6097 deletions
--- a/skills/azure-ai-document-intelligence-ts/SKILL.md
+++ b/skills/azure-ai-document-intelligence-ts/SKILL.md
@@ -0,0 +1,323 @@
+---
+name: azure-ai-document-intelligence-ts
+description: Extract text, tables, and structured data from documents using Azure Document Intelligence (@azure-rest/ai-document-intelligence). Use when processing invoices, receipts, IDs, forms, or building custom document models.
+package: @azure-rest/ai-document-intelligence
+---
+
+# Azure Document Intelligence REST SDK for TypeScript
+
+Extract text, tables, and structured data from documents using prebuilt and custom models.
+
+## Installation
+
+```bash
+npm install @azure-rest/ai-document-intelligence @azure/identity
+```
+
+## Environment Variables
+
+```bash
+DOCUMENT_INTELLIGENCE_ENDPOINT=https://<resource>.cognitiveservices.azure.com
+DOCUMENT_INTELLIGENCE_API_KEY=<api-key>
+```
+
+## Authentication
+
+**Important**: This is a REST client. `DocumentIntelligence` is a **function**, not a class.
+
+### DefaultAzureCredential
+
+```typescript
+import DocumentIntelligence from "@azure-rest/ai-document-intelligence";
+import { DefaultAzureCredential } from "@azure/identity";
+
+const client = DocumentIntelligence(
+  process.env.DOCUMENT_INTELLIGENCE_ENDPOINT!,
+  new DefaultAzureCredential()
+);
+```
+
+### API Key
+
+```typescript
+import DocumentIntelligence from "@azure-rest/ai-document-intelligence";
+
+const client = DocumentIntelligence(
+  process.env.DOCUMENT_INTELLIGENCE_ENDPOINT!,
+  { key: process.env.DOCUMENT_INTELLIGENCE_API_KEY! }
+);
+```
+
+## Analyze Document (URL)
+
+```typescript
+import DocumentIntelligence, {
+  isUnexpected,
+  getLongRunningPoller,
+  AnalyzeOperationOutput
+} from "@azure-rest/ai-document-intelligence";
+
+const initialResponse = await client
+  .path("/documentModels/{modelId}:analyze", "prebuilt-layout")
+  .post({
+    contentType: "application/json",
+    body: {
+      urlSource: "https://example.com/document.pdf"
+    },
+    queryParameters: { locale: "en-US" }
+  });
+
+if (isUnexpected(initialResponse)) {
+  throw initialResponse.body.error;
+}
+
+const poller = getLongRunningPoller(client, initialResponse);
+const result = (await poller.pollUntilDone()).body as AnalyzeOperationOutput;
+
+console.log("Pages:", result.analyzeResult?.pages?.length);
+console.log("Tables:", result.analyzeResult?.tables?.length);
+```
+
+## Analyze Document (Local File)
+
+```typescript
+import { readFile } from "node:fs/promises";
+
+const fileBuffer = await readFile("./document.pdf");
+const base64Source = fileBuffer.toString("base64");
+
+const initialResponse = await client
+  .path("/documentModels/{modelId}:analyze", "prebuilt-invoice")
+  .post({
+    contentType: "application/json",
+    body: { base64Source }
+  });
+
+if (isUnexpected(initialResponse)) {
+  throw initialResponse.body.error;
+}
+
+const poller = getLongRunningPoller(client, initialResponse);
+const result = (await poller.pollUntilDone()).body as AnalyzeOperationOutput;
+```
+
+## Prebuilt Models
+
+| Model ID | Description |
+|----------|-------------|
+| `prebuilt-read` | OCR - text and language extraction |
+| `prebuilt-layout` | Text, tables, selection marks, structure |
+| `prebuilt-invoice` | Invoice fields |
+| `prebuilt-receipt` | Receipt fields |
+| `prebuilt-idDocument` | ID document fields |
+| `prebuilt-tax.us.w2` | W-2 tax form fields |
+| `prebuilt-healthInsuranceCard.us` | Health insurance card fields |
+| `prebuilt-contract` | Contract fields |
+| `prebuilt-bankStatement.us` | Bank statement fields |
+
+## Extract Invoice Fields
+
+```typescript
+const initialResponse = await client
+  .path("/documentModels/{modelId}:analyze", "prebuilt-invoice")
+  .post({
+    contentType: "application/json",
+    body: { urlSource: invoiceUrl }
+  });
+
+if (isUnexpected(initialResponse)) {
+  throw initialResponse.body.error;
+}
+
+const poller = getLongRunningPoller(client, initialResponse);
+const result = (await poller.pollUntilDone()).body as AnalyzeOperationOutput;
+
+const invoice = result.analyzeResult?.documents?.[0];
+if (invoice) {
+  console.log("Vendor:", invoice.fields?.VendorName?.content);
+  console.log("Total:", invoice.fields?.InvoiceTotal?.content);
+  console.log("Due Date:", invoice.fields?.DueDate?.content);
+}
+```
+
+## Extract Receipt Fields
+
+```typescript
+const initialResponse = await client
+  .path("/documentModels/{modelId}:analyze", "prebuilt-receipt")
+  .post({
+    contentType: "application/json",
+    body: { urlSource: receiptUrl }
+  });
+
+const poller = getLongRunningPoller(client, initialResponse);
+const result = (await poller.pollUntilDone()).body as AnalyzeOperationOutput;
+
+const receipt = result.analyzeResult?.documents?.[0];
+if (receipt) {
+  console.log("Merchant:", receipt.fields?.MerchantName?.content);
+  console.log("Total:", receipt.fields?.Total?.content);
+  
+  for (const item of receipt.fields?.Items?.values || []) {
+    console.log("Item:", item.properties?.Description?.content);
+    console.log("Price:", item.properties?.TotalPrice?.content);
+  }
+}
+```
+
+## List Document Models
+
+```typescript
+import DocumentIntelligence, { isUnexpected, paginate } from "@azure-rest/ai-document-intelligence";
+
+const response = await client.path("/documentModels").get();
+
+if (isUnexpected(response)) {
+  throw response.body.error;
+}
+
+for await (const model of paginate(client, response)) {
+  console.log(model.modelId);
+}
+```
+
+## Build Custom Model
+
+```typescript
+const initialResponse = await client.path("/documentModels:build").post({
+  body: {
+    modelId: "my-custom-model",
+    description: "Custom model for purchase orders",
+    buildMode: "template",  // or "neural"
+    azureBlobSource: {
+      containerUrl: process.env.TRAINING_CONTAINER_SAS_URL!,
+      prefix: "training-data/"
+    }
+  }
+});
+
+if (isUnexpected(initialResponse)) {
+  throw initialResponse.body.error;
+}
+
+const poller = getLongRunningPoller(client, initialResponse);
+const result = await poller.pollUntilDone();
+console.log("Model built:", result.body);
+```
+
+## Build Document Classifier
+
+```typescript
+import { DocumentClassifierBuildOperationDetailsOutput } from "@azure-rest/ai-document-intelligence";
+
+const containerSasUrl = process.env.TRAINING_CONTAINER_SAS_URL!;
+
+const initialResponse = await client.path("/documentClassifiers:build").post({
+  body: {
+    classifierId: "my-classifier",
+    description: "Invoice vs Receipt classifier",
+    docTypes: {
+      invoices: {
+        azureBlobSource: { containerUrl: containerSasUrl, prefix: "invoices/" }
+      },
+      receipts: {
+        azureBlobSource: { containerUrl: containerSasUrl, prefix: "receipts/" }
+      }
+    }
+  }
+});
+
+if (isUnexpected(initialResponse)) {
+  throw initialResponse.body.error;
+}
+
+const poller = getLongRunningPoller(client, initialResponse);
+const result = (await poller.pollUntilDone()).body as DocumentClassifierBuildOperationDetailsOutput;
+console.log("Classifier:", result.result?.classifierId);
+```
+
+## Classify Document
+
+```typescript
+const initialResponse = await client
+  .path("/documentClassifiers/{classifierId}:analyze", "my-classifier")
+  .post({
+    contentType: "application/json",
+    body: { urlSource: documentUrl },
+    queryParameters: { split: "auto" }
+  });
+
+if (isUnexpected(initialResponse)) {
+  throw initialResponse.body.error;
+}
+
+const poller = getLongRunningPoller(client, initialResponse);
+const result = await poller.pollUntilDone();
+console.log("Classification:", result.body.analyzeResult?.documents);
+```
+
+## Get Service Info
+
+```typescript
+const response = await client.path("/info").get();
+
+if (isUnexpected(response)) {
+  throw response.body.error;
+}
+
+console.log("Custom model limit:", response.body.customDocumentModels.limit);
+console.log("Custom model count:", response.body.customDocumentModels.count);
+```
+
+## Polling Pattern
+
+```typescript
+import DocumentIntelligence, {
+  isUnexpected,
+  getLongRunningPoller,
+  AnalyzeOperationOutput
+} from "@azure-rest/ai-document-intelligence";
+
+// 1. Start operation
+const initialResponse = await client
+  .path("/documentModels/{modelId}:analyze", "prebuilt-layout")
+  .post({ contentType: "application/json", body: { urlSource } });
+
+// 2. Check for errors
+if (isUnexpected(initialResponse)) {
+  throw initialResponse.body.error;
+}
+
+// 3. Create poller
+const poller = getLongRunningPoller(client, initialResponse);
+
+// 4. Optional: Monitor progress
+poller.onProgress((state) => {
+  console.log("Status:", state.status);
+});
+
+// 5. Wait for completion
+const result = (await poller.pollUntilDone()).body as AnalyzeOperationOutput;
+```
+
+## Key Types
+
+```typescript
+import DocumentIntelligence, {
+  isUnexpected,
+  getLongRunningPoller,
+  paginate,
+  parseResultIdFromResponse,
+  AnalyzeOperationOutput,
+  DocumentClassifierBuildOperationDetailsOutput
+} from "@azure-rest/ai-document-intelligence";
+```
+
+## Best Practices
+
+1. **Use getLongRunningPoller()** - Document analysis is async, always poll for results
+2. **Check isUnexpected()** - Type guard for proper error handling
+3. **Choose the right model** - Use prebuilt models when possible, custom for specialized docs
+4. **Handle confidence scores** - Fields have confidence values, set thresholds for your use case
+5. **Use pagination** - Use `paginate()` helper for listing models
+6. **Prefer neural mode** - For custom models, neural handles more variation than template