Unraveling the Limitations of Microsoft Fabric

Microsoft Fabric is a game-changing end-to-end analytics solution that has taken the data world by storm. With its unified platform, Fabric offers a suite of tools for data integration, engineering, warehousing, science, real-time analytics, and business intelligence, all seamlessly integrated with Azure OpenAI. While Fabric presents a compelling proposition for organizations seeking to streamline their data operations, it’s crucial to acknowledge its limitations. This blog post delves deep into the constraints of Microsoft Fabric, providing a comprehensive overview of its shortcomings, potential workarounds, and real-world examples.

Data Warehousing Limitations

At the heart of Fabric lies its data warehousing capabilities, which empower users to build high-performance data warehouses with ease. However, certain limitations need to be considered:

  • Geographic Restrictions: Currently, Fabric’s data warehousing capabilities are not available across multiple geographic regions. This limitation can be a significant drawback for global enterprises with data spread across different locations.  
  • Garbage Collection: Fabric does not automatically remove unused Parquet files from storage through garbage collection. This can lead to storage inefficiencies and increased costs over time.  
  • Data Type Limitation: There is no nvarchar(max) data type in the warehouse. This can be a limitation when dealing with large text fields.  
  • SQL Analytics Endpoint: The SQL analytics endpoint, which provides a read-only SQL interface to query data in the lakehouse, has limitations related to Delta Lake tables:
    • Data must be in Delta Parquet format to be automatically discovered by the SQL analytics endpoint.  
    • Tables with renamed columns are not supported in the SQL analytics endpoint.  
    • While Delta column mapping by name is supported, mapping by ID is not.  
    • Delta Lake tables created outside the /tables folder are not accessible in the SQL analytics endpoint. As a workaround, move your data to the /tables folder (see the sketch after this list).  
    • Some columns present in Spark Delta tables might not be available in the SQL analytics endpoint due to data type limitations.  
    • Adding a foreign key constraint between tables in the SQL analytics endpoint prevents further schema changes, such as adding new columns.  
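
As a minimal illustration of the /tables requirement, the sketch below writes data as a managed Delta table so the SQL analytics endpoint can discover it automatically. It assumes a Fabric notebook with a default lakehouse attached (where spark is the ambient session); the paths and table name are hypothetical:

```python
# A minimal sketch, assuming a Fabric notebook with a default lakehouse attached
# (where `spark` is the ambient session). Paths and the table name are hypothetical.
df = spark.read.parquet("Files/raw/orders")  # staging data under /Files

(df.write
   .format("delta")         # must be Delta, not plain Parquet
   .mode("overwrite")
   .saveAsTable("orders"))  # managed table lands under the /tables folder
```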

SQL Database Limitations

Fabric also offers a SQL database service, which provides a familiar SQL Server experience within the Fabric environment. However, this service also has its limitations:

  • Limited Feature Support: Features like Change Data Capture (CDC) and Azure Synapse Link for SQL are not supported in the SQL database service.  
  • Security: SQL Audit and Transparent Data Encryption (TDE) are not currently supported. Data is protected at rest using storage encryption with service-managed keys, but customer-managed keys are not available.  
  • Database Limits: In a trial capacity, users are limited to three databases. This restriction is lifted in other capacities.  
  • Table and Column Constraints:
    • A table’s primary key cannot be of the following data types: hierarchyid, sql_variant, or timestamp.  
    • If a table contains columns of type Large Binary Object (LOB) with a size greater than 1 MB, the data in those columns is truncated to 1 MB in Fabric OneLake.  
    • In-memory tables are not currently supported.  
    • Full-text indexing is not supported.  
    • Certain table-level Data Definition Language (DDL) operations, such as Switch/Split/Merge partition and partition compression, are not allowed.  
    • Column names cannot contain spaces or the following characters: , ; { } ( ) \n \t =.  
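
Where source data violates the column-name rule, names can be normalized before loading. Here is a minimal sketch assuming pandas-shaped data; the helper name and sample columns are hypothetical:

```python
import re

import pandas as pd

# A minimal sketch: replace spaces and the disallowed characters listed above
# with underscores before loading a table into the Fabric SQL database.
DISALLOWED = re.compile(r"[ ,;{}()\n\t=]")

def sanitize_columns(df: pd.DataFrame) -> pd.DataFrame:
    return df.rename(columns={c: DISALLOWED.sub("_", c) for c in df.columns})

df = pd.DataFrame({"order id": [1], "price (usd)": [9.99]})
print(sanitize_columns(df).columns.tolist())  # ['order_id', 'price__usd_']
```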

Notebook Limitations

Fabric notebooks provide a collaborative environment for data scientists and engineers to explore, prepare, and analyze data using languages like Python, Scala, and Spark SQL. However, these notebooks have limitations:

  • Size Constraints:
    • Notebook content size is limited to 32 MB.  
    • Notebook snapshot size is limited to 32 MB.  
    • Rich dataframe table output is limited to 10,000 rows and 5 MB data size.  
    • Maximum resource storage for both built-in and environment folders is 500 MB, with a single file size limit of 100 MB.  
  • Execution Limits:
    • A maximum of 256 code cells can be executed per notebook.  
    • The longest running job time is 7 days.  
    • The upper limit for concurrent notebooks in notebookutils.notebook.runMultiple() is 50.  
  • Other Limitations:
    • The statement depth for %run is limited to 5, with a total of 1,000 referenced cells.  
    • Notebook job history is retained for 60 days.  
    • Personalizing Spark sessions in %%configure requires specific configurations for driver and executor memory and cores.  
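
For illustration, a %%configure cell of roughly this shape, placed at the top of a notebook, personalizes the Spark session. The property names mirror the driver and executor settings mentioned above; the values are placeholders to adapt to your capacity:

```python
%%configure -f
{
    "driverMemory": "28g",
    "driverCores": 4,
    "executorMemory": "28g",
    "executorCores": 4
}
```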

Data Factory Limitations

Fabric includes a Data Factory service for creating and managing data pipelines. However, this service has limitations compared to its Azure counterpart:

  • Limited Functionality: While most Azure Data Factory copy and orchestration patterns are applicable to Fabric pipelines, tumbling window is not yet available.  
  • Connector Constraints: Connectors do not support OAuth and Azure Key Vault (AKV). Managed identity (MSI) is only available for Azure Blob Storage.  
  • Parameterization: Connectors cannot use parameters.  
  • Activity Restrictions:
    • The Get Metadata activity cannot have a source from Fabric KQL databases.  
    • The Script activity cannot have a source from Fabric KQL databases.  
  • Connector Differences: Copy activity uses a Web connector, while Web/Webhook activities use a Web v2 connector with richer functionality.  
  • Unavailable Features: Validation activity, Mapping Data Flow activity, and the SSIS integration runtime are not available.  
  • Networking: Pipelines cannot use a managed virtual network.  
  • Authentication: Web activity does not support service principal-based authentication.  
  • Scheduling: Pipeline scheduling options are limited to by the minute, hourly, daily, and weekly.  
  • Dataflow Gen2: Dataflow Gen2 (CI/CD, preview) is not supported as an activity in pipelines.  
  • Authentication Sync: Background sync of authentication does not occur for pipelines. It is recommended to make minor updates to pipelines and save them to obtain and cache a new token.  
  • Reflex and Event Triggers: There are known issues with creating Reflexes and events that trigger pipelines.  
  • Connection Naming: Connections created in Data Factory get a default name that includes the user’s name, which can be a concern for organizations with strict naming conventions.  
  • Dataflow Gen2 Limitations:
    • Spaces or special characters are not supported in column or table names when writing to a Lakehouse. Duration and binary columns are not supported during authoring.  
    • A currently supported gateway version must be installed for use with Dataflow Gen2.  
    • When using OAuth2 credentials, the gateway does not support refreshes longer than an hour.  
    • Delta Lake does not support case-sensitive column names, which can lead to “duplicate columns” errors.  
  • Resource Limits: Data Factory in Fabric has various resource limits, including the number of pipelines, concurrent runs, activities, parameters, and more. These limits are outlined in the table below:  
Pipeline Resource                               Default limit   Maximum limit
Total number of pipelines within a workspace    5,000           …

  • ELT vs. ETL: Fabric’s architecture favors an Extract, Load, Transform (ELT) approach over the traditional Extract, Transform, Load (ETL) approach. Data is first loaded into OneLake and then transformed within the lakehouse environment. This approach can be advantageous for large datasets and complex transformations, but it requires careful consideration of data processing and resource utilization.
  • Incremental Loads: Fabric supports Change Data Capture (CDC) for incremental loading, but there are challenges. The platform has limited built-in CDC connectors, requiring external orchestration for complex data synchronization scenarios. Additionally, merge performance can vary, with large merge operations being slower compared to solutions like Databricks Delta Live Tables.  
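
To make the merge pattern concrete, here is a minimal notebook-side sketch of an incremental merge using the Delta Lake Python API, one way to work around the limited built-in CDC connectors. The table, path, and key names are hypothetical, and spark is the ambient notebook session:

```python
from delta.tables import DeltaTable

# A minimal sketch of a notebook-side incremental merge. Table, path, and key
# names are hypothetical; large merges may still be slow, as noted above.
updates = spark.read.parquet("Files/staging/orders_changes")

target = DeltaTable.forName(spark, "orders")
(target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```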

Real-time Analytics Limitations

Fabric offers capabilities for real-time analytics, but there are limitations to consider:

  • Limited Stream Processing: Fabric lacks built-in support for advanced stream processing frameworks like Apache Flink. This can limit the ability to perform complex real-time analytics on high-volume data streams; Spark Structured Streaming in notebooks is the main built-in alternative (see the sketch after this list).  
  • Scaling Challenges: Performance can degrade with high ingestion rates when dealing with large-scale event streams.  
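
As a point of reference, here is a minimal sketch of Spark Structured Streaming in a Fabric notebook. The "rate" source is a synthetic generator used purely for illustration; real workloads would read from an eventstream or event hub instead:

```python
from pyspark.sql import functions as F

# A minimal sketch: windowed counts over a synthetic stream. `spark` is the
# ambient notebook session; the in-memory sink is for demonstration only.
stream = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

counts = stream.groupBy(F.window("timestamp", "10 seconds")).count()

query = (counts.writeStream
    .outputMode("complete")
    .format("memory")            # in-memory sink, for demonstration only
    .queryName("rate_counts")
    .start())
```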

Mirroring Limitations

Mirroring allows users to replicate data from external sources like Azure SQL Database and Azure Cosmos DB into Fabric. However, this feature has limitations:

  • Azure SQL Database Mirroring:
    • Only writable primary databases are supported.  
    • Databases with Change Data Capture (CDC), Azure Synapse Link for SQL, or those already mirrored in another workspace cannot be mirrored.  
    • A maximum of 500 tables can be mirrored.  
    • Tables without a defined primary key, or those using a non-clustered primary key, cannot be mirrored.  
    • Primary keys with data types sql_variant or timestamp/rowversion are not supported.  
    • Columns with data types image, text/ntext, xml, rowversion/timestamp, sql_variant, User Defined Types (UDT), geometry, and geography cannot be mirrored (a pre-flight check sketch follows this list).  
    • Computed columns cannot be mirrored.  
  • Azure Cosmos DB Mirroring:
    • All limitations of the continuous backup feature in Azure Cosmos DB apply to Fabric mirroring.  
    • Only Azure Cosmos DB read-write account keys are supported for connection.  
    • Connection credentials must be updated if account keys are rotated.  
    • The source Azure Cosmos DB account must enable public network access for all networks.  
    • Private endpoints are not supported for Azure Cosmos DB accounts.  
    • Mirroring does not support containers with items that have property names containing whitespaces or wildcard characters.  
    • Warehouse cannot handle JSON string columns greater than 8 KB in size.  
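
Before enabling Azure SQL Database mirroring, it can help to scan the source for blocking column types. A minimal sketch using pyodbc follows; the connection string is a placeholder, and user-defined types would need a separate check against sys.types:

```python
import pyodbc

# A minimal sketch: flag columns whose types block mirroring.
# Note: rowversion columns surface as 'timestamp' in INFORMATION_SCHEMA.
UNSUPPORTED = ("image", "text", "ntext", "xml", "timestamp",
               "sql_variant", "geometry", "geography")

conn = pyodbc.connect("<odbc-connection-string>")
rows = conn.execute(
    "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE "
    "FROM INFORMATION_SCHEMA.COLUMNS "
    f"WHERE DATA_TYPE IN ({','.join('?' * len(UNSUPPORTED))})",
    UNSUPPORTED,
).fetchall()

for table, column, dtype in rows:
    print(f"{table}.{column} ({dtype}) cannot be mirrored")
```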

Capacity Limitations

Fabric utilizes a capacity-based model for resource allocation. This model, while offering scalability, presents certain challenges:

  • Always-On Requirement: Capacities must be running for OneLake data to be accessible, even when it is accessed from non-Fabric environments via Azure Blob File System (ABFSS) URIs. This can lead to unnecessary costs when the capacity is not otherwise in use.  
  • Data Browsing: When a capacity is paused, users cannot browse data in Lakehouse Explorer, and access through other means such as ABFSS URIs is likewise unavailable until the capacity is resumed.  
  • Throttling Risk: Sharing a single capacity across multiple workloads can lead to throttling if resource limits are exceeded. This can disrupt critical services and impact user experience. There are different levels of throttling:
    • Interactive Delay: Occurs when overconsumption continues for 10-60 minutes, causing delays in loading Power BI reports.  
    • Interactive Rejection: Occurs when overconsumption continues for 1-24 hours, preventing the display of content from Power BI reports.  
    • Background Rejection: Occurs when overconsumption continues for more than 24 hours, preventing datasets from refreshing.  
  • Capacity Management: Managing capacities effectively can be challenging, especially with the need to balance cost optimization and performance requirements. Identifying users who need a paused capacity can be difficult, and pausing capacities creates administrative burdens, as compute resources must be manually re-enabled.  
  • SKU Limitations: Different SKUs have limitations on features and capabilities. For example:
    • Lower SKUs (below F2) cannot create Workspace Identities directly from Fabric.  
    • Lower SKUs cannot create private connection links to resources behind a VPN in Azure.  
    • Deployment pipelines may have limitations in lower SKUs.  
    • Each SKU has specific limits and guardrails for Power BI workloads. For example, exceeding the maximum memory limit for an import model can cause reports to fail to load and prevent model refresh. Similarly, exceeding the Direct Lake rows per table limit can cause reports to fail or fall back to DirectQuery mode.  
  • Licensing Structure: The licensing and pay-as-you-go structure of Fabric can be complex and potentially lead to unexpected costs. Scaling is possible with pay-as-you-go, but it requires careful configuration to avoid overspending.  
  • Cost Comparison: Compared to Databricks, Fabric can be more expensive, especially when considering the need for separate capacities for development and production workloads.  

It is crucial for organizations to carefully plan their capacity strategy in Fabric, considering factors like workload characteristics, concurrency, and cost optimization. This includes monitoring capacity usage, optimizing workloads, and potentially using separate capacities for different workloads to avoid throttling and ensure optimal performance.

Security and Governance

While Fabric incorporates security features like role-based access control (RBAC), Microsoft Entra ID, and data masking, some concerns remain:

  • Data Protection and Privacy: Organizations need to implement robust security measures to protect sensitive data and ensure compliance with privacy regulations like GDPR and CASL. This includes understanding and adhering to data residency requirements, implementing appropriate access controls, and encrypting sensitive data.  
  • Access Restriction: Limiting data access to authorized users is crucial to prevent unauthorized access and data breaches. This can be achieved through RBAC, data masking, and other security measures.  
  • Data Integrity: Ensuring data authenticity and preventing tampering is essential for maintaining data integrity. This includes implementing data validation and quality checks, as well as monitoring for any unauthorized modifications.  
  • Lack of Granular Control: Fine-grained access controls are still under development, potentially limiting the ability to implement strict security policies.  
  • IP Whitelisting: Fabric currently does not allow IP whitelisting for the lakehouse, which can be a security concern for organizations with strict network security policies.  

Organizations need to be proactive in addressing security and privacy concerns in Fabric. This includes implementing best practices, staying informed about updates and new features, and potentially integrating with other security tools and services.

  • Data and Organizational Silos: One of the challenges in many organizations is the existence of data and organizational silos, where different departments or teams work in isolation and do not share information effectively. Fabric can help address this challenge by providing a unified platform for data storage, processing, and analysis, enabling better collaboration and data sharing across the organization.  
  • Insufficient Knowledge: To effectively utilize Fabric, employees need to have the right analytical skills. There might be a skills gap within organizations as the nature of work shifts from traditional data warehousing and BI to a more integrated and code-centric approach. Investing in employee training and development is crucial to bridge this gap and ensure successful adoption of Fabric.  
  • Governance and Compliance: Early implementation of data governance policies is essential in Fabric to avoid compliance issues and ensure data quality. This includes defining clear data ownership, establishing data lineage and audit trails, and implementing data quality checks and validation rules.  

Real-World Examples

Here are some examples of how companies are using Microsoft Fabric and the limitations they might encounter:

  • Welcome: A marine electronics company that uses Fabric to process data in real time to ensure the safety of vessels and crew. They leverage Fabric’s real-time intelligence capabilities and Copilot to build a monitoring and management platform. They might encounter limitations related to real-time processing capabilities, especially when dealing with high-volume data streams.  
  • Zeiss Group: An optics and optoelectronics company that uses Fabric as their enterprise data platform. They appreciate the ability to combine data products without creating redundant copies and control access to sensitive information. They might encounter limitations related to data governance and fine-grained access control, especially as Fabric’s governance features are still evolving.  
  • Milliman: A consulting firm that helps insurance companies create complex scenarios for pricing insurance products. They use Fabric to store and process large volumes of data and appreciate the integration of various analytics tools within a single platform. They might encounter limitations related to capacity management and cost optimization, especially when dealing with complex simulations and large datasets.  

These examples illustrate how Fabric can be applied in various industries and the potential challenges organizations might face.

Workarounds and Solutions

While some limitations are inherent to the current state of Fabric, others can be mitigated through workarounds and best practices:

Data Warehousing

  • Optimize data storage: To address the lack of garbage collection for Parquet files, organizations can implement their own data lifecycle management processes to archive or delete unused files (see the VACUUM sketch after this list).
  • Use alternative tools: For functionalities not supported in the SQL analytics endpoint, consider using alternative tools like Spark for more complex data transformations and analysis.
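
For Delta tables, one such lifecycle step is Delta's VACUUM command, run from a notebook attached to the lakehouse. A minimal sketch, with a hypothetical table name:

```python
# A minimal sketch: remove Parquet files no longer referenced by the Delta log.
# 168 hours (7 days) is Delta's default retention window; shortening it below
# the default requires extra safety configuration.
spark.sql("VACUUM orders RETAIN 168 HOURS")
```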

SQL Database

  • Alternative solutions: For unsupported features like CDC and Azure Synapse Link for SQL, consider using alternative solutions or integrating with other Azure services.
  • Data type considerations: Be mindful of the data type limitations for primary keys and large binary objects.

Notebooks

  • Break down large notebooks: To overcome size constraints, break down large notebooks into smaller, more manageable ones.
  • Optimize code: Optimize code to reduce resource consumption and improve performance.
  • Parallel processing: Utilize multi-threading for parallel processing to improve efficiency.  
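
One way to parallelize within the notebook limits is notebookutils' runMultiple, shown in this minimal sketch. The notebook names are hypothetical, and notebookutils is preloaded in Fabric notebooks:

```python
# A minimal sketch: fan out child notebooks in parallel. runMultiple is capped
# at 50 concurrent notebooks (see the execution limits above), so batch larger
# fan-outs accordingly.
notebookutils.notebook.runMultiple(
    ["prepare_customers", "prepare_orders", "prepare_products"]
)
```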

Data Factory

  • Alternative tools: For unsupported functionalities, consider using alternative tools or integrating with Azure Data Factory.
  • Custom solutions: Implement custom solutions or workarounds for connector limitations.
  • Gateway monitoring: Monitor gateway refresh limitations when using OAuth2 credentials and consider alternative authentication methods for long-running refreshes.

Capacity

  • Capacity monitoring: Monitor capacity usage and scale up or down as needed; capacities can also be paused programmatically to cut idle cost (see the sketch after this list).  
  • Workload optimization: Optimize workloads to minimize resource consumption.  
  • Surge protection: Use surge protection to prevent overloads.  
  • Separate capacities: Consider separate capacities for different workloads to avoid throttling and ensure optimal performance.  
  • Cost optimization: Implement cost optimization strategies, such as capacity sizing, workload optimization, and efficient data storage.
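
As one illustration of programmatic cost control, the sketch below pauses an F-SKU capacity through Azure Resource Manager. The resource names, api-version, and token source are placeholders; verify the endpoint against the current ARM reference before relying on it:

```python
import requests

# A minimal sketch, assuming an F-SKU capacity managed through Azure Resource
# Manager. Names, the api-version, and the token source are placeholders.
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
CAPACITY = "<capacity-name>"
TOKEN = "<azure-ad-token>"  # e.g. obtained via the azure-identity package

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Fabric"
    f"/capacities/{CAPACITY}/suspend?api-version=2023-11-01"
)
resp = requests.post(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()
```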

Security and Governance

  • Principle of least privilege: Implement the principle of least privilege, granting users only the necessary permissions.  
  • Multi-factor authentication: Enable multi-factor authentication (MFA) for added security.  
  • Security audits: Conduct regular security audits and vulnerability assessments.  
  • Data backup and recovery: Implement robust data backup and recovery plans.  
  • Employee training: Educate and train employees on security best practices.  
  • Data governance: Implement data governance policies early on to ensure data quality and compliance.

Conclusion

Microsoft Fabric offers a powerful and integrated platform for data analytics, but it’s essential to be aware of its limitations. By understanding these constraints and implementing appropriate workarounds, organizations can effectively leverage Fabric’s capabilities while mitigating potential challenges. As Fabric continues to evolve, we can expect many of these limitations to be addressed, further solidifying its position as a leading analytics solution.

However, it’s important to remember that Fabric is still a relatively new platform, and some of its features are still under development. Organizations should carefully evaluate their needs and consider these limitations before fully committing to Fabric. The future of Fabric looks promising, with ongoing development and improvements, but for now, it’s crucial to be aware of its constraints and plan accordingly.
