Posts

How do you build a data warehouse in SQL Server?

Building a data warehouse in SQL Server is a complex process that requires a deep understanding of SQL Server and data warehousing best practices. It involves several steps, including planning, designing, and implementing the database schema, as well as loading and transforming the data. Here are the high-level steps you can follow to build a data warehouse in SQL Server:
· Identify the business requirements: Before you start building a data warehouse, you need to identify the business requirements, such as what data is needed, who will use the data, and how the data will be analysed.
· Plan the data warehouse schema: Once you have identified the business requirements, you need to plan the data warehouse schema. This involves designing the tables, relationships, and indexes that will be used to store the data.
· Create the database: After plann...
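As a rough illustration of the schema and database steps, here is a minimal sketch of a simple star schema (one dimension table and one fact table) created in SQL Server from Python via pyodbc. The connection string, table names, and columns are illustrative assumptions, not part of the original post.

# A minimal sketch of creating a simple star schema in SQL Server via pyodbc.
# Connection string, table names, and columns are illustrative assumptions.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=SalesDW;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# One dimension table and one fact table referencing it.
cursor.execute("""
CREATE TABLE dim_customer (
    customer_key INT IDENTITY(1,1) PRIMARY KEY,
    customer_name NVARCHAR(100) NOT NULL,
    region NVARCHAR(50)
);
""")
cursor.execute("""
CREATE TABLE fact_sales (
    sales_key INT IDENTITY(1,1) PRIMARY KEY,
    customer_key INT NOT NULL REFERENCES dim_customer(customer_key),
    order_date DATE NOT NULL,
    amount DECIMAL(18, 2) NOT NULL
);
""")
conn.commit()
conn.close()

In a real warehouse the fact table would typically reference several dimensions (date, product, customer, and so on), but the pattern of surrogate keys on dimensions and foreign keys on the fact table stays the same.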

When to choose a data warehouse over a database?

The decision between a database and a data warehouse depends on the specific needs of your organization and the type of data you're working with. Some of the factors to consider are:
· Data Volume: When you have a large volume of data (e.g., terabytes or more), it is better to go for a data warehouse, as data warehouses are optimized for handling large amounts of data. Databases are designed to handle smaller amounts of data.
· Data Complexity: If you are working with complex data that requires advanced querying and analysis, then a data warehouse will be a better choice. Data warehouses can handle complex queries and data modelling. Databases may not be equipped to handle the same level of complexity.
· Data Integration: If you need to integrate data from multiple sources, a data warehouse may be the better choice. Data warehouses are designed to hand...

Evaluation of a live analytical system

Evaluating the analytical results on live systems involves testing the validity and accuracy of the results obtained from the analysis. This process can involve the following steps:
· Verification of Data: The first step is to ensure that the data used in the analysis is accurate and up to date. This includes checking for any errors in the data collection process and making sure that the data is relevant to the system being evaluated.
· Comparison with Actual Results: The next step is to compare the analytical results with actual results obtained from the live system. This will help to determine whether the analysis is producing accurate results; if any discrepancies exist, their cause can be investigated.
· Sensitivity Analysis: It is also important to perform a sensitivity analysis to determine how changes in the inputs used in the analysis will affect the results. This helps to identify any areas where the analysis may be vulnerable to changes in the live ...
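A minimal sketch of the comparison and sensitivity-analysis steps, assuming a hypothetical forecast: the predicted and actual figures, and the growth-rate assumption being perturbed, are made up purely for illustration.

# Compare analytical (predicted) results against actuals from the live system,
# then perturb one input assumption to see how sensitive the output is.
import pandas as pd

predicted = pd.Series([105.0, 98.0, 120.0], index=["jan", "feb", "mar"])
actual = pd.Series([100.0, 97.0, 115.0], index=["jan", "feb", "mar"])

# Comparison with actual results: absolute and percentage discrepancy per period.
comparison = pd.DataFrame({
    "predicted": predicted,
    "actual": actual,
    "abs_error": (predicted - actual).abs(),
    "pct_error": ((predicted - actual) / actual * 100).round(2),
})
print(comparison)

# Simple sensitivity analysis: vary an input (a growth-rate assumption)
# and observe how much the forecast total moves.
def forecast_total(base, growth_rate, periods=3):
    return sum(base * (1 + growth_rate) ** t for t in range(periods))

base = 100.0
for rate in (0.02, 0.05, 0.08):
    print(f"growth={rate:.0%} -> total={forecast_total(base, rate):.1f}")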

How do you test an analytical application?

Importance of testing analytical applications:
· Accuracy: Testing ensures that the application produces the results it is intended to produce. This is important because these applications are used to make critical business decisions.
· Performance: Testing can identify any performance issues, such as slow processing times or data retrieval, which can impact the overall efficiency of the application.
· Scalability: As the volume of data processed by analytical applications increases, it's important to ensure that they can scale to handle the increased load. Testing can identify any scalability issues early on.
· Reliability: Testing can ensure that the application is reliable and produces consistent results over time. This is important for maintaining trust in the results produced by the application.
· Usability: Testing can identify any usability issues, such as poor navigation or unclear instructions, which can negatively impact the user experience and limit the adopti...
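A minimal pytest-style sketch of accuracy and performance checks; monthly_revenue stands in for a hypothetical piece of the application's logic, and the expected figures and time budget are illustrative assumptions.

# Run with pytest: accuracy is checked against known figures,
# performance against a simple time budget.
import time
import pandas as pd

def monthly_revenue(orders: pd.DataFrame) -> pd.Series:
    """Aggregate order amounts by month (the logic under test)."""
    return orders.groupby(orders["order_date"].dt.to_period("M"))["amount"].sum()

def test_accuracy():
    orders = pd.DataFrame({
        "order_date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03"]),
        "amount": [100.0, 50.0, 75.0],
    })
    result = monthly_revenue(orders)
    assert result[pd.Period("2024-01")] == 150.0
    assert result[pd.Period("2024-02")] == 75.0

def test_performance():
    # A larger synthetic load to catch obvious slowdowns early.
    orders = pd.DataFrame({
        "order_date": pd.to_datetime(["2024-01-01"] * 100_000),
        "amount": [1.0] * 100_000,
    })
    start = time.perf_counter()
    monthly_revenue(orders)
    assert time.perf_counter() - start < 1.0  # illustrative budget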

How do you assess data quality using Python?

There is an amazing open-source Python library, ‘ydata_quality’. It assesses data quality throughout the multiple stages of a data pipeline's development. Once you have a dataset available, running DataQuality(df=my_df).evaluate() provides a comprehensive overview of the details and intricacies of the data, through the perspective of the multiple modules available in the package. The library also offers the following individual modules:
1. Bias and Fairness: Checks the bias and fairness in the dataset.
· Bias: A systematic, non-negligible treatment that is differentiated towards a specific sub-group of individuals.
· Fairness: The absence of differentiated treatment (assistive or punitive) based on sensitive attributes. Fairness can also be thought of as the absence of an unjustified basis for differentiated treatment.
2. Data Expectations: To define an expectation about dat...
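A minimal sketch of running the call mentioned above on a small dataset; the import path and the sample DataFrame are assumptions made to keep the snippet self-contained, so check the library's documentation for the exact API.

# Evaluate a small DataFrame with ydata_quality.
import pandas as pd
from ydata_quality import DataQuality  # assumed import path

my_df = pd.DataFrame({
    "age": [25, 31, None, 42],
    "income": [40_000, 52_000, 61_000, 61_000],
    "gender": ["F", "M", "M", "F"],
})

dq = DataQuality(df=my_df)   # wraps the dataset with the available quality modules
results = dq.evaluate()      # prints a summary of the warnings found across modules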

Data Quality Audit

There are several steps you can take to audit the quality of your data. Here is a general process you can follow:
· Define what you mean by "data quality." This will vary depending on the specific use case for the data, but generally it should include things like completeness, accuracy, consistency, and integrity.
· Identify the sources of your data. This could include databases, spreadsheets, external APIs, or other systems. It's important to understand where your data is coming from in order to assess its quality.
· Establish a set of data quality checks. These checks should be based on the definitions you established in step 1 and should be applied to each source of data. Examples of checks include verifying that required fields are not empty, that dates are in the correct format, and that data falls within a certain range.
· ...
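A minimal sketch of the example checks listed above (required fields not empty, dates in the correct format, values within a range) using pandas; the column names and valid ranges are illustrative assumptions.

# Three basic data quality checks over a toy DataFrame.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, None],
    "signup_date": ["2024-01-05", "2024-13-01", "2024-02-10"],
    "age": [34, 210, 28],
})

issues = {}

# Completeness: required fields must not be empty.
issues["missing_customer_id"] = int(df["customer_id"].isna().sum())

# Validity: dates must parse in the expected format.
parsed = pd.to_datetime(df["signup_date"], format="%Y-%m-%d", errors="coerce")
issues["bad_signup_date"] = int(parsed.isna().sum())

# Range check: values must fall within a plausible interval.
issues["age_out_of_range"] = int((~df["age"].between(0, 120)).sum())

print(issues)  # e.g. {'missing_customer_id': 1, 'bad_signup_date': 1, 'age_out_of_range': 1}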

What is Data Quality and why do we need it?

What is data quality? Data quality refers to the overall level of excellence and trustworthiness of data. It encompasses a variety of characteristics that can affect the usability and integrity of data, including accuracy, completeness, consistency, timeliness, and accessibility. Ensuring data quality is essential for making informed decisions and for maintaining the credibility of an organization. (Image source: Google Images)
Importance of Data Quality: Data quality is important because it affects the accuracy and usefulness of the information derived from it. Poor-quality data can lead to incorrect conclusions, bad decision-making, and wasted resources. For example, a business might make a poor decision because the data it was using was inaccurate or incomplete, or a research study might produce inaccurate results because the data it was based on was of poor quality. Moreover, data quality also plays an important role in the efficiency and effectiveness of any organization. Data is the fuel...

Popular ML models comparison


Why do we need NoSQL over SQL and how do we choose the best database for the project?

In this article we will learn about RDBMS and NoSQL databases. What is a database? A database is structured, provides random access that helps us search data quickly based on an index, has low latency, and provides the ACID properties. (Image source: Google Images)
ACID properties:
· Atomicity: A transaction is applied either in full or not at all. Example: when you transfer money online using net banking, either the full amount should be transferred or none should be transferred.
· Consistency: Data should remain consistent.
· Isolation: If multiple people are working in the same bank / doing transfers at the same time, there should be a sequencing/locking mechanism.
· Durability: Whenever there is a database failure, committed data should not be lost and should be available once the system is back up and running.
Why do companies want NoSQL over SQL? NoSQL databases offer many benefits over relational databases. NoSQL databases have flexible data models, scale horizontally, have incredibly fast queries, and are ...
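A minimal sketch of atomicity, the "all or nothing" property described above; it uses Python's built-in sqlite3 for brevity rather than a full RDBMS, and the account names and balances are made-up examples.

# A failed transfer rolls back completely: neither account changes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 500.0), ("bob", 100.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit and credit happen together or not at all."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
    except ValueError:
        pass  # the whole transfer was rolled back

transfer(conn, "alice", "bob", 600.0)  # fails: more than alice's balance
print(conn.execute("SELECT * FROM accounts").fetchall())  # [('alice', 500.0), ('bob', 100.0)]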

SQL and NoSQL Comparison
