Disk performance problems can be hard
to track down, but they can cause a wide variety of issues. The disk
performance counters available in Windows are numerous, and being able to
select the right counters for a given situation is a great
troubleshooting skill. Here, we'll review two basic scenarios -
measuring overall disk performance and determining if the disks are a
bottleneck.
Measuring Disk Performance
When it comes to disk performance, there are two important
considerations: IOPS and byte throughput. IOPS is the raw number of
disk operations that are performed per second. Byte throughput is the
effective bandwidth the disk is achieving, usually expressed in MB/s.
These numbers are closely related - at a given IO size, a disk with more
IOPS provides proportionally more throughput.
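That relationship is easy to make concrete. The sketch below is a hypothetical helper (not a perfmon API) showing how the same IOPS figure translates into very different throughput depending on the average IO size:

```python
def throughput_mb_per_sec(iops: float, avg_io_bytes: float) -> float:
    """Byte throughput implied by an IOPS figure at a given average IO size."""
    return iops * avg_io_bytes / 1_000_000

# The same 3,600 IOPS at 4 KiB per IO vs. 64 KiB per IO:
small_io = throughput_mb_per_sec(3600, 4096)   # ~14.7 MB/s
large_io = throughput_mb_per_sec(3600, 65536)  # ~235.9 MB/s
```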
These can be measured in perfmon with the following counters:
- Disk Transfers/sec
- Total number of IOPS. This should be about equal to Disk Reads/sec + Disk Writes/sec
- Disk Reads/sec
- Disk read operations per second (IOPS which are read operations)
- Disk Writes/sec
- Disk write operations per second (IOPS which are write operations)
- Disk Bytes/sec
- Total disk throughput per second. This should be about equal to Disk Read Bytes/sec + Disk Write Bytes/sec
- Disk Read Bytes/sec
- Disk read throughput per second
- Disk Write Bytes/sec
- Disk write throughput per second
These performance counters are available in both the LogicalDisk and
PhysicalDisk categories. In a standard setup, with a 1:1 disk-partition
mapping, these would provide the same results. However, if you have a
more advanced setup with storage pools, spanned disks, or multiple
partitions on a single disk, you would need to choose the correct
category for the part of the stack you are measuring.
Here are the results on a test VM. In this test, diskspd was used to
simulate an average mixed read/write workload. The results show the
following:
- 3,610 IOPS
- 2,872 read IOPS
- 737 write IOPS
- 17.1 MB/s total throughput
- 11.2 MB/s read throughput
- 5.9 MB/s write throughput
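Dividing Disk Bytes/sec by Disk Transfers/sec gives the average IO size, which explains the combination of decent IOPS and low throughput. A quick check using the numbers above:

```python
# Values taken from the test results above.
total_bytes_per_sec = 17.1 * 1_000_000  # 17.1 MB/s
total_iops = 3610

# Average bytes per IO operation.
avg_io_bytes = total_bytes_per_sec / total_iops
print(round(avg_io_bytes))  # ~4737 bytes, i.e. mostly small (~4 KiB) IOs
```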
In this case, we're seeing a decent number of IOPS with fairly low
throughput. The expected results vary greatly depending on the
underlying storage and the type of workload that is running. In any
case, you can use these counters to get an idea of how a disk is
performing during real world usage.
Disk Bottlenecks
Determining if storage is a performance bottleneck relies on a
different set of counters than the above. Instead of looking at IOPS
and throughput, latency and queue lengths need to be checked. Latency
is the amount of time it takes to get a piece of requested data back
from the disk and is measured in milliseconds (ms). Queue length refers
to the number of outstanding IO requests that are in the queue to be
sent to the disk. This is measured as an absolute number of requests.
The specific perfmon counters are:
- Avg. Disk sec/Transfer
- The average number of seconds it takes to get a response from the disk. This is the total latency.
- Avg. Disk sec/Read
- The average number of seconds it takes to get a response from the disk for read operations. This is read latency.
- Avg. Disk sec/Write
- The average number of seconds it takes to get a response from the disk for write operations. This is write latency.
- Current Disk Queue Length
- The current number of IO requests in the queue waiting to be sent to the storage system.
- Avg. Disk Read Queue Length
- The average number of read IO requests in the queue waiting to be
sent to the storage system. The average is taken over the perfmon
sample interval (default of 1 second)
- Avg. Disk Write Queue Length
- The average number of write IO requests in the queue waiting to be
sent to the storage system. The average is taken over the perfmon
sample interval (default of 1 second)
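One caveat worth noting: the latency counters report values in seconds, so the raw samples need a x1000 conversion before comparing against millisecond guidance. A trivial helper (hypothetical, not part of perfmon):

```python
def counter_to_ms(avg_disk_sec_per_transfer: float) -> float:
    """Convert an Avg. Disk sec/Transfer sample (seconds) to milliseconds."""
    return avg_disk_sec_per_transfer * 1000

print(counter_to_ms(0.042))  # 42.0 ms
```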
Here are the results on a test VM. In this test, diskspd was used to
simulate an IO-intensive read/write workload. Here is what the test
shows:
- Total disk latency: 42 ms (the counter reports 0.042 seconds, equal to 42 milliseconds)
- Read latency: 5 ms
- Write latency: 80 ms
- Total disk queue: 48
- Read queue: 2.7
- Write queue: 45
These results show that the disk is clearly a bottleneck and
underperforming for the workload. Both the write latency and write
queue are very high. If this were a real environment, we would be
digging deeper into the storage to see where the issue is. It could be
that there's a problem on the storage side (like a bad drive or a
misconfiguration), or that the storage is simply too slow to handle the
workload.
Generally speaking, the performance tests can be interpreted with the following:
- Disk latency should be below 15 ms. Disk latency above 25 ms can
cause noticeable performance issues. Latency above 50 ms is indicative
of extremely underperforming storage.
- Disk queues should be no greater than twice the number of physical
disks serving the drive. For example, if the underlying storage is a 6
disk RAID 5 array, the total disk queue should be 12 or less. For
storage that isn't mapped directly to an array (such as in a private
cloud or in Azure), queues should be below 10 or so. Queue length isn't
directly indicative of performance issues but can help lead to that
conclusion.
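These rules of thumb can be folded into a simple classifier. The sketch below is only illustrative - the thresholds are the general guidance above, not hard limits:

```python
def classify_latency(latency_ms: float) -> str:
    """Bucket disk latency per the rules of thumb above."""
    if latency_ms < 15:
        return "healthy"
    if latency_ms <= 25:
        return "borderline"
    if latency_ms <= 50:
        return "noticeable performance impact"
    return "extremely underperforming"

def queue_ok(queue_length: float, physical_disks: int) -> bool:
    """Queue should be no more than twice the number of backing disks."""
    return queue_length <= 2 * physical_disks

print(classify_latency(42))   # from the test above: noticeable performance impact
print(queue_ok(48, 6))        # a queue of 48 on a 6-disk array: False
```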
These are general rules and may not apply in every scenario.
However, if you see the counters exceeding the thresholds above, it
warrants a deeper investigation.
General Troubleshooting Process
If a disk performance issue is suspected to be causing a larger
problem, we generally start off by running the second set of counters
above. This will determine if the storage is actually a bottleneck, or
if the problem is being caused by something else. If the counters
indicate that the disk is underperforming, we would then run the first
set of counters to see how many IOPS and how much throughput we are
getting. From there, we would determine if the storage is under-spec'ed
or if there is a problem on the storage side. In an on-premises
environment, that would be done by working with the storage team. In
Azure, we would review the disk configuration to see if we're getting
the advertised performance.
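The two-step process above can be summarized in a short sketch. The thresholds are borrowed from the guidance earlier in the article, and the function name and return strings are purely illustrative:

```python
def triage(total_latency_ms: float, queue_length: float,
           queue_limit: float = 10) -> str:
    """Step 1: decide whether storage is the bottleneck at all.

    queue_limit defaults to the ~10 rule of thumb for abstracted storage
    (private cloud / Azure); use 2x the physical disk count when known.
    """
    if total_latency_ms < 15 and queue_length <= queue_limit:
        return "storage looks healthy - investigate elsewhere"
    # Step 2: storage is suspect; capture IOPS and throughput next to see
    # whether it is under-spec'ed or misconfigured.
    return "capture IOPS/throughput counters and compare against expected performance"

print(triage(42, 48))  # the bottleneck scenario from the second test
```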