What is the GPU Bottleneck?
Often, the performance cost scales with the number of pixels. To test that, you can vary the render resolution using r.SetRes or scale the viewport in the editor.
Using r.ScreenPercentage is even more convenient, but keep in mind that it adds some extra upsampling cost once the feature is in use.
If you see a measurable performance change, you are bound by something pixel-related. Usually it is either memory bandwidth (reading and writing) or math (ALU), but in rare cases specific
units are saturated (e.g. MRT export). If you can reduce the memory traffic (or the math) in the relevant passes and see a performance difference, you know the pass was bound by memory bandwidth (or by the ALU units).
The test change does not have to look the same - it is only a test. Once the bottleneck is confirmed, you know you have to reduce the cost there to improve performance.
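For example, a minimal console sequence for this test might look like the following (a sketch assuming the standard UE4 console commands; r.ScreenPercentage 50 roughly quarters the number of shaded pixels):

    stat unit
    ProfileGPU
    r.ScreenPercentage 50
    ProfileGPU
    r.ScreenPercentage 100

If the GPU time in stat unit and the per-pass timings in ProfileGPU drop roughly in proportion to the pixel count, the frame is pixel bound; if they barely move, look at vertex or per-object cost instead.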
The shadow map resolution does not scale with the screen resolution (vary it with r.Shadow.MaxResolution instead), and unless you have very large areas of shadow-casting masked or translucent materials,
shadow rendering is not pixel shader bound. It is often bound by vertex or triangle processing instead (typical causes: dense meshes, missing LODs, tessellation, WorldPositionOffset use).
Shadow map rendering cost scales with the number of lights, the number of cascades/cube map faces and the number of shadow-casting objects in each light's frustum. This is a very common bottleneck,
and usually only larger content changes can reduce the cost.
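To confirm a shadow-map bottleneck, you can temporarily cut the shadow work down with console variables and compare GPU timings before and after (a sketch assuming the UE4 shadow cvars; names and defaults can differ between engine versions):

    r.Shadow.MaxResolution 256
    r.Shadow.CSM.MaxCascades 1

If the shadow depth rendering in ProfileGPU shrinks substantially, the shadow passes were the bottleneck, and content changes (fewer shadow-casting lights, fewer cascades, better LODs on casters) are the way to recover the cost.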
Highly tessellated meshes, where the wireframe appears as a solid color, can suffer from poor quad utilization. GPUs shade pixels in 2x2 blocks (quads), which is needed for the mip map
derivative computations, and only reject the pixels that fall outside the triangle slightly later. For larger triangles this is not a problem, but if triangles are small or very thin,
performance can suffer because many pixels are shaded while few actually contribute to the image. Deferred shading improves the situation, since we get very good quad utilization on the
lighting passes. The problem remains during the base pass, however, so complex materials might render quite a bit slower.
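As a rough back-of-the-envelope model (an illustration, not an exact hardware formula): a thin triangle that covers 12 visible pixels spread over 10 different 2x2 quads still causes 10 * 4 = 40 pixel shader invocations, so only about 30% of the shading work contributes to the image.

    // Hypothetical helper, only to illustrate the 2x2 quad shading cost.
    // visiblePixels: pixels actually covered by the triangle
    // touchedQuads:  2x2 pixel blocks the triangle overlaps at all
    float QuadUtilization(int visiblePixels, int touchedQuads)
    {
        return float(visiblePixels) / float(touchedQuads * 4);
    }
    // QuadUtilization(12, 10) == 0.3f -> roughly 70% of the pixel shader work is wasted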
The solution is to use less dense meshes. With level of detail (LOD) meshes this can be done only where it is a problem (in the distance).
You can adjust r.EarlyZPass to see if your scene would benefit from a full early Z pass (more draw calls, but less overdraw during the base pass).
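A quick way to experiment (assuming UE4's r.EarlyZPass console variable; the exact value meanings can vary between engine versions - roughly 0 = off, 1 = good occluders only, 2 = all opaque geometry):

    r.EarlyZPass 0
    r.EarlyZPass 1
    r.EarlyZPass 2

Compare the depth pre-pass and base pass timings in ProfileGPU for each setting; a fuller early Z pass trades extra draw calls in the pre-pass for less overdraw in the base pass.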
If changing the resolution makes little difference, you are likely bound by vertex processing cost (vertex shader or tessellation).
Often, you will have to change content to verify that. Typical causes include:
Too many vertices. (Use Level of Detail meshes)
Complex World Position Offset / Displacement Material using Textures with poor MIP mapping. (adjust the Material)
Tessellation. (Avoid it if possible and adjust the tessellation factor - the fastest way to test is the show Tessellation command. Some hardware scales badly with higher tessellation levels.)
Many UV or normal seams result in more vertices. (Look at the unwrapped UVs - many islands are bad; flat shaded meshes have 3 unique vertices per triangle.)
Too many vertex attributes. (extra UV channels)
Verify that the vertex count is reasonable; some importer code might not have welded the vertices. (Combine vertices that share the same position, UV and normal - see the sketch below.)
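For the last point, a minimal welding sketch in C++ (illustrative only - the Vertex layout and the WeldVertices helper are made up for this example, and a real importer would typically compare positions and normals with a small epsilon instead of exact equality):

    #include <cstdint>
    #include <map>
    #include <tuple>
    #include <vector>

    struct Vertex { float px, py, pz, nx, ny, nz, u, v; };

    // Merge vertices that share the exact same position, normal and UV,
    // then remap the index buffer to the welded vertex list.
    void WeldVertices(std::vector<Vertex>& verts, std::vector<uint32_t>& indices)
    {
        using Key = std::tuple<float, float, float, float, float, float, float, float>;
        std::map<Key, uint32_t> unique;     // key -> index in the welded list
        std::vector<Vertex> welded;
        std::vector<uint32_t> remap(verts.size());

        for (uint32_t i = 0; i < verts.size(); ++i)
        {
            const Vertex& v = verts[i];
            Key key{ v.px, v.py, v.pz, v.nx, v.ny, v.nz, v.u, v.v };
            auto it = unique.find(key);
            if (it == unique.end())
            {
                it = unique.emplace(key, static_cast<uint32_t>(welded.size())).first;
                welded.push_back(v);
            }
            remap[i] = it->second;          // old index -> welded index
        }
        for (uint32_t& idx : indices)
            idx = remap[idx];
        verts = std::move(welded);
    }

If the vertex count drops a lot after welding (an unwelded or flat-shaded import can have up to 3 unique vertices per triangle), the source asset or the import settings are worth fixing.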
Less often, you are bound by something else. That could be:
Object cost (more likely a CPU cost, e.g. per-draw-call overhead, but there can be some GPU cost as well)
Triangle setup cost (very high poly meshes with a cheap vertex shader, e.g. shadow map rendering of static meshes; rarely the issue - use level of detail (LOD) meshes)
View cost (e.g. HZB occlusion culling)
Scene cost (e.g. GPU particle simulation)
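If you suspect one of these, the usual first step is to check where the time actually goes (a sketch, assuming the standard UE4 stat commands):

    stat unit
    stat RHI
    stat SceneRendering

stat unit separates Game, Draw and GPU time (per-object cost often shows up as Draw/CPU time rather than GPU time), stat RHI shows draw call and primitive counts, and stat SceneRendering breaks down renderer-side costs such as visibility culling.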