The DDIM scheduler was always active, which produced softer output.
DDIM is now used only when ENABLE_IP_ADAPTER=True; otherwise the
model's default scheduler is used for best quality.
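
The scheduler choice described above can be sketched roughly as
follows. This is a minimal illustration, not the actual change: the
function name choose_scheduler and the injected make_ddim callable are
hypothetical, introduced only so the sketch runs without dependencies.

```python
def choose_scheduler(default_scheduler, enable_ip_adapter, make_ddim):
    """Pick the scheduler for a run.

    default_scheduler: the scheduler the model shipped with.
    enable_ip_adapter: mirrors the ENABLE_IP_ADAPTER flag.
    make_ddim: hypothetical callable that builds a DDIM scheduler from
        the default one (injected so this sketch stays dependency-free).
    """
    if enable_ip_adapter:
        # IP-Adapter path: switch to DDIM.
        return make_ddim(default_scheduler)
    # Otherwise leave the model's own scheduler untouched.
    return default_scheduler
```

With diffusers, make_ddim could be something like
lambda s: DDIMScheduler.from_config(s.config), assuming the pipeline
exposes its scheduler as pipe.scheduler.
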

When IP-Adapter FaceID is initialized, it modifies the cross-attention
layers of the pipeline's UNet. Calling the raw pipeline() without face
embeddings leaves these layers in a broken state and produces
corrupted output.

Solution: when IP-Adapter is loaded but no face_image is provided,
call ip_model.generate() with s_scale=0.0 and zero embeddings. This
disables face conditioning while still satisfying the modified layers.
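
A rough sketch of the resulting dispatch is shown below. It is an
illustration under assumptions, not the code from this change:
run_generation, face_embeds and zero_embeds are hypothetical names, and
the faceid_embeds keyword is assumed to match the generate() signature
of the IP-Adapter FaceID class in use. Only ip_model.generate() and
s_scale=0.0 come from the description above.

```python
def run_generation(pipeline, ip_model, prompt,
                   face_embeds=None, zero_embeds=None, **kwargs):
    """Route a request to the raw pipeline or to IP-Adapter.

    ip_model is None when IP-Adapter was never loaded. zero_embeds is
    an all-zero tensor shaped like a real face embedding; the caller
    builds it (its shape is an assumption about the face encoder).
    """
    if ip_model is None:
        # UNet attention layers are unmodified: the raw pipeline is safe.
        return pipeline(prompt=prompt, **kwargs)

    if face_embeds is not None:
        # Normal face-conditioned generation.
        return ip_model.generate(
            prompt=prompt, faceid_embeds=face_embeds, **kwargs)

    if zero_embeds is None:
        raise ValueError("zero_embeds is required when no face is given")

    # IP-Adapter is loaded but there is no face: feed zero embeddings
    # and s_scale=0.0 so the patched layers still receive valid input.
    return ip_model.generate(
        prompt=prompt, faceid_embeds=zero_embeds, s_scale=0.0, **kwargs)
```

The point of the third branch is that once IP-Adapter is loaded, the
raw pipeline() is never called again, even for prompts without a face.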