In a hierarchically-structured cloud/edge/device computing environment, workload allocation can greatly affect the overall system performance. This paper deals with AIoriented medical workload generated in emergency rooms (ER) or intensive care units (ICU) in metropolitan areas. The goal is to optimize AI-workload allocation to cloud clusters, edge servers, and end devices so that minimum response time can be achieved in life-saving emergency applications.In particular, we developed a new workload allocation method for the AI workload in distributed cloud/edge/device computing systems. An efficient scheduling and allocation strategy is developed in order to reduce the overall response time to satisfy multi-patient demands. We apply several ICU AI workloads from a comprehensive edge computing benchmark Edge AIBench. The healthcare AI applications involved are short-of-breath alerts, patient phenotype classification, and life-death threats. Our experimental results demonstrate the high efficiency and effectiveness in real-life health-care and emergency applications.