{{Short description|Engineering discipline which assures that engineered systems provide acceptable levels of safety}}
{{More footnotes|date=January 2011}}
[[File:ISS_impact_risk.jpg|thumb|300px|right|NASA's illustration showing high impact risk areas for the International Space Station]]

'''Safety engineering''' is an [[engineering]] [[Branches of science|discipline]] which assures that engineered [[System|systems]] provide acceptable levels of [[safety]]. It is strongly related to [[industrial engineering]]/[[systems engineering]] and the subset [[system safety]] engineering. Safety engineering assures that a [[life-critical system]] behaves as needed, even when components [[Failure|fail]].

==Analysis techniques==

Analysis techniques can be split into two categories: [[Qualitative research|qualitative]] and [[Quantitative research|quantitative]] methods. Both approaches share the goal of finding causal dependencies between a [[hazard]] at system level and failures of individual components. Qualitative approaches focus on the question "What must go wrong, such that a system hazard may occur?", while quantitative methods aim to provide estimates of the probability, rate, and/or severity of consequences.

Measures that add complexity to a technical system, such as improved design and materials, planned inspections, fool-proof design, and backup redundancy, decrease risk but increase cost. The risk can be decreased to ALARA (as low as reasonably achievable) or ALAPA (as low as practically achievable) levels.

Traditionally, safety analysis techniques rely solely on the skill and expertise of the safety engineer. In the last decade, [[Model-based systems engineering|model-based]] approaches, like STPA (Systems Theoretic Process Analysis), have become prominent. In contrast to traditional methods, model-based techniques try to derive relationships between causes and consequences from some sort of model of the system.

===Traditional methods for safety analysis===
The two most common fault modeling techniques are called [[failure mode and effects analysis]] (FMEA) and [[fault tree analysis]] (FTA). These techniques are just ways of finding problems and of making plans to cope with failures, as in [[probabilistic risk assessment]]. One of the earliest complete studies using this technique on a commercial nuclear plant was the [[WASH-1400]] study, also known as the Reactor Safety Study or the Rasmussen Report.

====Failure modes and effects analysis====
{{Main|Failure mode and effects analysis}}
Failure Mode and Effects Analysis (FMEA) is a bottom-up, [[inductive reasoning|inductive]] analytical method which may be performed at either the functional or piece-part level. For functional FMEA, failure modes are identified for each function in a system or equipment item, usually with the help of a functional [[block diagram]]. For piece-part FMEA, failure modes are identified for each piece-part component (such as a valve, connector, resistor, or diode). The effects of the failure mode are described and assigned a probability based on the [[failure rate]] and failure mode ratio of the function or component. This quantification is difficult for software: a bug either exists or it does not, and the failure models used for hardware components do not apply. Temperature, age, and manufacturing variability affect a resistor; they do not affect software. Failure modes with identical effects can be combined and summarized in a Failure Mode Effects Summary. When combined with criticality analysis, FMEA is known as [[Failure Mode, Effects, and Criticality Analysis]] or FMECA.
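The quantitative step can be illustrated with a short, hedged sketch: each failure mode's probability is taken as the component failure rate multiplied by the failure mode ratio, and modes with identical effects are summed for the summary. All component names, rates, and ratios below are hypothetical.

<syntaxhighlight lang="python">
# Illustrative FMEA quantification (hypothetical names and numbers).
# Mode rate = component failure rate x failure mode ratio; modes with
# identical effects are combined for the Failure Mode Effects Summary.
from collections import defaultdict

# (component, failure mode, effect, failure rate [/h], mode ratio)
fmea_rows = [
    ("valve V-100",   "fails closed", "blocked outlet",          2e-6, 0.6),
    ("valve V-100",   "fails open",   "overpressure downstream", 2e-6, 0.3),
    ("level control", "stuck output", "liquid overflow",         5e-7, 0.5),
    ("inlet line",    "slug flow",    "liquid overflow",         1e-6, 1.0),
]

summary = defaultdict(float)
for component, mode, effect, rate, ratio in fmea_rows:
    mode_rate = rate * ratio
    summary[effect] += mode_rate
    print(f"{component:14s} {mode:13s} -> {effect}: {mode_rate:.1e}/h")

print("\nFailure Mode Effects Summary:")
for effect, total in summary.items():
    print(f"  {effect}: {total:.1e}/h")
</syntaxhighlight>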
====Fault tree analysis====
{{Main|Fault tree analysis}}
Fault tree analysis (FTA) is a top-down, [[deductive reasoning|deductive]] analytical method. In FTA, initiating primary events such as component failures, human errors, and external events are traced through [[Boolean logic]] gates to an undesired top event such as an aircraft crash or nuclear reactor core melt. The intent is to identify ways to make top events less probable, and to verify that safety goals have been achieved.

[[File:Fault tree.svg|thumb|A fault tree diagram]]
Fault trees are a logical inverse of success trees, and may be obtained by applying [[de Morgan's laws|de Morgan's theorem]] to success trees (which are directly related to [[reliability block diagram]]s).

FTA may be qualitative or quantitative. When failure and event probabilities are unknown, qualitative fault trees may be analyzed for minimal cut sets. For example, if any minimal cut set contains a single base event, then the top event may be caused by a single failure. Quantitative FTA is used to compute top event probability, and usually requires computer software such as CAFTA from the [[Electric Power Research Institute]] or [[SAPHIRE]] from the [[Idaho National Laboratory]].
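A minimal numerical sketch of quantitative FTA follows; production analyses use dedicated tools such as those named above, and the basic events and probabilities here are hypothetical.

<syntaxhighlight lang="python">
# Minimal quantitative fault tree: independent basic events combined
# through AND/OR gates up to a top event. Events and probabilities
# are hypothetical.

def and_gate(*probs):
    # AND gate: the output occurs only if all inputs occur
    # (independence assumed).
    result = 1.0
    for p in probs:
        result *= p
    return result

def or_gate(*probs):
    # OR gate: the output occurs if any input occurs.
    survive = 1.0
    for p in probs:
        survive *= (1.0 - p)
    return 1.0 - survive

p_pump_a = 1e-3   # basic event: pump A fails to run
p_pump_b = 1e-3   # basic event: pump B fails to run
p_supply = 1e-5   # basic event: loss of electrical supply

# Top event "loss of cooling": (pump A AND pump B) OR loss of supply.
p_top = or_gate(and_gate(p_pump_a, p_pump_b), p_supply)
print(f"P(top event) = {p_top:.2e}")  # ~1.1e-05

# {loss of supply} is a minimal cut set with a single basic event,
# so this design can lose cooling through one failure alone.
</syntaxhighlight>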
Some industries use both fault trees and [[event tree]]s. An event tree starts from an undesired initiator (loss of critical supply, component failure, etc.) and follows possible further system events through to a series of final consequences. As each new event is considered, a new node is added to the tree with a split of probabilities of taking either branch. The probabilities of a range of "top events" arising from the initial event can then be seen.

=== Oil and gas industry offshore (API 14C; ISO 10418) ===
The offshore oil and gas industry uses a qualitative safety systems analysis technique to ensure the protection of offshore production systems and platforms. The analysis is used during the design phase to identify process engineering hazards together with risk mitigation measures. The methodology is described in the [[American Petroleum Institute]] Recommended Practice 14C ''Analysis, Design, Installation, and Testing of Basic Surface Safety Systems for Offshore Production Platforms.''

The technique uses system analysis methods to determine the safety requirements to protect any individual process component, e.g. a vessel, [[Pipeline transport|pipeline]], or [[pump]].<ref name=":0">API RP 14C p.1</ref> The safety requirements of individual components are integrated into a complete platform safety system, including liquid containment and emergency support systems such as fire and gas detection.<ref name=":0" />

The first stage of the analysis identifies individual process components; these can include: flowlines, headers, [[pressure vessel]]s, atmospheric vessels, [[Industrial furnace|fired heaters]], exhaust heated components, pumps, [[compressor]]s, pipelines and [[heat exchanger]]s.<ref name=":1">API RP 14C p.vi</ref> Each component is subject to a safety analysis to identify undesirable events (equipment failure, process upsets, etc.) for which protection must be provided.<ref name=":2">API RP 14C p.15-16</ref> The analysis also identifies a detectable condition (e.g. [[high pressure]]) which is used to initiate actions to prevent or minimize the effect of undesirable events.

A Safety Analysis Table (SAT) for pressure vessels includes the following details.<ref name=":2" /><ref name=":3">API RP 14C p.28</ref>

{| class="wikitable"
! colspan="3" |Safety Analysis Table (SAT) – pressure vessels
|-
!Undesirable event
!Cause
!Detectable abnormal condition
|-
|Overpressure
|Blocked or restricted outlet<br />Inflow exceeds outflow<br />Gas blowby (from upstream)<br />Pressure control failure<br />Thermal expansion<br />Excess heat input
|High pressure
|-
|Liquid overflow
|Inflow exceeds outflow<br />Liquid slug flow<br />Blocked or restricted liquid outlet<br />Level control failure
|High liquid level
|}

Other undesirable events for a pressure vessel are under-pressure, gas blowby, leak, and excess temperature, together with their associated causes and detectable conditions.<ref name=":3" />

[[File:Vessel_level_instrumentation.jpg|thumb|Vessel level instrumentation]]
Once the events, causes and detectable conditions have been identified, the next stage of the methodology uses a Safety Analysis Checklist (SAC) for each component.<ref>API RP 14C p.57</ref> This lists the safety devices that may be required, or the factors that negate the need for such a device. For example, for the case of liquid overflow from a vessel (as above) the SAC identifies:<ref>API RP 14C p.29</ref>
* A4.2d - High level sensor (LSH)<ref name="ISO-14617-1:2005">{{cite web |title=ISO 14617-1:2005 Graphical symbols for diagrams — Part 1: General information and indexes |url=https://www.iso.org/standard/41838.html |publisher=[[International Organization for Standardization]]}}</ref>
** 1. LSH installed.
** 2. Equipment downstream of gas outlet is not a flare or vent system and can safely handle maximum liquid carry-over.
** 3. Vessel function does not require handling of separate fluid phases.
** 4. Vessel is a small trap from which liquids are manually drained.

[[File:Vessel_pressure_instrumentation.jpg|thumb|Vessel pressure instrumentation]]
The analysis ensures that two levels of protection are provided to mitigate each undesirable event. For example, for a pressure vessel subjected to over-pressure, the primary protection would be a PSH (pressure switch high) to shut off inflow to the vessel; secondary protection would be provided by a [[Safety valve|pressure safety valve]] (PSV) on the vessel.<ref>API RP 14C p.10</ref>

The next stage of the analysis relates all the sensing devices, shutdown valves (ESVs), trip systems and emergency support systems in the form of a Safety Analysis Function Evaluation (SAFE) chart.<ref name=":1" /><ref>API RP 14C p.80</ref>

{| class="wikitable"
! colspan="4" rowspan="2" |Safety Analysis Function Evaluation (SAFE) chart
!Close inlet valve
!Close outlet valve
!Alarm
|-
!ESV-1a
!ESV-1b
!
|-
!Identification
!Service
!Device
!SAC reference
|
|
|
|-
| rowspan="5" |V-1
| rowspan="5" |HP separator
|PSH
|A4.2a1
|X
|
|X
|-
|LSH
|A4.2d1
|X
|
|X
|-
|LSL
|A4.2e1
|
|X
|X
|-
|PSV
|A4.2c1
|
|
|
|-
|etc.
|
|
|
|-
|V-2
|LP separator
|etc.
|
|
|
|}

An X denotes that the detection device on the left (e.g. PSH) initiates the shutdown or warning action at the top (e.g. ESV closure). The SAFE chart constitutes the basis of the Cause and Effect Charts, which relate the sensing devices to [[Shut down valve|shutdown valves]] and plant trips and define the functional architecture of the [[Plant process and emergency shutdown systems#Process shutdown (PSD)|process shutdown]] system.
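In software terms, a SAFE chart is a cause-and-effect mapping from sensing devices to the shutdown and alarm actions they initiate (the X entries above). A minimal sketch of such a mapping, with hypothetical vessel and device tags mirroring the chart, might look like this:

<syntaxhighlight lang="python">
# Sketch of a SAFE chart as a cause-and-effect mapping: each sensing
# device maps to the shutdown/alarm actions it initiates (the "X"
# marks). Vessel and device tags are hypothetical.

safe_chart = {
    ("V-1", "PSH"): ["close ESV-1a", "alarm"],   # high pressure
    ("V-1", "LSH"): ["close ESV-1a", "alarm"],   # high liquid level
    ("V-1", "LSL"): ["close ESV-1b", "alarm"],   # low liquid level
    ("V-1", "PSV"): [],  # relief valve acts mechanically, no trip signal
}

def actions_for(tripped_devices):
    """Collect every action initiated by the devices that have tripped."""
    actions = []
    for device in tripped_devices:
        actions.extend(safe_chart.get(device, []))
    return actions

# Example: high pressure detected on the HP separator.
print(actions_for([("V-1", "PSH")]))  # ['close ESV-1a', 'alarm']
</syntaxhighlight>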
The methodology also specifies the systems testing that is necessary to ensure the functionality of the protection systems.<ref>API RP 14C Appendix D</ref>

API RP 14C was first published in June 1974.<ref>{{Cite book|chapter-url=https://www.onepetro.org/conference-paper/SPE-7147-MS|chapter=Impact of API 14C on the Design And Construction of Offshore Facilities|doi=10.2118/7147-MS |access-date=7 February 2019|title=All Days |year=1978 |last1=Farrell |first1=Tim }}</ref> The 8th edition was published in February 2017.<ref>{{Cite web|url=https://global.ihs.com/doc_detail.cfm?document_name=API%20RP%2014C&item_s_key=00010460|title=API RP 14C|access-date=7 February 2019}}</ref> API RP 14C was adapted as ISO standard ISO 10418 in 1993, entitled ''Petroleum and natural gas industries — Offshore production installations — Analysis, design, installation and testing of basic surface process safety systems.''<ref>{{Cite web|url=https://www.iso.org/standard/38067.html|title=ISO 10418|access-date=7 February 2019}}</ref> The latest edition of ISO 10418 was published in 2019.<ref>{{Cite web|url=https://www.iso.org/standard/55440.html|title=ISO 10418|access-date=2 January 2025}}</ref>

==Safety certification==
Typically, safety guidelines prescribe a set of steps, deliverable documents, and exit criteria focused on planning, analysis and design, implementation, verification and validation, configuration management, and quality assurance activities for the development of a safety-critical system.<ref>{{Cite book|last1=Rempel|first1=Patrick|last2=Mäder|first2=Patrick|last3=Kuschke|first3=Tobias|last4=Cleland-Huang|first4=Jane|title=Proceedings of the 36th International Conference on Software Engineering |chapter=Mind the gap: Assessing the conformance of software traceability to relevant guidelines |author4-link=Jane Cleland-Huang|date=2014-01-01|series=ICSE 2014|location=New York, NY, USA|publisher=ACM|pages=943–954|doi=10.1145/2568225.2568290|isbn=9781450327565|citeseerx=10.1.1.660.2292|s2cid=12976464}}</ref> In addition, they typically formulate expectations regarding the creation and use of [[Requirements traceability|traceability]] in the project. For example, depending upon the criticality level of a requirement, the [[Federal Aviation Administration|US Federal Aviation Administration]] guideline [[DO-178C|DO-178B/C]] requires traceability from [[requirement]]s to [[design]], and from requirements to [[source code]] and executable [[object code]] for software components of a system.
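As a hedged illustration (the data layout below is invented for the example and is not the DO-178B/C representation), a basic traceability check simply verifies that every requirement links forward to design and code artifacts:

<syntaxhighlight lang="python">
# Illustrative traceability check: each requirement should trace to at
# least one design element and one code unit. The layout and names are
# hypothetical, not the DO-178B/C format.

trace_matrix = {
    "REQ-001": {"design": ["DES-010"], "code": ["brake_control.c"]},
    "REQ-002": {"design": ["DES-011"], "code": []},           # gap: code
    "REQ-003": {"design": [],          "code": ["alarm.c"]},  # gap: design
}

for req, links in trace_matrix.items():
    missing = [kind for kind in ("design", "code") if not links[kind]]
    if missing:
        print(f"{req}: missing trace to {' and '.join(missing)}")
# REQ-002: missing trace to code
# REQ-003: missing trace to design
</syntaxhighlight>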
Higher-quality traceability information can thereby simplify the certification process and help to establish trust in the maturity of the applied development process.<ref>{{Cite journal|last1=Mäder|first1=P.|last2=Jones|first2=P. L.|last3=Zhang|first3=Y.|last4=Cleland-Huang|first4=J.|author4-link=Jane Cleland-Huang|date=2013-05-01|title=Strategic Traceability for Safety-Critical Projects|journal=IEEE Software|volume=30|issue=3|pages=58–66|doi=10.1109/MS.2013.60|s2cid=16905456|issn=0740-7459}}</ref>

Usually a failure in safety-[[product certification|certified]] systems is acceptable{{by whom|date=April 2015}} if, on average, less than one life per 10<sup>9</sup> hours of continuous operation is lost to failure (per FAA advisory circular AC 25.1309-1A). Most Western [[nuclear reactors]], medical equipment, and commercial [[aircraft]] are certified{{by whom|date=April 2015}} to this level.{{citation needed|date=April 2015}} The cost versus loss of lives has been considered appropriate at this level (by the [[FAA]] for aircraft systems under the [[Federal Aviation Regulations]]).<ref>{{cite book|url=http://www.faa.gov/documentLibrary/media/Advisory_Circular/AC%2025.1309-1A.pdf|title=System Design and Analysis|publisher=[[Federal Aviation Administration]]|year=1988|id=Advisory Circular AC 25.1309-1A|author=ANM-110|access-date=2011-02-20}}</ref><ref>{{cite book|url=http://standards.sae.org/arp4754a|title=Guidelines for Development of Civil Aircraft and Systems|last=S–18|publisher=[[Society of Automotive Engineers]]|year=2010|id=ARP4754A}}</ref><ref>{{cite book|url=http://www.sae.org/technical/standards/ARP4761|title=Guidelines and methods for conducting the safety assessment process on civil airborne systems and equipment|last=S–18|publisher=[[Society of Automotive Engineers]]|year=1996|id=ARP4761}}</ref>

==Preventing failure==
[[File:Survival redundancy.svg|thumbnail|A [[NASA]] graph shows the relationship between the survival of a crew of astronauts and the amount of [[redundancy (engineering)|redundant]] equipment in their spacecraft (the "MM", Mission Module).]]
Once a failure mode is identified, it can usually be mitigated by adding extra or redundant equipment to the system. For example, nuclear reactors contain dangerous [[radiation]], and nuclear reactions can cause so much [[heat]] that no substance can contain them. Therefore, reactors have emergency core cooling systems to keep the temperature down, shielding to contain the radiation, and engineered barriers (usually several, nested, surmounted by a [[containment building]]) to prevent accidental leakage. [[Safety-critical system]]s are commonly required to permit no [[single point of failure|single event or component failure]] to result in a catastrophic failure mode.

Most [[biology|biological]] organisms have a certain amount of redundancy: multiple organs, multiple limbs, etc. For any given failure, a fail-over or redundancy can almost always be designed and incorporated into a system. There are two categories of techniques to reduce the probability of failure: fault avoidance techniques increase the reliability of individual items (increased design margin, de-rating, etc.), while fault tolerance techniques increase the reliability of the system as a whole (redundancies, barriers, etc.).<ref>Tommaso Sgobba. [http://www.spacesafetymagazine.com/spaceflight/commercial-spaceflight/commercial-space-safety-standards-lets-not-re-invent-wheel/ "Commercial Space Safety Standards: Let's Not Re-Invent the Wheel"]. 2015.</ref>

== Safety and reliability ==
{{Further|Inherent safety}}{{Further|Reliability engineering}}
Safety engineering and reliability engineering have much in common, but safety is not reliability.
If a medical device fails, it should fail safely; other alternatives will be available to the surgeon. If the engine on a single-engine aircraft fails, there is no backup. Electrical power grids are designed for both safety and reliability; telephone systems are designed for reliability, which becomes a safety issue when emergency (e.g. US [[911 (emergency telephone number)|911]]) calls are placed.

[[Probabilistic risk assessment]] has created a close relationship between safety and reliability. Component reliability, generally defined in terms of component [[failure rate]], and external event probability are both used in quantitative safety assessment methods such as FTA. Related probabilistic methods are used to determine system [[Mean time between failures|Mean Time Between Failure (MTBF)]], system availability, or probability of mission success or failure. Reliability analysis has a broader scope than safety analysis, in that non-critical failures are considered. On the other hand, higher failure rates are considered acceptable for non-critical systems.

Safety generally cannot be achieved through component reliability alone. Catastrophic failure probabilities of 10<sup>−9</sup> per hour correspond to the failure rates of very simple components such as [[resistor]]s or [[capacitor]]s. A complex system containing hundreds or thousands of components might be able to achieve an MTBF of 10,000 to 100,000 hours, meaning it would fail at 10<sup>−4</sup> or 10<sup>−5</sup> per hour. If a system failure is catastrophic, usually the only practical way to achieve a 10<sup>−9</sup> per hour failure rate is through redundancy.
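The arithmetic behind that statement can be sketched as follows, under strong simplifying assumptions (two independent channels, constant failure rates, a short exposure window, and no common-cause failures); real assessments must model all of these effects:

<syntaxhighlight lang="python">
# Simplified redundancy arithmetic. Assumes two independent channels,
# constant failure rates, a one-hour exposure window, and no
# common-cause failures.

mtbf_hours = 1e5                 # complex system MTBF of 100,000 hours
lam = 1.0 / mtbf_hours           # single-channel rate: 1e-05 per hour

exposure_hours = 1.0             # window before a failed channel is caught
p_single = lam * exposure_hours  # P(one channel fails in the window)
p_both = p_single ** 2           # both channels fail: ~1e-10

print(f"single channel: {lam:.0e} per hour")
print(f"dual redundant: {p_both:.0e} per hour of exposure")
</syntaxhighlight>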
When adding equipment is impractical (usually because of expense), then the least expensive form of design is often "inherently fail-safe". That is, change the system design so that its failure modes are not catastrophic. Inherent fail-safes are common in medical equipment, traffic and railway signals, communications equipment, and safety equipment. The typical approach is to arrange the system so that ordinary single failures cause the mechanism to shut down in a safe way (for nuclear power plants, this is termed a [[Passive nuclear safety|passively safe]] design, although more than ordinary failures are covered). Alternately, if the system contains a hazard source such as a battery or rotor, then it may be possible to remove the hazard from the system so that its failure modes cannot be catastrophic. The U.S. Department of Defense Standard Practice for System Safety (MIL–STD–882) places the highest priority on elimination of hazards through design selection.<ref>{{cite book |title = Standard Practice for System Safety |version = E |publisher = [[United States Department of Defense|U.S. Department of Defense]] |year = 1998 |url = https://acc.dau.mil/adl/en-US/683694/file/75173/MIL-STD-882E%20Final%202012-05-11.pdf |id = MIL-STD-882 |access-date = 2012-05-11 |archive-date = 2017-01-31 |archive-url = https://web.archive.org/web/20170131151951/https://acc.dau.mil/adl/en-US/683694/file/75173/MIL-STD-882E%20Final%202012-05-11.pdf |url-status = dead }}</ref>

One of the most common fail-safe systems is the overflow tube in baths and kitchen sinks. If the valve sticks open, rather than causing an overflow and damage, the tank spills into an overflow. Another common example is that in an [[elevator]] the cable supporting the car keeps [[spring-loaded brake]]s open. If the cable breaks, the brakes grab the rails, and the elevator cabin does not fall.

Some systems can never be made fail-safe, as continuous availability is needed. For example, loss of engine thrust in flight is dangerous. Redundancy, fault tolerance, or recovery procedures are used for these situations (e.g. multiple independent controlled and fuel-fed engines). This also makes the system less sensitive to reliability prediction errors or quality-induced uncertainty in the individual items. On the other hand, failure detection and correction, and avoidance of common-cause failures, become increasingly important for ensuring system-level reliability.<ref>{{cite book | last = Bornschlegl | first = Susanne | title = Ready for SIL 4: Modular Computers for Safety-Critical Mobile Applications | publisher = MEN Mikro Elektronik | year = 2012 | url = https://www.menmicro.com/downloads/search/dl/sk/%22White%20Paper%3A%20Ready%20for%20SIL4%3A%20Modular%20Computers%20for%20Safety-Critical%20Mobile%20Applications%22/dx/1/ | format = pdf | access-date = 2015-09-21 }}</ref>

==See also==
{{div col}}
* {{annotated link|ARP4761}}
* {{annotated link|Earthquake engineering}}
* {{annotated link|Effective safety training}}
* {{annotated link|Forensic engineering}}
* {{annotated link|Hazard and operability study}}
* {{annotated link|IEC 61508}}
* {{annotated link|Loss-control consultant}}
* {{annotated link|Nuclear safety}}
* {{annotated link|Occupational medicine}}
* {{annotated link|Occupational safety and health}}
* {{annotated link|Process safety}}
* {{annotated link|Reliability engineering}}
* {{annotated link|Risk assessment}}
* {{annotated link|Risk management}}
* {{annotated link|Safety life cycle}}
* {{annotated link|Zonal safety analysis}}

===Associations===
* {{annotated link|Institute of Industrial Engineers}}
* [http://www.system-safety.org/ International System Safety Society]
{{div col end}}

==References==
===Notes===
{{Reflist}}

===Sources===
* {{cite book|first=Frank|last=Lees|title=Loss Prevention in the Process Industries|edition=3|publisher=Elsevier|year=2005|isbn=978-0-7506-7555-0|author-link=Frank Lees}}
* {{cite book|first=Trevor|last=Kletz|title=Cheaper, safer plants, or wealth and safety at work: notes on inherently safer and simpler plants|publisher=I.Chem.E.|year=1984|isbn=978-0-85295-167-5|author-link=Trevor Kletz}}
* {{cite book|first=Trevor|last=Kletz|edition=3|title=An Engineer's View of Human Error|publisher=I.Chem.E.|year=2001|isbn=978-0-85295-430-0|author-link=Trevor Kletz}}
* {{cite book|first=Trevor|last=Kletz|edition=4|title=HAZOP and HAZAN|publisher=Taylor & Francis|year=1999|isbn=978-0-85295-421-8|author-link=Trevor Kletz}}
* {{cite book|first=Robyn R.|last=Lutz|author-link=Robyn Lutz|title=Software Engineering for Safety: A Roadmap|series=The Future of Software Engineering|publisher=ACM Press|year=2000|url=http://www.cs.ucl.ac.uk/staff/A.Finkelstein/fose/finallutz.pdf|isbn=978-1-58113-253-3|access-date=31 August 2006}}
* {{cite book|first1=Lars|last1=Grunske|first2=Bernhard|last2=Kaiser|first3=Ralf H.
|last3=Reussner|title=Component-Based Software Development for Embedded Systems|chapter=Specification and Evaluation of Safety Properties in a Component-based Software Engineering Process|publisher=Springer|year=2005|pages=737–738|citeseerx=10.1.1.69.7756|chapter-url=https://researchbank.swinburne.edu.au/file/51c018aa-fef1-4d08-a8ff-0f0d4237770d/1/PDF%20%28Accepted%20manuscript%29.pdf|series=Lecture Notes in Computer Science|volume=3778|doi=10.1007/11591962_13|isbn=978-3-540-30644-3}}
* {{cite book|author=US DOD|author-link=United States Department of Defense|title=Standard Practice for System Safety|publisher=US DOD|date=10 February 2000|location=Washington, DC|url=http://www.faa.gov/regulations_policies/handbooks_manuals/aviation/risk_management/ss_handbook/media/app_h_1200.pdf|access-date=7 September 2013|id=MIL-STD-882D}}
* {{cite book|author=US FAA|author-link=Federal Aviation Administration|title=System Safety Handbook|publisher=US FAA|date=30 December 2000|location=Washington, DC|url=http://www.faa.gov/regulations_policies/handbooks_manuals/aviation/risk_management/ss_handbook/|access-date=7 September 2013}}
* {{cite book|author=NASA|author-link=NASA|url=http://nodis3.gsfc.nasa.gov/displayDir.cfm?Internal_ID=N_PR_8000_004A_|title=Agency Risk Management Procedural Requirements|publisher=NASA|id=NPR 8000.4A|date=16 December 2008}}
* {{cite book|first=Nancy|last=Leveson|title=Engineering a Safer World: Systems Thinking Applied to Safety|series=Engineering Systems|publisher=The MIT Press|year=2011|url=http://sunnyday.mit.edu/safer-world/index.html|isbn=978-0-262-01662-9|access-date=3 July 2012}}

==External links==
* [http://www.apd.army.mil/jw2/xmldemo/p385_16/head.asp U.S. Army Pamphlet 385-16 System Safety Management Guide]

{{Systems Engineering}}
{{Occupational safety and health}}
{{Authority control}}

[[Category:Safety engineering| ]]
[[Category:Design for X]]
[[Category:Reliability engineering]]
[[Category:Engineering disciplines]]