I’ve been waiting a long time to show an example of what gets me up in the morning… Because in this world, failure isn’t the end — it’s the start of real insight! 💥

During hot-fire testing, an additively manufactured GRCop-42 combustion chamber failed — and with it, offered a powerful #FailureAnalysis case study on the critical role of process rigor in additive manufacturing, especially when builds are interrupted. We conducted a full failure analysis: reviewing test-day data, manufacturing records, post-processing steps, and metallurgical characteristics of both the failed chamber and adjacent components.

🔬 Key findings:
• Failure occurred at a build interruption location (witness line), with metallographic analysis revealing higher porosity than expected.
• This localized porosity reduced tensile strength and elongation, triggering the failure.
• Interestingly, test bars with emulated build interruptions showed no performance degradation — confirming that proper restart procedures preserve part integrity.

Additive manufacturing offers incredible promise, but as this work shows, it also demands discipline. Especially when the stakes are rocket engines.

🔗 Full article: https://lnkd.in/ekg-t4MH

Ben Williams, Colton Katsarelis, Will Tilson, and Paul Gradl, thank you for the collaboration in making this fun analysis and article!

#AdditiveManufacturing #RocketEngines #FailureAnalysis #MaterialsScience #GRCop42
Analyzing Engineering Failures For Future Success
Explore top LinkedIn content from expert professionals.
Summary
Analyzing engineering failures for future success involves studying past incidents to uncover root causes and prevent similar problems in future designs or processes. This approach helps improve safety, reliability, and efficiency across industries.
- Investigate thoroughly: Conduct detailed analyses of failures, including material properties, design flaws, and operational conditions, to pinpoint the true causes.
- Learn from near-misses: Treat near-misses with as much scrutiny as failures to identify vulnerabilities and improve systems before disasters occur.
- Promote safety culture: Prioritize transparent communication and encourage reporting of potential risks to build trust and prevent critical oversights.
-
#NASA & #BOEING Notes

Both have experienced significant controversies & tragedies, primarily in their respective fields of aviation and space exploration.

B: The Boeing 737 MAX controversy involved critical design flaws in the Maneuvering Characteristics Augmentation System (MCAS), which led to two fatal crashes.
N: The Space Shuttle Challenger disaster in 1986 was caused by an O-ring failure in the solid rocket booster, a known design flaw that was critically exacerbated by cold weather.

B: Investigations into the 737 MAX incidents revealed lapses in oversight, where Boeing reportedly downplayed the complexity and risks associated with the MCAS to regulators.
N: Both the Challenger and the Columbia disaster (2003) were linked to management lapses where warnings from engineers about potentially fatal issues were overlooked by higher-ups, who pushed ahead with launches under risky conditions.

There have been reported instances of a compromised safety culture where economic or political pressures overshadowed safety concerns. Reports and investigations after the disasters pointed to environments where the escalation of safety concerns was discouraged.

B: After the 737 MAX crashes, Boeing faced intense scrutiny from the U.S. Congress, the FAA, and other international regulatory bodies, questioning the initial certification processes.
N: Each major NASA tragedy led to comprehensive reviews by governmental oversight bodies, resulting in significant changes to operational and safety procedures.

B: The 737 MAX crashes severely damaged Boeing's reputation, leading to a financial impact, loss of trust among the public and airlines, and a halt in 737 MAX production and deliveries.
N: Fatalities and the resulting investigations typically led to temporary halts in space missions, re-evaluations of protocols, and long-term impacts on operational practices and safety measures within NASA.

Teaching:
- Analyze specific cases like the 737 MAX & the Challenger disaster.
- Discuss the ethical responsibilities of engineers + management in these scenarios: how decisions were made, including the role of economic pressure & the ethical dilemmas faced by engineers & executives.
- Examine how pressures to meet schedules & budgets can compromise safety measures.
- Discuss strategies for creating a strong safety culture where safety concerns are prioritized and valued.
- Study the role of the FAA in the Boeing cases & NASA oversight committees in the Space Shuttle disasters.
- Debate current regulatory practices & suggest potential improvements based on historical shortcomings.
- Look at the long-term changes implemented to prevent future incidents, such as changes in engineering practices, management approaches, + regulatory oversight.
- Consider how high-stress environments & high stakes can affect psychology & team dynamics.
- Evaluate how these incidents affect public trust in institutions, and discuss the importance of transparency and honest reporting in maintaining that trust.
-
I reflected on a series of interactions on X around the blackout in Spain and Portugal. I was thinking about root cause analysis and the timing required if I were given the project. I think that 16 weeks is a reasonable timeframe for an assessment with recommended solutions to avoid the issue.

1. Define the Failure Clearly (1–2 weeks)
Insist on clarity of the problem. This means documenting exactly what happened—when, where, and how the grid failed:
- Time and geographic extent of the outage
- Cascading effects on infrastructure
- Which frequency, voltage, or phase parameters deviated
- Immediate technical symptoms

2. Gather Raw, Unfiltered Data (1–4 weeks)
Avoid relying on polished managerial summaries and instead ask engineers for:
- Real logs and sensor data (frequency, load shedding, transmission trips)
- Physical conditions (e.g., grid weather conditions, equipment specs)
- People's firsthand accounts, especially from operators and field engineers

3. Probe for Systemic Design and Organizational Flaws (3–6 weeks)
I remember Feynman studying the Space Shuttle O-ring failure. I'd explore:
- Structural vulnerabilities (e.g., renewable intermittency, islanding problems)
- Control system limits (SCADA failures, PLC/RTU coordination issues)
- Organizational behavior: misaligned incentives, regulatory complacency, or "normalization of deviance"
Especially question assumptions such as:
- "The grid is secure under X% renewables"
- "The grid is secure under Y% rotating equipment"
- "Our simulations already cover these failure modes"
- "It couldn't be Z"

4. Simulate and Reconstruct the Sequence (2–3 weeks)
Can we empirically test? Reconstruct grid behavior using load flow, transient stability, or EMT simulations. Inject perturbations to see if the failure recurs. Validate assumptions about what "should have happened" vs. what actually happened (see the sketch after this post).

5. Deliver Honest Conclusions (1–2 weeks)
"The first principle is that you must not fool yourself — and you are the easiest person to fool." - Richard Feynman
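A minimal, hypothetical sketch of the kind of perturbation test described in step 4: a single-area swing-equation model with a first-order governor response, written in Python. Every number here (inertia, damping, droop, the size of the generation loss, the load-shedding threshold) is an illustrative assumption, not data from the actual Iberian event; a real reconstruction would use proper load-flow, transient-stability, or EMT tools rather than this toy model.

```python
# Toy single-area frequency-response model (swing equation, per-unit).
# All parameter values are assumed for illustration only.
import numpy as np

H = 4.0          # system inertia constant, s (assumed)
D = 1.0          # load damping, pu power per pu frequency (assumed)
R = 0.05         # governor droop, pu (assumed)
T_g = 6.0        # governor/turbine response time constant, s (assumed)
f_nom = 50.0     # nominal frequency, Hz
ufls_hz = 49.0   # under-frequency load-shedding trip level, Hz (assumed)

dt, t_end = 0.01, 30.0
t = np.arange(0.0, t_end, dt)

# Inject a perturbation: a sudden 30% loss of generation at t = 1 s (assumed size)
delta_p = np.zeros_like(t)
delta_p[t >= 1.0] = -0.30

df, p_gov = 0.0, 0.0        # frequency deviation (pu), governor output (pu)
freq = np.empty_like(t)
for i, p in enumerate(delta_p):
    # First-order primary frequency response (droop with lag)
    p_gov += dt * (-df / R - p_gov) / T_g
    # Swing equation: 2H * d(df)/dt = delta_p + p_gov - D * df
    df += dt * (p + p_gov - D * df) / (2.0 * H)
    freq[i] = f_nom * (1.0 + df)

i_min = int(np.argmin(freq))
print(f"Frequency nadir: {freq[i_min]:.2f} Hz at t = {t[i_min]:.1f} s")
if freq[i_min] < ufls_hz:
    print("Nadir is below the UFLS threshold: load shedding would trip in this scenario.")
else:
    print("Frequency stays above the UFLS threshold in this scenario.")
```

The point of a sketch like this is the workflow, not the numbers: inject the suspected disturbance, compare the simulated trajectory with the recorded one, and flag any assumption the model needs in order to reproduce what actually happened.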
-
Your "best practices" might be killing your improvement efforts. Here's why:

During WWII, military analysts studied planes returning from combat to see where to add armor. Most bullet holes were on the wings and fuselage. So they wanted to reinforce those areas.

Then statistician Abraham Wald said: "You're looking at the wrong data." The planes you're studying SURVIVED. The ones shot in the engine and cockpit? They never made it back.

This is survivorship bias. And it's destroying your continuous improvement efforts. Here's how:
- You study your "successful" processes
- You benchmark against top performers only
- You ignore the failed experiments
- You copy what worked elsewhere

But you're missing the critical data:
→ Why did some improvement initiatives fail?
→ What problems aren't being reported?
→ Which "best practices" actually caused failures?
→ What are the unsuccessful companies doing wrong?

The manufacturing reality: For every process improvement that worked, 3 didn't make it to implementation. But we only study the survivors.

Better approach:
- Document failed experiments and why they failed
- Study processes that broke down under pressure
- Interview people who left your company
- Analyze near-misses, not just successes
- Look at what your struggling competitors are doing

The real insights aren't in your success stories. They're in your failures.

What failed improvement initiative taught you the most? Share it below - let's learn from the data we usually ignore.
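A toy Monte Carlo sketch of the survivorship-bias mechanism in the WWII example above, in Python. The hit shares and per-section survival probabilities are invented for illustration; the only point is that the sections where hits are most lethal end up underrepresented in data collected from returning aircraft.

```python
# Toy illustration of survivorship bias (all numbers are made up).
# Hits land on aircraft sections in proportion to their share of total hits,
# but a hit's survivability differs by section. Counting bullet holes only on
# returning aircraft makes the most vulnerable sections look the safest.
import random

random.seed(42)

# (section, share of all hits, probability a hit there is survivable) -- assumptions
sections = [
    ("wings",    0.45, 0.95),
    ("fuselage", 0.35, 0.90),
    ("engine",   0.12, 0.40),
    ("cockpit",  0.08, 0.35),
]

survivor_counts = {name: 0 for name, _, _ in sections}

for _ in range(100_000):
    r = random.random()
    cumulative = 0.0
    for name, share, p_survive in sections:
        cumulative += share
        if r <= cumulative:
            if random.random() < p_survive:      # aircraft makes it back
                survivor_counts[name] += 1
            break

total_surv = sum(survivor_counts.values())
print(f"{'section':10} {'% of all hits':>14} {'% of hits seen on survivors':>30}")
for name, share, _ in sections:
    seen = 100 * survivor_counts[name] / total_surv
    print(f"{name:10} {100 * share:>13.1f}% {seen:>29.1f}%")
```

Running it shows engine and cockpit hits nearly vanishing from the "survivor" data even though they are the deadliest, which is exactly the trap of studying only successful processes.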
-
Downhole Tool Failures - Root Cause Analysis

We all know that there are multiple potential culprits to just about every NPT event that happens on a rig. Operators have been continuously striving for lower costs & faster drilling, so my question to our US land-based drilling industry is: why do we continue to ignore the things that can be controlled and that lead to lower costs & faster drilling? It's not a trick question...

It's an interesting dynamic that as an industry we know that the removal of drilled solids makes for a better wellbore, lowers costs, reduces waste, reduces NPT, results in fewer downhole tool failures, reduces risk, etc. Yet we continue to pay for a centrifuge and not run it, we continue to run fine mesh screens on a drying shaker and coarse mesh screens on a primary shaker. We continue to carry massive amounts of drilled solids from well to well, never removing them, just watching them degrade to smaller and smaller particle sizes.

Which gets me to my next conundrum. As these particles get smaller and the ultrafine, colloidal particles start to sandblast the rubbers & elastomers in an MWD tool to the point of failure, who is ultimately responsible? It appears to me that the responsibility should reside with the mud engineering service to advise on best practices. If there is a lack of knowledge or understanding from anyone on the rig, there should be more support from the office to help educate the field engineers on what good drilling fluid management looks like. The mud companies need to hold education sessions for drilling supervisors on the merits of good drilling fluid practices.

The bottom line is that good engineering service over the life-cycle of that mud system is clearly not being delivered when the centrifuge run time is 1 hour per day, or when 170 mesh screens are run on a drying shaker and the D50 PSD is >5 micron. This should tell the drilling engineers & drilling managers that inadequate mud engineering service could be deemed the culprit for a junked motor.....not the motor company.

Digitizing this process, bringing transparency and accountability across all the various workflows, is where the value resides. The dissemination of real-time data across the entire value chain is where the next big win is going to come from. Measuring Mud Matters.
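A minimal sketch of one of the measurements mentioned above: estimating the D50 (median particle size) of the solids in a mud system from a cumulative particle size distribution, in Python. The size bins and cumulative percentages below are invented for illustration; only the 5 micron reference point comes from the post.

```python
# Estimating D50 from a cumulative volume-percent particle size distribution.
# The example data are assumed, not measurements from any actual mud system.
import numpy as np

# Particle size (microns) vs. cumulative volume % finer -- assumed example data
size_um = np.array([1, 2, 5, 10, 20, 44, 74, 105])
cum_vol_pct = np.array([8, 18, 42, 63, 78, 90, 96, 100])

def d_percentile(p, sizes, cum_pct):
    """Interpolate the particle size at which p% of the volume is finer."""
    # Interpolate in log-size space, which is conventional for PSD data
    return float(np.exp(np.interp(p, cum_pct, np.log(sizes))))

d50 = d_percentile(50, size_um, cum_vol_pct)
print(f"D50 ≈ {d50:.1f} µm")
if d50 > 5:
    print("D50 is above the 5 µm reference mentioned in the post.")
```

Tracking a number like this over the life of the mud system, alongside centrifuge run time and screen selection, is the kind of measurement-driven accountability the post argues for.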
-
Root cause analysis tools can add valuable pieces to the failure analysis puzzle.

I completed a failure analysis on some polycarbonate components, which identified brittle fracture related to environmental stress cracking (ESC). The chemical associated with the failures was identified as hexane, which is used on the components in the manufacturing process. The cracking had been observed shortly after this assembly step. Historically, this had not presented a problem; however, over a span of several weeks, a rash of failures had been encountered. A review of assembled parts showed a failure rate of approximately 25%.

The parts are molded from a polycarbonate having a nominal melt flow rate (MFR) of 20 g/10 min. Testing conducted as part of the failure analysis indicated that the failed parts had a significantly higher MFR than the molding resin, and further identified significant part-to-part variation. After conducting the failure analysis and identifying the mode (ESC) and cause of the failure (low molecular weight), root cause analysis (RCA) tools were employed to dig deeper.

Data generated as part of quality control testing was analyzed. MFR testing had been conducted on random components sampled from the molding operation. Unfortunately, this data was not reviewed after collection. The data analysis included two RCA tools that added key pieces to the puzzle (a minimal sketch of this kind of data review follows below).

A histogram was created showing the MFRs of the molded parts over the time period in which the cracking had been observed. The form of the histogram was not a normal distribution, and suggested a bimodal distribution. The MFR data was further reviewed and plotted using a box and whisker plot. This showed a significant difference between 1st shift and 2nd shift operations. The MFR results from the 1st shift production showed an average corresponding to an acceptable shift in MFR and minimal molecular degradation through the injection molding process. The data from the 2nd shift was significantly higher and indicated excessive molecular degradation. Further, the 1st shift data showed a relatively tight distribution compared with the 2nd shift data, suggesting in-control and out-of-control process conditions, respectively.

This data review pointed to differences between the production of the parts. Further digging revealed "discrepancies" within the resin drying procedure and the analytical technique used to measure the moisture content prior to releasing the resin to the molding floor.

Root cause analysis techniques such as these often provide critical information to identify the true underlying cause of a failure. They are best directed by the results of a thorough failure analysis. Contact me to discuss how root cause analysis (RCA) could help you identify problems in your plastic components: jeff@madisongroup.com

#failureanalysis #rootcauseanalysis #rca #plastics #mfr #injectionmolding #polycarbonate
The Madison Group | Plastic Consulting
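A minimal sketch, in Python with pandas and matplotlib, of the kind of QC data review described above: plotting molded-part MFR as a histogram and as a box plot grouped by production shift. The data are fabricated for illustration; only the nominal 20 g/10 min resin MFR comes from the post, and the group means, spreads, and sample sizes are assumptions.

```python
# Illustrative QC data review: MFR of molded parts viewed as a histogram and
# as a box plot by shift. The measurements below are simulated, not real data.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical MFR measurements (g/10 min) from randomly sampled molded parts
df = pd.DataFrame({
    "shift": ["1st"] * 40 + ["2nd"] * 40,
    "mfr": np.concatenate([
        rng.normal(23, 1.0, 40),   # modest, tightly controlled shift in MFR
        rng.normal(32, 3.5, 40),   # elevated and scattered -> degradation
    ]),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram of all parts: a bimodal shape hints at two distinct populations
ax1.hist(df["mfr"], bins=20, edgecolor="black")
ax1.axvline(20, linestyle="--", label="nominal resin MFR (20 g/10 min)")
ax1.set_xlabel("MFR (g/10 min)")
ax1.set_ylabel("part count")
ax1.set_title("All molded parts")
ax1.legend()

# Box plot by shift: separates the in-control and out-of-control populations
df.boxplot(column="mfr", by="shift", ax=ax2)
ax2.set_xlabel("production shift")
ax2.set_ylabel("MFR (g/10 min)")
ax2.set_title("MFR by shift")

fig.suptitle("")
plt.tight_layout()
plt.show()
```

The histogram alone only suggests two populations; grouping by a process variable (here, shift) is what turns the pattern into an actionable lead for the root cause investigation.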
-
In this case, the containership’s engine malfunction is a stark reminder of how close we can come to a serious accident. Quick thinking saved the day, but near-misses like this one demand attention—not just applause for skillful recovery.

Imagine if the nearby tugs had been unavailable, or if the angle adjustment had failed. The vessel, cargo cranes, and other assets were within mere moments of a disaster. But let’s focus on the lesson we can take away here: why did the engine fail, and why wasn’t the issue detected earlier?

In my years of experience, I’ve seen near-misses treated as "just bad luck." But luck isn’t a safety strategy. If this has become a more common occurrence, as stated, we need to ask ourselves: How often are we inspecting our propulsion systems? Are our indicators reliable? Are emergency procedures regularly drilled?

A malfunction at sea can have cascading consequences—financial, operational, and, more importantly, human. We need to ensure that the systems and people aboard these vessels are prepared for the worst. Let’s make sure we’re not simply waiting for the next “quick thinking” to save us from disaster. We must address the root causes and prevent these situations from arising in the first place.

#maritime #ports #terminals #maritimesafety #nearmiss #propulsion #pms #operationalexcellence #riskmanagement #enginefailure #processsafety #safetyfirst #incidentprevention #continuousimprovement #lessonslearned #humanerror #rootcauseanalysis #shipoperations #criticalthinking