Soft hyphen
{{Short description|Unicode character}}
[[File:IEC 60417 - Ref-No 6073.svg|thumb|right|ISO symbol for soft hyphen]]
In computing and typesetting, a '''soft hyphen''' (Unicode {{unichar|00AD|soft hyphen|html=}}) or '''syllable hyphen''' is a code point reserved in some [[coded character set]]s for breaking words across lines: it is displayed as a visible [[hyphen]] if it falls at the end of a line, but remains invisible within a line. Two alternative ways of using the soft hyphen character for this purpose have emerged, depending on whether the encoded text will be broken into lines by its recipient, or has already been preformatted by its originator.<ref name="tut">{{cite web| title= Soft hyphen (SHY) – a hard problem?| url= https://jkorpela.fi/shy.html | author=Jukka Korpela | publisher= [[Tampere University of Technology]] | date= January 2011 | access-date= 2011-04-08}}</ref><ref name="kuhn">{{cite web| title= Unicode interpretation of SOFT HYPHEN breaks ISO 8859-1 compatibility| url= https://www.unicode.org/L2/L2003/03155r-kuhn-soft-hyphen.pdf | author=Markus G. Kuhn | date= 2003-06-04 | series= [[Unicode Technical Committee]] | id= L2/03-155R| author-link= Markus G. Kuhn }}</ref><ref name="muller">{{cite web| title= Yes, SOFT HYPHEN is a hard problem| url= https://www.unicode.org/L2/L2002/02279-muller.htm | author=Eric Muller | date= 2002-08-14 | series= [[Unicode Technical Committee]] | id= L2/02-279}}</ref>

== Text to be formatted by the recipient ==
The use of SHY characters in text that will be broken into lines by the recipient is the application context considered by the post-1999 [[HTML]] and [[Unicode]] specifications, as well as some word-processing file formats. In this context, the soft hyphen may also be called a '''discretionary hyphen''' or '''optional hyphen'''.
It serves as an invisible marker used to specify a place in text where a hyphenated break is allowed without forcing a [[Line wrap|line break]] in an inconvenient place if the text is re-flowed. It becomes visible only after [[word wrapping]] at the end of a line.<ref>{{Cite web |title=CSS Text Module Level 3 Specification |url=https://www.w3.org/TR/css-text-3/#hyphenation |access-date=2022-08-07 |website=W3C Candidate Recommendation Draft |publisher=World Wide Web Consortium (W3C)}}</ref>

The soft hyphen's Unicode semantics and HTML implementation are in many ways similar to Unicode's [[zero-width space]], with the exception that the soft hyphen will preserve the [[kerning]] of the characters on either side when not visible. The zero-width space, on the other hand, will not, as it is considered a visible character even if not rendered, thus having its own kerning metrics.

To show the effect of a soft hyphen in HTML, the words of the following text (from the poem ''Spring and Fall'' by [[Gerard Manley Hopkins]]) have been separated with soft hyphens:

<blockquote><div style="background:#f7f7f7; width:70%; margin:auto;">
Margaret&shy;Are&shy;You&shy;Grieving&shy;Over&shy;Goldengrove&shy;Unleaving&shy;Leaves&shy;Like&shy;The&shy;Things&shy;Of&shy;Man&shy;You&shy;With&shy;Your&shy;Fresh&shy;Thoughts&shy;Care&shy;For&shy;Can&shy;You&shy;Ah&shy;As&shy;The&shy;Heart&shy;Grows&shy;Older&shy;It&shy;Will&shy;Come&shy;To&shy;Such&shy;Sights&shy;Colder&shy;By&shy;And&shy;By&shy;Nor&shy;Spare&shy;A&shy;Sigh&shy;Though&shy;Worlds&shy;Of&shy;Wanwood&shy;Leafmeal&shy;Lie&shy;And&shy;Yet&shy;You&shy;Will&shy;Weep&shy;And&shy;Know&shy;Why&shy;Now&shy;No&shy;Matter&shy;Child&shy;The&shy;Name&shy;Sorrows&shy;Springs&shy;Are&shy;The&shy;Same&shy;Nor&shy;Mouth&shy;Had&shy;No&shy;Nor&shy;Mind&shy;Expressed&shy;What&shy;Heart&shy;Heard&shy;Of&shy;Ghost&shy;Guessed&shy;It&shy;Is&shy;The&shy;Blight&shy;Man&shy;Was&shy;Born&shy;For&shy;It&shy;Is&shy;Margaret&shy;You&shy;Mourn&shy;For
</div></blockquote>

On HTML browsers supporting soft hyphens, resizing the window will re-break the above text only at word boundaries, and insert a hyphen at the end of each line.
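The recipient-side behaviour described in this section can be illustrated with a minimal Python sketch. It is not how any browser implements line breaking: the function name <code>wrap_with_shy</code>, the greedy strategy, and the fixed character-count width are all assumptions made for illustration, and the input is assumed to contain no spaces, as in the sample text above.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap_with_shy(text: str, width: int) -> list[str]:
    """Greedy line breaking that breaks only at soft hyphens.

    A soft hyphen becomes a visible "-" only where a line is actually
    broken; all other soft hyphens are dropped from the output.
    Assumption: a single fragment longer than `width` simply overflows.
    """
    fragments = text.split(SHY)
    lines: list[str] = []
    current = ""
    for frag in fragments:
        # Keep one column free for the hyphen that would be shown
        # if the line were broken right after this fragment.
        if current and len(current) + len(frag) + 1 > width:
            lines.append(current + "-")
            current = frag
        else:
            current += frag
    if current:
        lines.append(current)
    return lines


# The first words of the sample text, joined by soft hyphens.
sample = SHY.join(["Margaret", "Are", "You", "Grieving"])
for line in wrap_with_shy(sample, 12):
    print(line)
```

Calling the function again with a larger <code>width</code> re-breaks the same string at different soft hyphens, which mirrors what happens when a browser window is resized.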
== Text preformatted by the originator ==
The SHY character is also used in text where paragraphs have already been broken into lines, such as certain [[plain text]] files, text sent to [[VT100]]-style [[terminal emulator]]s or printers, or pages represented in [[page description language]]s. This is the application context originally considered by the [[EBCDIC]] and [[ISO 8859-1]] standards and implemented in many [[VT100]] [[terminal emulator]]s.<ref name="tut"/><ref name="kuhn"/>

Here, SHY is a visible hyphen that is usually visually indistinguishable from a regular hyphen, but has been inserted solely for the purpose of line breaking. Encoding it as a separate character distinguishes it from any regular hyphen that is part of the original spelling of the word. This distinction helps when already formatted text is re-used and the line breaks and soft hyphens inserted during word wrapping have to be removed to convert the text back into its unformatted form. For example, the copy or paste function of a [[terminal emulator]] can offer to replace line breaks with a [[space character]], and to remove any soft hyphens together with any [[whitespace character]]s that immediately follow them. An example application that outputs soft hyphens for this reason is the [[groff (software)|groff]] text formatter, as used on many Unix/Linux systems to display [[man pages]].
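The conversion back to unformatted text described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the behaviour of any particular terminal emulator; the function name <code>unformat</code> is hypothetical, and the input is assumed to use <code>\n</code> as its only line break.

```python
import re

SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def unformat(text: str) -> str:
    """Undo word wrapping in text that marks wrap hyphens with SHY."""
    # Drop each soft hyphen together with any whitespace that
    # immediately follows it (this includes the line break).
    text = re.sub(SHY + r"\s*", "", text)
    # Every remaining line break becomes a single space.
    return text.replace("\n", " ")


wrapped = "state-of-the-art hy\u00ad\nphenation\nworks"
print(unformat(wrapped))
```

Because the regular hyphens in "state-of-the-art" are a different character from the soft hyphen, they survive the conversion, while the hyphen introduced by wrapping disappears.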
== Encodings and definitions ==
{{anchor|OSC}}Soft hyphen (''SHY'') characters in coded character sets, roughly in chronological order:
* [[EBCDIC]] placed a SHY character (known there as a "syllable hyphen") at position 202 (0xCA [[hexadecimal]]).<ref name="tut"/><ref>{{cite web| title= Extended Binary-Coded Decimal Interchange Code - S/390 |url= http://www.comsci.us/datacom/ebcdic3.html | publisher= comsci.us| access-date= 2011-04-08}}</ref> IBM defined its purpose as a "hyphen used to divide a word at the end of a line [that] may be removed when a program adjusts lines."<ref>{{cite web| title= Glossary |url= http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/rzaat/rzaats.htm#x2047006 | publisher= [[IBM]] | access-date= 2011-04-08}}</ref>
* German standard [[DIN]] 31626 defined a [[ISO/IEC 2022#Control character sets|C1 control code set]] in which 0x8D is an "Optional Syllabification Control (OSC)", a "print control character" for marking syllable boundaries in long words. This C1 control set was registered in 1979.<ref>{{cite iso-ir |number=40 |title=Additional Control Functions for Bibliographic Use according to German Standard DIN 31626 |sponsor=DIN |sponsor-link=DIN |date=1979-07-15}}</ref> (Note: this is not the same as the [[C0 and C1 control codes#C1 controls|ISO/IEC 6429]] C1 control code {{Control code link|ANSI:OSC|Operating System Command (OSC)}}.)
* [[ISO 8859-1]]:1986 (Latin 1) inherited SHY from EBCDIC, but called it "soft hyphen", placed it at position 0xAD (hexadecimal), and stated its purpose as "for use when a line break has been established within a word". Other [[ISO 8859]] parts placed it at the same position, with the exception of [[ISO 8859-11]] (Latin/Thai), which lacks it.
* IBM [[code page 850]] (an [[MS-DOS]] character set covering all ISO 8859-1 characters) placed it at position 240 = 0xF0.
* [[SGML]]'s "Numeric and Special Graphic" (isonum) [[character entity]] set (ISO 8879:1986) includes <code>&amp;shy;</code> for the ISO 8859-1 soft hyphen.
* Unicode 1.0 (1991) and ISO 10646 (1993) took the first 256 code positions from ISO 8859-1, placing SHY at Unicode code point U+00AD.
* [[HTML]] 2 (1995) incorporated the <code>&amp;shy;</code> character entity from SGML, but explicitly discouraged its use.
* HTML 4 (1999) redefined the purpose of the character as marking a hyphenation opportunity, which only becomes visible as a hyphen at the end of a line after formatting.
* Unicode 4.0 (2003) changed the category of its SHY character from "Pd" (punctuation, dash) to "Cf" (other, format), thereby aligning its interpretation of the character with that of HTML 4.

Other commands for marking hyphenation opportunities in text formatting languages (similar to the HTML 4 and Unicode 4.0 interpretation of SHY):
* [[troff]] and [[groff (software)|groff]]: <code>\%</code>.
* [[TeX]] and [[LaTeX]]: <code>\-</code><ref>{{cite web| title= Commonly Confused Characters | url=http://www.cs.sfu.ca/~ggbaker/reference/characters/#dash | publisher= Greg Baker, [[Simon Fraser University]] | access-date= 2011-07-12}}</ref>

==Security issues==
Soft hyphens, like other invisible characters, have been used to obscure malicious [[domain name|domains]] or [[URL]]s in [[e-mail spam]].<ref>{{cite web| title= Spammers Using Soft Hyphen To Hide Malicious URLs |url= http://it.slashdot.org/story/10/10/07/2127241/Spammers-Using-Soft-Hyphen-To-Hide-Malicious-URLs | date= 7 October 2010 | publisher= [[Slashdot]] | access-date= 2011-04-08}}</ref><ref>{{cite web| title= Soft Hyphen – A New URL Obfuscation Technique |url= https://community.broadcom.com/symantecenterprise/communities/community-home/librarydocuments/viewdocument?DocumentKey=3f86fd1d-ca17-4e5b-b75e-88dd2c6f0c2d&CommunityKey=1ecf5f55-9545-44d6-b0f4-4e4a7f5f5e68&tab=librarydocuments | publisher= [[NortonLifeLock|Symantec]] | access-date= 2011-04-08}}</ref> They are also used in emails to try to defeat spam prevention systems. For example, the phrase "I need your assista&shy;nce discreetly" contains a soft hyphen in the word "assistance", so a mail system that searches the email body for the plain phrase may fail to detect it.{{cn |date=June 2024}}

==See also==
*[[Hard hyphen]]
*[[Non-breaking space]]
*[[Word divider]]
*[[Word joiner]]
*[[Zero-width space]]
*[[Word wrap]]

==References==
{{reflist}}

{{Use dmy dates|date=November 2018}}
{{Unicode navigation}}

[[Category:Punctuation]]
[[Category:Typography]]
[[Category:Control characters]]
[[Category:Whitespace]]
[[Category:Unicode formatting code points]]