Welcome to the Skunkware Audio/Video Tools section. Here you will
find sound card drivers, MPEG players and encoders, MIDI players, audio
CD players and mixers, and other
tools used for developing and enjoying multimedia presentations.
- MPEG audio player with HTTP support
- MPEG audio player
- Festival Speech Synthesis System
- Speech Synthesis System
- Edinburgh Speech Tools Library
- Text to Phoneme conversion
- Snd sound editor
- Audio CD player
- Motif audio mixer
- Vtcl audio mixer
- WAV audio player/editor
- MPEG video player
mpg123 - MPEG audio player
mpg123 reads one or more files (or standard input if -
is specified) or URLs and plays them on the audio device
(default) or outputs them to stdout. File/URL is assumed to
be an MPEG-1/2 audio bit stream.
In addition to reading MPEG audio streams from ordinary
files and from the standard input, mpg123 supports retrieval
of MPEG audio files via the HTTP protocol, which is used in
the World Wide Web (WWW). Such files are specified using a
so-called URL (Uniform Resource Locator), which starts
with http://. When a file with that prefix is encountered,
mpg123 attempts to open an HTTP connection to the
server in order to retrieve that file to decode and play it.
It is often useful to retrieve files through a WWW cache or
so-called proxy. To accomplish this, mpg123 examines the
environment for variables named MP3_HTTP_PROXY, http_proxy
and HTTP_PROXY, in this order. The value of the first one
that is set will be used as proxy specification. To override
this, you can use the -p command line option (see the
OPTIONS section). Specifying -p none will enforce
contacting the server directly without using any proxy, even if
one of the above environment variables is set. Note that,
in order to play MPEG audio files from a WWW server, it is
necessary that the connection to that server is fast enough.
For example, a 128 kbit/s MPEG file requires the network
connection to be at least 128 kbit/s (16 kbyte/s) plus protocol
overhead. If you suffer from short network outages,
you should try the -b option (buffer) to bypass such
outages. If your network connection is generally not fast
enough to retrieve MPEG audio files in realtime, you can
first download the files to your local hard disk (e.g. using
lynx(1)) and then play them from there.
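The proxy lookup order described above can be sketched in a few lines of Python. This is only an illustrative model of the rules as stated here, not mpg123's actual code: the helper name choose_proxy and its arguments are hypothetical, and treating an empty variable as "set" is an assumption.

```python
# Illustrative sketch of the proxy lookup order described above.
# choose_proxy() is a hypothetical helper, not part of mpg123.

PROXY_VARIABLES = ("MP3_HTTP_PROXY", "http_proxy", "HTTP_PROXY")


def choose_proxy(environ, p_option=None):
    """Return the proxy specification to use, or None for a direct connection.

    environ  -- a mapping of environment variables (for example os.environ)
    p_option -- the value given to -p on the command line, if any
    """
    if p_option is not None:
        # -p overrides the environment; "-p none" forces a direct connection.
        return None if p_option == "none" else p_option
    for name in PROXY_VARIABLES:
        if name in environ:
            # The first variable that is set wins (assumption: an empty
            # value still counts as "set").
            return environ[name]
    return None


if __name__ == "__main__":
    env = {"http_proxy": "proxy-a:8080", "HTTP_PROXY": "proxy-b:3128"}
    print(choose_proxy(env))                  # value of http_proxy
    print(choose_proxy(env, "none"))          # direct connection
    print(choose_proxy(env, "proxy-c:8000"))  # value passed to -p
```

The host names in the usage lines are made-up placeholders.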
MPEG audio decoding requires a good deal of CPU performance,
especially layer-3. To decode it in realtime, you should
have at least a Pentium, Alpha, SuperSparc or equivalent
processor. You can also use the -singlemix option to decode
mono only, which reduces the CPU load somewhat for layer-3
streams. See also the -2 and -4 options. If everything
else fails, use the -s option to decode to standard output,
direct it into a file and then use an appropriate utility to
play that file. You might have to use a tool such as sox(1)
to convert the output to an audio format suitable for your
audio player. Also note that mpg123 always generates 16 bit
stereo data (if one of the -single* options is used, two
identical stereo channels are generated). If your hardware
requires some other format, for example 8 bit mono, you also
have to use a converter such as sox(1). If your system is
generally fast enough to decode in realtime, but there are
sometimes periods of heavy system load (such as cronjobs,
users logging in remotely, starting of big programs
etc.) causing the audio output to be interrupted, then you
should use the -b option to use a buffer of at least 1000
maplay - MPEG audio player
maplay version 1.2 is the second release of my MPEG audio player/decoder.
It decodes layer I and layer II MPEG audio streams and plays them
using a CD-quality audio device. Currently supported devices are the
dbri device of SPARC 10 computers and the audio ports of Silicon
Graphics Indigo machines. Thanks to Louis P. Kruger
(lpkruger@phoenix.Princeton.EDU), maplay 1.2 can also use the /dev/dsp device under Linux.
Louis has tested it with the Pro Audio Spectrum 16 soundcard. Sound Blaster 16
and Gravis Ultrasound cards should also work, but a bug in the dsp driver
prevents stereo playback on Gravis Ultrasound cards. An amd device of a
SPARC 2/IPX/... machine can be used, too, but this device is only capable of
producing audio output at 8 kHz in u-law format, which sounds as if it were transmitted
through a telephone. Other audio devices are not supported directly, but can be
used with the "decode to stdout" option and an audio format converter.
Besides, it should not be a problem to adapt the program to other audio devices.
The player supports all modes, which are single channel, stereo,
joint stereo and dual channel, and all bitrates except free mode.
The missing free mode support should not be a problem for now,
because I have not seen such a stream yet.
maplay needs approximately 46% CPU time on SPARC 10/40 machines and 50%
on Indigos for realtime stereo playback of a 44.1 kHz 128 kbit/s stream.
Single channel playback needs about half the CPU time. On a SPARCstation IPX,
maplay needs about 43% CPU time for realtime mono playback. Stereo playback
is not possible via an amd device.
Besides realtime playing of audio streams, maplay can decode streams to
stdout for further conversions. The output consists of 16 bit signed PCM
values. For stereo streams, the values are interleaved, which means that
a value for the left channel is followed by a value for the right channel
and so on. If maplay has been compiled for u-law output, the output consists
of 8 bit u-law samples at a rate of 8 kHz, no matter what frequency the stream
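The interleaved stereo layout of the 16 bit PCM output described above can be illustrated with a short Python sketch. It is not part of maplay; the function name deinterleave is hypothetical, and native byte order is assumed because the text does not state one.

```python
# Sketch: split interleaved 16-bit signed PCM into left and right channels.
# Assumes native byte order, which the description above does not specify.
from array import array


def deinterleave(pcm_bytes):
    """Return (left, right) lists of samples from interleaved stereo PCM."""
    samples = array("h")            # "h" = signed 16-bit integers
    samples.frombytes(pcm_bytes)
    left = samples[0::2].tolist()   # even positions: left channel
    right = samples[1::2].tolist()  # odd positions: right channel
    return left, right


if __name__ == "__main__":
    demo = array("h", [100, -100, 200, -200, 300, -300]).tobytes()
    print(deinterleave(demo))  # prints ([100, 200, 300], [-100, -200, -300])
```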
The Festival Speech Synthesis System
Festival offers a general framework for building speech synthesis
systems as well as including examples of various modules. As a whole
it offers full text to speech through a number of APIs: from shell level,
through a Scheme command interpreter, as a C++ library, and an Emacs
interface. Festival is multi-lingual (currently English, Welsh and
Spanish) though English is the most advanced.
The system is written in C++ and uses the Edinburgh Speech Tools
for low level architecture and has a Scheme (SIOD) based command
interpreter for control. Documentation is given in the FSF texinfo
format which can generate a printed manual, info files and HTML.
The MBROLA Speech Synthesis System
MBROLA v3.00 is a speech synthesizer based on the concatenation of
diphones. It takes a list of phonemes as input, together with prosodic
information (duration of phonemes and a piecewise linear description
of pitch), and produces speech samples on 16 bits (linear), at the
sampling frequency of the diphone database.
It is therefore NOT a Text-To-Speech (TTS) synthesizer, since it does
not accept raw text as input. In order to obtain a full TTS system,
you need to use this synthesizer in combination with a text processing
system that produces phonetic and prosodic commands. The Skunkware MBROLA
distribution is pre-configured for use in conjunction with the Festival
Speech Synthesis system as well as the txt2pho and emofilt utilities. These
tools provide support for TTS synthesis, Text-to-Phoneme conversion, and
manipulation of prosody of text-to-speech output.
There is currently only an SCO OpenServer 5 binary which works on both
OpenServer and UnixWare 7.
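To make the idea of a "piecewise linear description of pitch" concrete, here is a minimal Python sketch that evaluates such a contour at an arbitrary time. The list of (time_ms, pitch_hz) points is a hypothetical representation chosen for illustration; it is not MBROLA's actual input syntax, and pitch_at is not an MBROLA function.

```python
# Sketch: evaluate a piecewise linear pitch contour at a given time.
# The (time_ms, pitch_hz) point list is a hypothetical representation,
# not MBROLA's actual input syntax.

def pitch_at(points, t):
    """Linearly interpolate the pitch (Hz) at time t (ms).

    points -- list of (time_ms, pitch_hz) pairs sorted by time
    """
    if not points:
        raise ValueError("need at least one pitch point")
    if t <= points[0][0]:
        return float(points[0][1])    # hold the first value before the start
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t <= t1:
            # Position of t within this segment, from 0.0 to 1.0.
            frac = (t - t0) / (t1 - t0)
            return f0 + frac * (f1 - f0)
    return float(points[-1][1])       # hold the last value after the end


if __name__ == "__main__":
    contour = [(0, 100), (100, 120), (200, 90)]
    print(pitch_at(contour, 50))   # halfway up the first segment: 110.0
    print(pitch_at(contour, 150))  # halfway down the second segment: 105.0
```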
The Edinburgh Speech Tools Library
The Edinburgh Speech Tools Library is a collection of C++ classes, functions
and related programs for manipulating the sorts of objects used in speech
processing. It includes support for reading and writing waveforms, parameter
files (LPC, Cepstra, F0) in various formats and converting between them. It
also includes support for linguistic type objects and support for various
label files and ngrams (with smoothing).
In addition to the library, a number of programs are included: an intonation
library which includes a pitch tracker, smoother and labelling system (using
the Tilt Labelling system), and a classification and regression tree (CART)
building program called wagon. There is also growing support for various
speech recognition classes such as decoders and HMMs.
The Edinburgh Speech Tools Library is not an end in itself but is designed to
make the construction of other speech systems easy. It is used, for example, to
provide the underlying classes in the Festival Speech Synthesis System.
The speech tools are currently distributed in full source form free for
The following C++ programs are available:
- na_play: generic playback program for use with net_audio and CSTR ao.
- ch_wave: Waveform file conversion program.
- ch_lab: label file conversion program.
- ch_track: Track file conversion program.
- wagon: a CART tree build and test program
- And others
The following C++ sub-libraries are available:
- audio: C++ audio functions for Network Audio system, Suns,
OpenServer, UnixWare, Linux and FreeBSD
- speech_class: C++ speech classes, including waveform and track.
- ling_class: C++ linguistic classes.
- sigpr: Signal processing
- utils: Various utilities.
TTS front end for the MBROLA synthesizer
Txt2pho is a German TTS front end for the MBROLA synthesizer.
This program is derived from the speech synthesis system Hadifix.
Currently there are no UnixWare or OpenServer binaries available but
the freely available Linux binary works when used in conjunction with
the Linux Emulation System developed by SCO.
Open Sound System sound card drivers
Open Sound System for SCO OpenServer and SCO UnixWare
provides device drivers for popular soundcards under
SCO OpenServer 5, SCO UnixWare 2.x, UnixWare 7 and Free/SCO.
OSS/SCO comes with a configuration tool and complies with the
Open Sound System API.
Features include:
- Easy menu based installation and configuration program
- Support for Sound Blaster AWE32/AWE64 Emu8000 synth
- Supports PnP sound cards.
- Autodetection of Sound Blaster, ESS and GUS sound cards.
- Drivers for over 150 brand name soundcards and onboard audio devices.
- Support for a wide variety of audio applications
- Support for select()
- Support for "Virtual Mixer" - play 8 simultaneous audio apps!
- Support for OPL3-SAx, AD1816, CMI8330 and Sound Blaster AWE64 PCI
- Support for S3 Sonic Vibes/Turtle Beach Daytona
- Full Duplex support for Sound Blaster 16/AWE-32/AWE-64 and Vibra16
- Support for Ensoniq AudioPCI and AudioPCI97
OSS/SCO version 980728 now available (August 14, 1998) for OpenServer and
OSS/UnixWare version 3.9 BETA announced (April 17, 1998) for UnixWare.
SCO Skunkware was the initial SCO distribution mechanism for these drivers.
The OSS audio drivers are being incorporated into the standard product line
(beginning with UnixWare 7 and soon with OpenServer). As this transition takes
place, the Skunkware audio pages will attempt to direct you to the best
place to download the current driver for your platform(s). Currently, the
best place to get the OSS audio driver(s) is from 4Front Technologies as
they provide the latest bug-fixed release sooner than SCO is able to integrate
it into their product line. Unfortunately, the 4Front drivers are not free.
SCO will continue to provide free fully-functional SoundBlaster compatible
OSS drivers on-line and in the product. Whew.
The download page at 4Front Technologies is
http://www.4front-tech.com/download.cgi. The 4Front Technologies
OpenServer page is at
http://www.4front-tech.com/sco.html and the 4Front UnixWare page is
In addition, 4Front maintains a pretty good set of links to free audio
The links below will attempt to take you to the latest (free, fully functional)
SCO pre-licensed drivers. Hopefully these locations will stabilize over time.
Snd - sound editor
Snd is a freeware sound editor modelled loosely after Emacs and an old,
sorely-missed PDP-10 sound editor named Dpysnd. It is an X/Motif application
written by Bill Schottstaedt (email@example.com).
It can accommodate any number of sounds at once, each with any number of
channels. Each channel is normally displayed in its own window, with its own
cursor, edit history, and marks; each sound has a control panel to try out
various changes quickly, and an expression parser, used mainly during searches;
there is an overall stack of regions that can be browsed and
edited; channels and sounds can be grouped together during editing; edits can
be undone and redone without restriction (unlimited undo).
SoX is a sound file format converter for Unix and DOS PCs written by Lance
Norskog and other invaluable contributors. It also does sample rate conversion
and some sound effects. It is the Swiss Army knife of sound tools: the interface
is not great, but it does almost everything.
SoX uses file suffixes to determine the nature of a sound sample file.
If it finds the suffix in its list, it uses the appropriate read or write
handler to deal with that file. SoX has an auto-detect feature that attempts
to figure out the nature of an unmarked sound sample.
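The suffix lookup with an auto-detect fallback described above can be sketched as follows in Python. This is only a simplified illustration of the idea: the handler table, the handler names and both functions are hypothetical and are not taken from SoX, and matching suffixes case-insensitively is an assumption. The four-byte signatures are the well-known leading bytes of WAV, AU and AIFF files.

```python
# Sketch: pick a handler from the file suffix, falling back to a simple
# content check. Tables and names are hypothetical, not SoX's own.
import os

HANDLERS = {
    ".wav": "wav handler",
    ".au": "au handler",
    ".aiff": "aiff handler",
}

# Leading bytes of some common formats (simplified: other IFF-based
# formats also start with b"FORM").
MAGIC = {
    b"RIFF": "wav handler",
    b".snd": "au handler",
    b"FORM": "aiff handler",
}


def sniff(header):
    """Guess a handler from the first bytes of a file, or return None."""
    return MAGIC.get(bytes(header[:4]))


def pick_handler(filename, header=b""):
    """Use the suffix if it is known, otherwise try to auto-detect."""
    suffix = os.path.splitext(filename)[1].lower()
    handler = HANDLERS.get(suffix)
    if handler is None:
        handler = sniff(header)
    return handler


if __name__ == "__main__":
    print(pick_handler("song.wav"))
    print(pick_handler("unmarked", b".snd\x00\x00\x00\x18"))
```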
TiMidity - MIDI to WAVE converter and player
TiMidity is a MIDI to WAVE converter using Gravis
Ultrasound-compatible patch files to generate digital audio
data from General MIDI files. The data can be stored in a
file for processing, or played in real time through an audio
Motif CD Audio Player
Xmcd is a CD Player utility package including xmcd, a CD Player for the X window
system using the Motif graphical user interface and cda, a command-line driven,
text mode CD Player which also features a curses-based, screen-oriented mode.
Both utilities transform your CD-ROM or CD-R drive into a stereo CD player,
allowing you to play music CDs on your computer.
These CD player utilities are designed to be attractive, feature-rich yet intuitive to
use, and take advantage of many CD-ROM drive capabilities that are not accessible
via other players. Moreover, a CD database feature is supported, maintaining the disc
artist/title, track titles, and arbitrary text (such as band information and song lyrics).
Xmcd and cda have emerged as the most ported CD player package, supporting a
substantial list of UNIX operating system variants (as well as a non-UNIX OS) and
hardware platforms. Moreover, these utilities also support a vast spectrum of
CD-ROM and CD-R drives, including many older SCSI-1 units.
Motif Audio Mixer
Xmmix is an audio mixer utility for the X window system using the Motif
graphical user interface. It operates the input and output mixer section
on many PC sound cards.
Vtcl mixer front-end
Xmixer is a Visual TCL (vtcl) script written by John Gray (firstname.lastname@example.org)
which acts as a graphical front-end to the mixer program thus providing an
easy-to-use and simple way to control the mixing of your sound card.
/usr/local/bin/Xmixer [ linear ] [ gang ] [ notitle ] [ help ]
linear slider control, default is log
gang slider control, default is separate
notitle No L/R labels on sliders, default is labels
help This message, default is no message
audio editor, player, recorder
Xwave supports editing of large files (cut, copy, paste, merge) and
some effects (echo, reverse, swap channels, resample, volume).
It supports the RIFF, AIFF, AIFC and AU formats and runs on SCO, Linux, SGI, Sun and FreeBSD.
encodes MPEG-1 bitstreams
mpeg_encode produces an MPEG-1 video stream. param_file is
a parameter file which includes a list of input files and
play mpeg-1 encoded bitstreams
mpeg_play decodes and displays MPEG-1 encoded bitstreams on
systems running X11. The player will create a new window,
display the bitstream, and exit.
Caldera International, The Santa Cruz Operation, Inc. and
Caldera Skunkware are not related to, affiliated with or licensed by the famous
Lockheed Martin Skunk Works (R), the creator of the F-117 Stealth Fighter,
SR-71, U-2, Venturestar(tm), Darkstar(tm), and other pioneering air and
Last Updated: Friday Jul 06, 2001 at 15:00:20 PDT