VoIP
From Computing and Software Wiki
Revision as of 20:44, 12 April 2009
Voice over Internet Protocol (VoIP) is a general term for a family of technologies that convert analog audio signals (voice) to digital data that is then transmitted over IP networks such as the Internet or other packet-switched networks. VoIP technology is sometimes also called IP telephony, Internet telephony, voice over broadband (VoBB), broadband telephony and broadband phone.
The major benefit of VoIP technology is the reduction in communication and infrastructure costs. Routing phone calls over existing data networks eliminates the need to build separate data and voice networks, which reduces a company's infrastructure costs. VoIP features such as conference calling and caller ID, which usually cost extra with most telecommunication companies, are available for free with most VoIP services. The average consumer can eliminate long distance charges altogether by using a service such as Skype to communicate with relatives.
Types of VoIP Calls
Analog Telephone Adapter
The Analog Telephone Adapter (ATA) is a device used to connect a standard analog telephone to a computer network so that the user can make voice calls over the Internet. ATA devices usually come in two forms. In a household setting, the ATA usually has one or more RJ-11 jacks to connect to a telephone and a USB connector to connect to a computer, laptop or hand-held device. This type of ATA relies on computer software to digitize voice data and communicate with the VoIP server. In an enterprise setting, the ATA usually has multiple telephone jacks and an RJ-45 connector to connect to an Ethernet hub or switch. This type of ATA contains its own analog-to-digital converters and uses protocols such as H.323 and SIP to communicate with the VoIP server. Some Internet service providers give an ATA to their customers as part of their package. A flat rate for this is typically cheaper than traditional phone service plans.
VoIP Phone
VoIP Phones are specialized phones that connect directly to the IP network via Ethernet or Wi-Fi technology. They therefore allow the use of VoIP without a computer. There are two types of VoIP Phones: IP Phones and Wi-Fi phones. IP Phones have a different connector than regular phones. This connector (an RJ-45 connector) allows the phone to link directly to the router via an Ethernet cable. Wi-Fi phones allow VoIP calls to be made from any Wi-Fi hotspot. The Cisco IP Phone is a common example of an IP Phone that is used in many companies.
Softphone
A Softphone is VoIP software installed on a computer and used in conjunction with the computer's microphone, speaker, sound card and Internet connection. It eliminates the need for dedicated hardware. It is important to distinguish between softphones and the services that use softphone technology. Skype is a common example of such a service. It provides free local calls and member-to-member calls. This can reduce an average consumer's long distance charges considerably. It also provides the advantage of a fixed contact number regardless of the user's location. Other services include Google Talk and Vonage.
How VoIP Works
Stages of VoIP transmission
- First the analog voice must be converted to a digital signal via an analog-to-digital converter
- The digital signal must be compressed for transmission
- The compressed voice data is then inserted into data packets using the Real-time Transport Protocol (RTP)
- A signaling protocol is then required to call a user
- When the packet reaches the destination it must be disassembled
- The data is then extracted and decompressed
- The digital signal is then converted back to an analog voice signal via a digital-to-analog converter
Analog to Digital Conversion
Two factors play an important role in the successful digitization of an analog voice signal:
- Quantization: This refers to the number of bits used to represent each sample. Audio signals are best represented by 16 bit samples.
- Sampling rate: This refers to the number of samples per second used to encode the sound. 8 kHz is the most common sampling frequency used for VoIP communication.
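Together, these two factors fix the bit rate of the uncompressed digital signal: bits per sample multiplied by samples per second. The following is a minimal sketch of that arithmetic. The function name `pcm_bit_rate` is illustrative and not part of any VoIP library.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Return the raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# 16 bit samples at 8 kHz, single channel
print(pcm_bit_rate(8000, 16))  # 128000

# 8 bit samples at 8 kHz, the rate used by G.711
print(pcm_bit_rate(8000, 8))   # 64000
```

At 16 bits and 8 kHz the raw stream needs 128 kb/s, which is why the samples are normally compressed before transmission.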
Audio Codecs
Once the analog signal has been represented by a sequence of samples, these samples must be compressed to reduce the network bandwidth required to transmit them. The compression, and the subsequent decompression at the destination, is handled by codecs (COder-DECoder). The following codecs are the most commonly used with VoIP. Listed against each codec is its Mean Opinion Score (MOS). This is the perceived quality of the audio signal after it has been compressed, transmitted and decompressed. The score ranges from 1 (bad) to 5 (excellent).
Codecs | MOS | Bit Rate (kb/s) | Sampling Rate (kHz) | Advantages | Disadvantages |
G.711 | 4.2 | 64 | 8 | no licensing fees, simple implementation, high quality | consumes more bandwidth |
G.729 | 4.0 | 8 | 8 | low bandwidth requirement, good quality | licensing fees |
G.723.1 | 3.7-3.9 | 5.6/6.3 | 8 | dual rate allows calls over 28.8 and 33 kbit/s modem links | licensing fees apply |
GSM 06.10 | 3.7 | 13 | 8 | free, used for GSM cellular telephony | lower perceived quality |
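The bit rates in the table cover only the encoded audio. On the wire, each packet also carries protocol headers, so the bandwidth a call actually consumes is higher. The sketch below estimates it under stated assumptions that are not taken from this article: one packet every 20 ms, IPv4 (20 byte header), UDP (8 bytes) and a basic RTP header (12 bytes), with link-layer overhead ignored. The function and constant names are illustrative.

```python
# Assumed per-packet header overhead: IPv4 (20) + UDP (8) + RTP (12) bytes.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def ip_bandwidth_bps(codec_bit_rate_bps: int, packet_ms: int = 20) -> float:
    """Estimate one-way IP bandwidth for a voice stream, headers included."""
    # Bytes of encoded audio carried in each packet.
    payload_bytes = codec_bit_rate_bps * packet_ms / 1000 / 8
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second


for name, rate in [("G.711", 64000), ("G.729", 8000), ("GSM 06.10", 13000)]:
    print(name, ip_bandwidth_bps(rate))
```

Under these assumptions the header overhead is a fixed 16 kb/s per direction, so it weighs much more heavily on low bit rate codecs such as G.729 than on G.711.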
Real-time Transport Protocol (RTP)
After the data is converted into the correct format, it is encapsulated in RTP packets to be transmitted over the network. This transmission protocol needs to consider the following situations:
- The re-ordering of packets during the transmission
- The loss of packets
- The time between packet arrival (jitter)
For real-time transmission, the re-ordering and jitter issues are more important than the loss of packets. It does not matter if a few packets are lost as long as most of them arrive continuously and in order. The error checking and retransmission scheme of TCP is therefore poorly suited to this purpose, because waiting for a retransmitted packet delays everything behind it. RTP is normally carried over UDP instead, which adds no retransmission delay.
The fields of the RTP packet relevant to VoIP transmission include:
PT (Payload Type): This is a 7 bit field that allows values between 0 and 127. The interval between 96 and 127 is reserved for dynamic payload types. These are negotiated by the signaling protocol (SIP or H.323) used to establish the VoIP call.
Sequence Number: This is a 16 bit value that helps order the packets after transmission. The sequence number starts at a random value and is incremented by one with each RTP packet sent.
Timestamp: This is a 32 bit value that represents the clock tick count when the first audio sample in the payload was sampled. The timestamp also starts at a random value and advances with each RTP packet sent by the number of samples that packet carries.
Synchronization Source Identifier (SSRC): This is a 32 bit identifier that is chosen randomly and used to detect and resolve collisions. No two synchronization sources within the same RTP session will have the same SSRC.
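The layout of these fields can be illustrated by packing and unpacking the fixed 12 byte RTP header. The following is a minimal sketch, not a complete RTP implementation: it assumes no padding, no header extension and no contributing sources, and the function names are illustrative.

```python
import struct

RTP_VERSION = 2
# Network byte order: two single bytes, 16 bit sequence number,
# 32 bit timestamp, 32 bit SSRC. Total: 12 bytes.
_HEADER_FORMAT = "!BBHII"


def pack_rtp_header(payload_type: int, seq: int, timestamp: int,
                    ssrc: int, marker: bool = False) -> bytes:
    """Build a fixed RTP header (no padding, extension or CSRC entries)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    byte0 = RTP_VERSION << 6                    # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type   # M bit followed by 7 bit PT
    return struct.pack(_HEADER_FORMAT, byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Parse the first 12 bytes of an RTP packet into named fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(_HEADER_FORMAT, data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Payload type 0 is the static assignment for G.711 mu-law audio.
header = pack_rtp_header(payload_type=0, seq=4711, timestamp=160, ssrc=0x1234ABCD)
print(unpack_rtp_header(header))
```

The masks applied in `pack_rtp_header` make the sequence number and timestamp wrap around at 16 and 32 bits, as the field widths require.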
Signaling Protocols
H.323
H.323 is an umbrella protocol that was originally developed as an ITU recommendation for videoconferencing over packet-based networks. It was later adopted for VoIP. Its main objective is call control and management. For this it uses two signaling protocols, H.225 for call control and H.245 for call management. When a session is initiated between two H.323 devices, the H.225 protocol performs the setup and tear-down functions of the call using TCP. H.245 performs call management functions, including establishing device capabilities, negotiating codecs and selecting ports. Once this is completed, RTP is used to sequence and time media information transported over UDP. The primary benefit of H.323 is its address resolution capabilities. H.323 can be seen in many Cisco products and Microsoft's NetMeeting. Skype supports both H.323 and SIP.
SIP
Session Initiation Protocol (SIP) is a text-based signaling protocol used for the setup and tear-down of multimedia communication sessions such as voice and video calls over the Internet. It is a TCP/IP-based Application Layer protocol designed to be independent of the underlying transport layer. It is much lighter than H.323, which relies heavily on centralized gateways and servers. Being text based, it is easier to integrate with other Internet applications such as email and instant messaging. It also has a modular design that allows users of multimedia conferences in different locations to use whatever capabilities their device supports. SIP development had an advantage in that it was primarily governed by the IETF, which is usually quicker to adapt to industry demands than the ITU, which governed H.323. SIP is being used by Nokia and Ericsson. Cisco has integrated it into most of their products. Microsoft's Office Communications Server will be based on the SIP standard. SIP is now the accepted standard for VoIP communications because of its interoperability and integration capabilities.
VoIP Challenges
Quality of Service (QoS)
Because VoIP uses an Internet connection, it is susceptible to all the issues associated with home broadband services. Problems such as latency, jitter and packet loss affect call quality.
Latency
Latency is the amount of time taken for a packet to travel from its source to its destination. It is also known as the delay of the network. It becomes difficult to carry on a conversation that is interrupted by large delays. Users usually notice a round trip voice delay of 250 milliseconds or more. The ITU recommends a maximum one-way latency of 150 milliseconds. This is a very difficult problem to address. Some delays can be minimized by marking voice packets with ToS (Type of Service) values that indicate the requirement for low delay.
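A sending application can request such marking through the socket API. The sketch below is illustrative only and rests on assumptions not stated in this article: that the ToS byte is interpreted as a DSCP value in its upper six bits, that Expedited Forwarding (DSCP 46) is the class chosen for voice, and that the platform exposes `socket.IP_TOS` (Linux does; other systems may ignore or restrict it). Routers along the path are free to disregard or rewrite the marking. The helper names are hypothetical.

```python
import socket

# Assumption: Expedited Forwarding, the DSCP class commonly used for voice.
EF_DSCP = 46


def dscp_to_tos(dscp: int) -> int:
    """Convert a 6 bit DSCP value into the 8 bit ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    # DSCP occupies the upper six bits of the ToS byte.
    return dscp << 2


def open_voice_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Create a UDP socket whose outgoing packets request low-delay handling."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print(hex(dscp_to_tos(EF_DSCP)))  # 0xb8
```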
Jitter
When the IP packets are received, they may be out of order, delayed or missing. They have to be reordered while ensuring that the audio stream retains good time consistency. Jitter is the measure of the variability over time of the latency across a network. Jitter is usually a problem on slow or congested connections. One-way communication should have a jitter measure of less than 100 ms. To address this issue, received packets are stored in a jitter buffer before conversion to analog audio. When the buffer is full, the signal is converted. Packets that are still missing when the buffer is full are not taken into account. The buffer size offers a trade-off between packet loss and delay. If the length is increased, quality issues due to packet loss are reduced but delay is increased. If the length is decreased, quality issues due to packet loss are increased but delay is decreased.
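The behaviour of a jitter buffer can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular VoIP product implements it: the buffer depth is fixed, release is driven by packet count rather than by a playout clock, and sequence number wrap-around is ignored. The class and method names are hypothetical.

```python
import heapq


class JitterBuffer:
    """Fixed-depth reordering buffer keyed by RTP sequence number."""

    def __init__(self, depth: int = 3):
        self.depth = depth       # packets held back before playout starts
        self._heap = []          # min-heap of (sequence number, payload)
        self._next_seq = None    # next sequence number due for playout

    def push(self, seq: int, payload) -> None:
        """Store an arriving packet; drop it if its playout slot has passed."""
        if self._next_seq is not None and seq < self._next_seq:
            return  # arrived too late, discard
        heapq.heappush(self._heap, (seq, payload))

    def pop_ready(self) -> list:
        """Release packets in order while the buffer is full.

        A missing packet is reported as (seq, None) so that the decoder
        can conceal the gap.
        """
        released = []
        while len(self._heap) >= self.depth:
            seq, payload = heapq.heappop(self._heap)
            if self._next_seq is None:
                self._next_seq = seq
            while self._next_seq < seq:
                released.append((self._next_seq, None))
                self._next_seq += 1
            released.append((seq, payload))
            self._next_seq = seq + 1
        return released


buf = JitterBuffer(depth=2)
played = []
for seq in [1, 3, 2, 5, 6]:          # packet 4 never arrives
    buf.push(seq, f"frame{seq}")
    played += buf.pop_ready()
print(played)
```

A larger `depth` gives late packets more time to arrive before their slot is played, at the cost of added delay, which is the trade-off described above.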
Packet Loss
VoIP is a mechanism for real-time communication and is therefore based on UDP. Because of this, a lost packet is not sent again. Packets that do not arrive on time are also discarded. This is not usually a problem, since a few packets lost in isolation do not affect the audio stream. It does, however, become an issue when many consecutive packets are lost. For voice quality to be maintained, packet loss should not exceed 1%. A 1% rate of packet loss translates into one voice clip or skip every three minutes. This does depend on the codec used. When the codec compression is higher, the effect of packet loss is more pronounced. G.729 and G.711 are more susceptible to quality reduction due to packet loss. Packet loss concealment is a technique that uses zero insertion, waveform substitution and model-based methods to hide the effects of packet loss.
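The two simplest concealment methods can be sketched as follows. This is a minimal illustration: frames are plain lists of samples, a lost frame is represented by `None`, and waveform substitution is reduced to its most basic form, repeating the last good frame. Model-based methods are not shown. The function name `conceal` and its parameters are hypothetical.

```python
def conceal(frames, frame_len: int, method: str = "repeat"):
    """Fill lost frames (None) by zero insertion or by repeating the last good frame."""
    if method not in ("zero", "repeat"):
        raise ValueError("method must be 'zero' or 'repeat'")
    silence = [0] * frame_len
    last_good = silence          # nothing to repeat before the first good frame
    output = []
    for frame in frames:
        if frame is None:
            # Lost frame: insert silence or substitute the previous waveform.
            filled = list(silence) if method == "zero" else list(last_good)
        else:
            filled = list(frame)
            last_good = filled
        output.append(filled)
    return output


stream = [[3, 5, 4], None, [2, 1, 0]]
print(conceal(stream, 3, "zero"))    # [[3, 5, 4], [0, 0, 0], [2, 1, 0]]
print(conceal(stream, 3, "repeat"))  # [[3, 5, 4], [3, 5, 4], [2, 1, 0]]
```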
Emergency Calls
Since VoIP uses IP-addressed phone numbers, emergency services encounter the issue of associating IP addresses with a geographical location. As a result, emergency calls cannot automatically be routed to the nearest call center. VoIP Enhanced 911 is a method by which VoIP providers in the US support emergency services. It is based on a static table lookup. The table associates a physical address with the calling party's telephone number. However, it is the responsibility of the user to keep the information in the table up to date and accurate. This disadvantage of VoIP technology has been blamed in the tragic case of 18-month-old Elijah Luck, where an ambulance was sent to the family's former home.
Power Failure
A standard telephone (cordless phones not included) runs on power provided over the line from the central office. Even if the power goes out, the phone works. VoIP equipment connects to routers or modems that depend on the availability of mains electricity. Some VoIP service providers supply battery-backed power supplies to their customers to ensure uninterrupted service for several hours in case of power failure. This is not a major problem considering that most people these days have modern cordless phones with phone book, caller ID and voice mail features. These phones are susceptible to power failure as well.
Security
Eavesdropping
Eavesdropping occurs when unauthorized third parties monitor call signaling packets. By doing this they may learn confidential information such as user names, passwords and phone numbers. This gives the third party enough information to gain control over the caller's account. Confidential corporate information can also be maliciously obtained.
Service Theft
Identity or Service theft occurs when an unauthorized third party obtains a subscriber's user name and password and then uses them to steal VoIP services. The third party can mislead the subscriber's contacts into believing that the third party is the account owner. This can be accomplished via Spoofing or Man in the Middle Attacks.
Denial of Service
Denial of Service attacks (DoS attacks) occur when VoIP devices are overloaded with call requests and registrations. This causes resource exhaustion, disconnections and long-term busy signals.