Recently I was asked to add IPv6 support to an existing IPv4 configuration interface written in C. The existing interface ran on top of Linux, and it was used to bring up links, configure routes, attach alias addresses to interfaces, create VLAN interfaces and so on. Well, I rolled up my sleeves and began the work. After a few hours of browsing the internet, I noticed that there really was not much information about how to do all this. It seems to me that IPv6 is not yet very widely used. After a while, however, I was directed to netlink sockets.
The first instructions about how to use netlink sockets suggested using a wrapper library like libnl. However, after a bunch of unsuccessful attempts I gave up on libnl (my mind was really not compatible with the libnl documentation) and decided to use raw netlink sockets. Basic usage information for netlink sockets was easily available: there is an RFC discussing netlink sockets (RFC 3549), as well as man pages. However, these do not seem to cover all the pitfalls or usage details. On my journey to making this IPv6 interface work I fell into more than one trap, and eventually ended up adding prints to the kernel to see where the execution went and where my requests were discarded...
But let's start the actual business.
Linux networking will be discussed in terms of links (interfaces), addresses and routes. To me these words mean something like:
link (interface): for example eth0, a door to the outer world. Or eth0.2, a virtual (VLAN) interface, which the system nevertheless sees as a door to the outer world.
address: for example 192.168.1.77/32 or fe80::211:43ff:fe26:2b6c/64, a label telling what is behind the door, i.e. the address of one specific door.
route: for example 0.0.0.0 via dev eth0, or 10.34.143.0 255.255.255.0 gw 192.168.1.55 metric 2,
which tells the system to kick a packet out through a specific door if the packet's destination matches a given pattern.
I'll briefly explain routes here. But before that, note that netlink sockets are not only a way to configure and handle all of these (provided the underlying drivers support it, which I guess they nowadays often do); they are much more. Netlink is a fairly generic interface for exchanging information between the kernel and userspace. I know that at least firewall rules and the neighbor cache can be managed via netlink on Linux. However, here we concentrate on links, addresses and routes.
Now, let's look at the route examples above.
Route 1 was 0.0.0.0 via dev eth0:
destination 0.0.0.0 (a special 'any' IP, meaning this is a default route: all packets for which we have no better routing information will use this route).
dev eth0 specifies the interface to which these packets should be directed.
Route 2 was 10.34.143.0 255.255.255.0 gw 192.168.1.55 metric 2:
destination 10.34.143.0
mask 255.255.255.0
This tells us that if a packet is destined to an address 10.34.143.xx, this route is selected. The mask tells which bits of the destination address are meaningful: all bits that are set (non-zero) in the mask are meaningful. For example, the destination/mask pair
192.168.254.0
255.255.254.0
would mean that packets targeted to 192.168.254.xx or 192.168.255.xx would use this route. Another way to express this is the format <address>/<number of meaningful bits>:
destination 10.34.143.0
mask 255.255.255.0
could be shown as
10.34.143.0/24 (mask 255.255.255.0 has 24 meaningful bits, since each 0-255 value occupies 8 bits: 255.255.255 is 8+8+8 = 24)
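The conversion between a mask and a prefix length, and the "does this address fall behind this route" test, fit in a few lines of C. This is my own minimal sketch (IPv4 only; prefix_to_mask(), mask_to_prefix() and in_subnet() are hypothetical helper names, not part of any API), checked against the two example networks above.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: prefix length (0..32) -> IPv4 netmask in host byte order.
 * Shifting a 32-bit value by 32 is undefined in C, so /0 is handled separately. */
static uint32_t prefix_to_mask(unsigned prefixlen)
{
    if (prefixlen == 0)
        return 0;
    if (prefixlen > 32)
        prefixlen = 32;
    return 0xFFFFFFFFu << (32 - prefixlen);
}

/* Hypothetical helper: count the leading one bits of a contiguous netmask,
 * e.g. 255.255.255.0 -> 24. */
static unsigned mask_to_prefix(uint32_t mask)
{
    unsigned n = 0;
    while (mask & 0x80000000u) {
        n++;
        mask <<= 1;
    }
    return n;
}

/* Hypothetical helper: 1 if dotted-quad 'addr' is inside 'net'/'prefixlen',
 * 0 if not, -1 if either string is not a valid IPv4 address. */
static int in_subnet(const char *addr, const char *net, unsigned prefixlen)
{
    struct in_addr a, n;
    uint32_t mask;

    if (inet_pton(AF_INET, addr, &a) != 1 || inet_pton(AF_INET, net, &n) != 1)
        return -1;
    mask = prefix_to_mask(prefixlen);
    return (ntohl(a.s_addr) & mask) == (ntohl(n.s_addr) & mask);
}
```

For instance, in_subnet("192.168.255.1", "192.168.254.0", 23) is 1, matching the 192.168.254.0/255.255.254.0 example.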
gw means gateway. It says that the specified destinations are located behind the machine whose address is given,
i.e. send these packets to the gateway machine, which knows where to forward them. Hence all packets using this route will first be delivered to 192.168.1.55.
Metric 2... The metric is a way to choose between overlapping routes. If the destinations of two routes both match, then we use
1. the route with the more specific mask (e.g. a route with mask 255.255.255.255, a host route whose destination is one exact address, is preferred over routes with looser masks);
2. among equally specific routes, the route with the smaller metric value.
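The two rules above can be sketched as a lookup over a small table. This is a deliberately simplified model and my own assumption, not the kernel's real lookup: it ignores routing tables, scopes, TOS and everything IPv6-specific, and struct route, mask_of() and select_route() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4 route entry (host byte order). */
struct route {
    uint32_t dst;       /* destination network */
    unsigned prefixlen; /* number of meaningful bits, 0..32 */
    unsigned metric;    /* lower is preferred */
};

static uint32_t mask_of(unsigned prefixlen)
{
    return prefixlen ? 0xFFFFFFFFu << (32 - prefixlen) : 0;
}

/* Rule 1: longest matching prefix wins. Rule 2: on a tie, smallest metric.
 * Returns NULL when nothing, not even a default route, matches. */
static const struct route *select_route(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        uint32_t m = mask_of(tbl[i].prefixlen);

        if ((dst & m) != (tbl[i].dst & m))
            continue; /* destination not behind this route */
        if (best == NULL ||
            tbl[i].prefixlen > best->prefixlen ||
            (tbl[i].prefixlen == best->prefixlen && tbl[i].metric < best->metric))
            best = &tbl[i];
    }
    return best;
}
```

With a default route, two 10.34.143.0/24 routes (metrics 2 and 1) and a host route for 10.34.143.77, a packet to 10.34.143.77 takes the host route, a packet to 10.34.143.1 takes the metric 1 route, and anything else falls through to the default route.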
With IPv6, routing gets a bit more complicated, since some of the subnet information is built into the addresses. I won't get into that now; hopefully you know what you're doing :)
So let's see. The netlink socket interface is a typical socket interface: we send messages to the socket and receive replies from it. We can also register to receive reports about interesting events, and then get messages informing us of the changes. Here, however, I only show the typical "send request => receive response" sequence.
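The text never shows how the socket itself is opened, so here is a minimal sketch, assuming a plain NETLINK_ROUTE socket with no event subscriptions (open_rtnl_socket() is a hypothetical helper name). Binding with nl_pid set to 0 asks the kernel to pick a unique port ID, which getsockname() then reports back.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Hypothetical helper: open and bind a NETLINK_ROUTE socket.
 * Returns the socket, or -1 on failure; the kernel-assigned port ID
 * is stored in *portid when portid is not NULL. */
static int open_rtnl_socket(unsigned int *portid)
{
    struct sockaddr_nl local;
    socklen_t len = sizeof(local);
    int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (sock < 0)
        return -1;
    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK;
    local.nl_pid = 0;    /* 0: let the kernel choose our port ID */
    local.nl_groups = 0; /* no multicast event groups */
    if (bind(sock, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        getsockname(sock, (struct sockaddr *)&local, &len) < 0) {
        close(sock);
        return -1;
    }
    if (portid)
        *portid = local.nl_pid;
    return sock;
}
```

Subscribing to change reports would mean setting nl_groups to the RTMGRP_* bits of interest instead of 0.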
As mentioned, requests are divided into link, address and route requests (families). Each of these has request types for
creating a new <address/interface/route>
deleting an old <address/interface/route>
getting information about an existing <address/interface/route>
The exact defines (which also need to be filled in the request) are:
RTM_NEWADDR
RTM_DELADDR
RTM_GETADDR
RTM_NEWROUTE
RTM_DELROUTE
RTM_GETROUTE
RTM_NEWLINK
RTM_DELLINK
RTM_GETLINK
For links there is also
RTM_SETLINK
which needs to be used to change an existing link's attributes. For me, NEWLINK requests carrying such changes for an existing link were discarded (understanding this required adding prints to the kernel...)
An actual request consists of:
1. the standard message header, struct msghdr, which describes the buffers handed to sendmsg()/recvmsg(). I assume you're familiar with it.
2. the netlink message header, struct nlmsghdr
3. the request-family-specific header (struct ifaddrmsg / struct ifinfomsg / struct rtmsg)
4. a set of family-specific attributes
Netlink message header
struct nlmsghdr
{
__u32 nlmsg_len; /* Length of message including header */
__u16 nlmsg_type; /* Message content */
__u16 nlmsg_flags; /* Additional flags */
__u32 nlmsg_seq; /* Sequence number */
__u32 nlmsg_pid; /* Sending process port ID */
};
carries the information needed for
1. knowing the length of the message.
2. telling the type of the message (family)
3. flags telling how the message should be handled, straight from /usr/include/linux/netlink.h:
/* Flags values */
NLM_F_REQUEST /* It is request message. */
NLM_F_MULTI /* Multipart message, terminated by NLMSG_DONE */
NLM_F_ACK /* Reply with ack, with zero or error code */
NLM_F_ECHO /* Echo this request */
/* Modifiers to GET request */
NLM_F_ROOT /* specify tree root */
NLM_F_MATCH /* return all matching */
NLM_F_ATOMIC /* atomic GET */
NLM_F_DUMP
/* Modifiers to NEW request */
NLM_F_REPLACE /* Override existing */
NLM_F_EXCL /* Do not touch, if it exists */
NLM_F_CREATE /* Create, if it does not exist */
NLM_F_APPEND /* Add to end of list */
For example, every request should contain the flag
NLM_F_REQUEST. Note that NLM_F_MATCH is probably not implemented: you will get all routes/links/addresses, regardless of the attributes/values in the family-specific struct. The flags are described in more detail in the man pages.
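4 & 5. nlmsg_seq and nlmsg_pid identify the request/response pair. Typically you should set the pid to something derived from the process/thread id. I use
((unsigned long)pthread_self() << 16 | getpid());
Messages from the kernel are sent from port ID 0 (the nl_pid of the sender address that recvmsg() fills in).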
You should probably use a different sequence number for each request; that way you can associate the right responses with the right requests. I usually keep a global variable and atomically increment it whenever I need a new sequence ID.
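Filling in the fixed part of a request could look like this. It is a sketch, assuming the common trick of declaring the header, the family struct and spare attribute room in one struct so the pieces sit back to back; struct addr_req, init_addr_req() and seq_counter are hypothetical names, and the GCC/Clang builtin __sync_add_and_fetch() stands in for "some atomic increment". nlmsg_pid is left at 0 here; plug in your own scheme as described above.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request layout: netlink header, address header, attribute room. */
struct addr_req {
    struct nlmsghdr nh;
    struct ifaddrmsg ifa;
    char attrbuf[256];
};

static unsigned int seq_counter; /* hypothetical global sequence counter */

/* Fill the fixed part of an address request. nlmsg_len covers only the two
 * headers so far; adding attributes grows it later. */
static void init_addr_req(struct addr_req *req, int type, int flags,
                          unsigned char family, unsigned int ifindex)
{
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
    req->nh.nlmsg_type = type;                    /* e.g. RTM_NEWADDR */
    req->nh.nlmsg_flags = NLM_F_REQUEST | flags;  /* every request needs NLM_F_REQUEST */
    req->nh.nlmsg_seq = __sync_add_and_fetch(&seq_counter, 1);
    req->ifa.ifa_family = family;
    req->ifa.ifa_index = ifindex;
}
```

A dump of all IPv6 addresses would be init_addr_req(&r, RTM_GETADDR, NLM_F_DUMP, AF_INET6, 0); adding one would use RTM_NEWADDR with NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL.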
I also like to know whether my request succeeded, so I always use NLM_F_ACK in all requests other than GET requests. This makes the kernel send us a reply with a zero error code if the requested operation succeeded. With GET requests I can expect a reply anyway.
(The kernel's ACK message for a successful operation with the NLM_F_ACK flag looks like:
struct nlmsghdr followed by struct nlmsgerr
where nlmsg_type == NLMSG_ERROR and the error member of struct nlmsgerr is set to 0. On failure, error holds a negative errno value.
)
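Checking such a reply could look roughly like this. A sketch with check_ack() as a hypothetical helper; the return convention (1 for "not an ACK/error message", -ENOMSG for a foreign sequence number) is my own choice, while the error value itself (0 on success, negative errno otherwise) is what struct nlmsgerr carries.

```c
#include <errno.h>
#include <linux/netlink.h>

/* Hypothetical helper: interpret one reply message.
 * Returns 0 for a positive ACK, a negative errno for a kernel error,
 * -EBADMSG/-ENOMSG for a malformed or unrelated reply, and 1 if the
 * message is not an NLMSG_ERROR at all (e.g. data from a GET). */
static int check_ack(const struct nlmsghdr *nh, unsigned int expected_seq)
{
    const struct nlmsgerr *err;

    if (nh->nlmsg_type != NLMSG_ERROR)
        return 1;
    if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr)))
        return -EBADMSG; /* too short to hold struct nlmsgerr */
    if (nh->nlmsg_seq != expected_seq)
        return -ENOMSG;  /* answer to some other request */
    err = (const struct nlmsgerr *)NLMSG_DATA(nh);
    return err->error;   /* 0 == success, otherwise -errno */
}
```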
The family-specific structs follow the nlmsghdr. They look like this:
Addresses:
struct ifaddrmsg
struct ifaddrmsg
{
__u8 ifa_family;
__u8 ifa_prefixlen; /* The prefix length */
__u8 ifa_flags; /* Flags */
__u8 ifa_scope; /* Address scope */
__u32 ifa_index; /* Link index */
};
ifa_family is the address family (typically AF_INET / AF_INET6).
ifa_prefixlen tells the number of mask bits (see the explanation of masks in routes above). It is used, for example, when we add a new IP address to an existing interface, a so-called alias IP address: the mask then tells which network lies behind the door for outgoing packets!
flags
/* ifa_flags */
IFA_F_SECONDARY
IFA_F_TEMPORARY
IFA_F_NODAD
IFA_F_OPTIMISTIC
IFA_F_DADFAILED
IFA_F_HOMEADDRESS
IFA_F_DEPRECATED
IFA_F_TENTATIVE
IFA_F_PERMANENT
whose meanings are mostly unknown to me; I haven't really needed any of them.
ifa_scope
I've successfully used 0 (RT_SCOPE_UNIVERSE, i.e. a global address) as the scope for all requests; I haven't dug into what else the scope really means for addresses.
ifa_index is the index number of the interface this address is bound to. See the man page for
unsigned if_nametoindex(const char *ifname);
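A small wrapper for that lookup, assuming you just want -1 on failure (ifindex_or_error() is a hypothetical name). if_nametoindex() returns 0 when no interface has that name, so the check is on zero rather than on a negative value.

```c
#include <stdio.h>
#include <net/if.h>

/* Hypothetical helper: interface name -> index for ifa_index, ifi_index,
 * RTA_OIF or IFLA_LINK. Returns -1 if the interface does not exist. */
static int ifindex_or_error(const char *ifname)
{
    unsigned int idx = if_nametoindex(ifname);

    if (idx == 0) {
        perror("if_nametoindex");
        return -1;
    }
    return (int)idx;
}
```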
links(interfaces)
struct ifinfomsg
struct ifinfomsg
{
unsigned char ifi_family;
unsigned char __ifi_pad;
unsigned short ifi_type; /* ARPHRD_* */
int ifi_index; /* Link index */
unsigned ifi_flags; /* IFF_* flags */
unsigned ifi_change; /* IFF_* change mask */
};
When I created a new link (to make a VLAN interface, which was the only thing I used NEWLINK requests for), I at first banged my head against the wall by creating an RTM_NEWLINK request with the ifinfomsg struct fields filled in... I always received error responses from the kernel. When I finally compiled my own kernel with lots of info prints included, I learned that the ifinfomsg struct should hold very few values in a NEWLINK request, or the request will be discarded. So I only filled the ifi_change field with 0xffffffff and left the rest of the fields zero. Then I added attributes which specified the link to be a VLAN interface (I'll show the attributes for requests later). This allowed me to create a new link.
After the link was created, I used an RTM_SETLINK request to set the link state to IFF_UP. For this step I filled ifi_index with the index of the newly created interface and set the flags to contain the IFF_UP bit. ifi_change should still be 0xffffffff (at least according to the man pages, which say it is reserved for future use). I was lazy and simply set ifi_flags to IFF_UP. I guess this approach may be hazardous (not sure, though): the interface may have other flags set, and blindly setting the flags to IFF_UP without knowing the original state may lose some information. I have not studied these flags thoroughly. A better way could be to first read the flags and then OR (|) IFF_UP into the existing bits, but making that safe against state changes between reading and setting the flags seems hard. I just decided that there are no flags I could accidentally zero... It is easier not to know ;)
routes
struct rtmsg
struct rtmsg
{
unsigned char rtm_family;
unsigned char rtm_dst_len;
unsigned char rtm_src_len;
unsigned char rtm_tos;
unsigned char rtm_table; /* Routing table id */
unsigned char rtm_protocol; /* Routing protocol; see below */
unsigned char rtm_scope; /* See below */
unsigned char rtm_type; /* See below */
unsigned rtm_flags;
};
where types can be
RTN_UNSPEC,
RTN_UNICAST, /* Gateway or direct route */
RTN_LOCAL, /* Accept locally */
RTN_BROADCAST, /* Accept locally as broadcast,
send as broadcast */
RTN_ANYCAST, /* Accept locally as broadcast,
but send as unicast */
RTN_MULTICAST, /* Multicast route */
RTN_BLACKHOLE, /* Drop */
RTN_UNREACHABLE, /* Destination is unreachable */
RTN_PROHIBIT, /* Administratively prohibited */
RTN_THROW, /* Not in this table */
RTN_NAT, /* Translate this address */
RTN_XRESOLVE, /* Use external resolver */
For normal routes you probably want RTN_UNICAST. Although, knowing the amount of spam flowing through the ethernet wires... well, RTN_BLACKHOLE feels tempting ;)
defined protocols:
#define RTPROT_UNSPEC 0
#define RTPROT_REDIRECT 1 /* Route installed by ICMP redirects;
not used by current IPv4 */
#define RTPROT_KERNEL 2 /* Route installed by kernel */
#define RTPROT_BOOT 3 /* Route installed during boot */
#define RTPROT_STATIC 4 /* Route installed by administrator */
/* Values of protocol >= RTPROT_STATIC are not interpreted by kernel;
they are just passed from user and back as is.
It will be used by hypothetical multiple routing daemons.
Note that protocol values should be standardized in order to
avoid conflicts.
*/
#define RTPROT_GATED 8 /* Apparently, GateD */
#define RTPROT_RA 9 /* RDISC/ND router advertisements */
#define RTPROT_MRT 10 /* Merit MRT */
#define RTPROT_ZEBRA 11 /* Zebra */
#define RTPROT_BIRD 12 /* BIRD */
#define RTPROT_DNROUTED 13 /* DECnet routing daemon */
#define RTPROT_XORP 14 /* XORP */
#define RTPROT_NTK 15 /* Netsukuku */
#define RTPROT_DHCP 16 /* DHCP client */
I used RTPROT_STATIC, which the header comment above describes as a route installed by the administrator. (As far as I know, "ip route add" itself defaults to RTPROT_BOOT unless you give it "proto static".)
possible scopes:
RT_SCOPE_UNIVERSE=0,
/* User defined values */
RT_SCOPE_SITE=200,
RT_SCOPE_LINK=253,
RT_SCOPE_HOST=254,
RT_SCOPE_NOWHERE=255
Well, for a route that is not meant to stay inside some known system, RT_SCOPE_UNIVERSE is the appropriate choice.
/* rtm_flags */
#define RTM_F_NOTIFY 0x100 /* Notify user of route change */
#define RTM_F_CLONED 0x200 /* This route is cloned */
#define RTM_F_EQUALIZE 0x400 /* Multipath equalizer: NI */
#define RTM_F_PREFIX 0x800 /* Prefix addresses */
I used no flags, i.e. I set rtm_flags to zero.
So at this point, our request consists of a netlink header telling the type of the request (address, link or route) and the type-specific structure right after it. Now, so that it wouldn't be too simple, we'll introduce some more dynamic data :]
attributes
Each request supports a variable number of attributes which further describe the address/route/interface being created (changed/deleted). Each attribute's data is preceded by a struct rtattr header, which looks like:
struct rtattr
{
unsigned short rta_len;
unsigned short rta_type;
};
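After this structure comes the actual attribute data, whose content of course depends on rta_type. rta_len counts the header plus the data, so the data is
rta_len - sizeof(struct rtattr) bytes long, and the next attribute starts at the following 4-byte boundary :]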
Well, as a C coder can imagine, this level of dynamic structures can be quite challenging to handle. And, as can be guessed, there are macros written to ease attribute handling. (I also wrote simple functions to add attributes to a request.)
Macros to handle all this are:
RTA_ALIGN(len)
RTA_OK(rta,len)
RTA_NEXT(rta,attrlen)
RTA_LENGTH(len)
RTA_DATA(rta)
RTA_PAYLOAD(rta)
RTA_ALIGN(len) rounds the given length up to the next alignment boundary. With the typical 4-byte alignment,
RTA_ALIGN(1) returns 4, as does RTA_ALIGN(4); RTA_ALIGN(5) returns 8, and so on.
RTA_OK(rta,len) checks whether the attribute is OK, i.e. that it fits within the remaining len bytes. It is common to combine RTA_NEXT with RTA_OK to parse incoming messages: RTA_NEXT gives the next attribute, and RTA_OK checks that it is safe to inspect. The len passed to these macros is initially the length of the attribute buffer; each call to RTA_NEXT decrements it by the space the current attribute used.
RTA_LENGTH(len) returns the length needed to store an attribute with len bytes of data: the size of the rtattr header (aligned, so the data starts correctly aligned) plus len. This is the value for rta_len. It does not include trailing padding; RTA_SPACE(len) gives the padded size the attribute occupies in the buffer.
RTA_DATA(rta) returns a pointer to the beginning of the data in attribute rta.
RTA_PAYLOAD(rta) returns the length of the data payload in the attribute.
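To make the numbers concrete, here is a tiny sketch printing what the macros evaluate to, assuming the usual Linux values (sizeof(struct rtattr) == 4 and RTA_ALIGNTO == 4). show_rta_sizes() is just an illustrative name.

```c
#include <stdio.h>
#include <linux/rtnetlink.h>

/* RTA_LENGTH(n): header + n data bytes, no trailing padding (goes into rta_len).
 * RTA_SPACE(n):  RTA_ALIGN(RTA_LENGTH(n)), the room the attribute takes in
 *                the buffer, i.e. how far away the next attribute starts. */
static void show_rta_sizes(void)
{
    unsigned int len;

    for (len = 0; len <= 6; len++)
        printf("payload %u: RTA_ALIGN=%u RTA_LENGTH=%u RTA_SPACE=%u\n",
               len, (unsigned)RTA_ALIGN(len), (unsigned)RTA_LENGTH(len),
               (unsigned)RTA_SPACE(len));
}
```

For example, a 5-byte payload gives rta_len = RTA_LENGTH(5) = 9, while RTA_SPACE(5) = 12 bytes are consumed before the next attribute begins.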
Sounds confusing? Well, don't worry. I'll show you a few functions to handle the attributes later...
Now we get to the really confusing point...
With VLAN interface creation I hit my head against the wall. I did some googling. Then I googled more. Finally I had googled my *** off. "VLAN interface netlink sockets", "RTM_NEWLINK VLAN", "Create VLAN via netlink", "VLAN netlink attribute". It took all my googling skills, as well as a dive into the kernel sources, to finally find it: the magical attribute that allows one to specify a VLAN interface. Google gave me hints that such an attribute exists, and finally I found it...
Nested attributes. Nested attributes are attributes which contain other attributes. That's it. One of the attributes needed to create a VLAN interface is one of these, but I'll show that later.
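On the receiving side, a nested attribute is simply an attribute whose payload is another attribute stream, so the same RTA_OK()/RTA_NEXT() loop can walk it, starting at RTA_DATA() and limited to RTA_PAYLOAD(). A sketch, assuming a link message's attribute stream is already at hand; find_attr() and link_kind() are hypothetical helpers that dig the IFLA_INFO_KIND string (e.g. "vlan") out of IFLA_LINKINFO.

```c
#include <string.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Hypothetical helper: first attribute of 'type' in a stream of 'len' bytes. */
static struct rtattr *find_attr(struct rtattr *rta, int len, unsigned short type)
{
    for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
        if (rta->rta_type == type)
            return rta;
    return NULL;
}

/* Hypothetical helper: copy IFLA_LINKINFO -> IFLA_INFO_KIND into buf.
 * Returns buf, or NULL if the link has no kind information. */
static const char *link_kind(struct rtattr *attrs, int len, char *buf, size_t bufsz)
{
    struct rtattr *linkinfo = find_attr(attrs, len, IFLA_LINKINFO);
    struct rtattr *kind;
    size_t n;

    if (!linkinfo || bufsz == 0)
        return NULL;
    /* the nested payload is itself an attribute stream */
    kind = find_attr(RTA_DATA(linkinfo), RTA_PAYLOAD(linkinfo), IFLA_INFO_KIND);
    if (!kind)
        return NULL;
    /* the string may or may not be NUL-terminated, so copy with a bound */
    n = RTA_PAYLOAD(kind) < bufsz - 1 ? RTA_PAYLOAD(kind) : bufsz - 1;
    memcpy(buf, RTA_DATA(kind), n);
    buf[n] = '\0';
    return buf;
}
```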
Attributes supported by different families are:
Addresses:
IFA_UNSPEC,
IFA_ADDRESS,
IFA_LOCAL,
IFA_LABEL,
IFA_BROADCAST,
IFA_ANYCAST,
IFA_CACHEINFO,
IFA_MULTICAST,
Routes:
RTA_UNSPEC,
RTA_DST,
RTA_SRC,
RTA_IIF,
RTA_OIF,
RTA_GATEWAY,
RTA_PRIORITY,
RTA_PREFSRC,
RTA_METRICS,
RTA_MULTIPATH,
RTA_PROTOINFO, /* no longer used */
RTA_FLOW,
RTA_CACHEINFO,
RTA_SESSION, /* no longer used */
RTA_MP_ALGO, /* no longer used */
RTA_TABLE,
Interfaces (links):
IFLA_UNSPEC,
IFLA_ADDRESS,
IFLA_BROADCAST,
IFLA_IFNAME,
IFLA_MTU,
IFLA_LINK,
IFLA_QDISC,
IFLA_STATS,
IFLA_COST,
IFLA_PRIORITY,
IFLA_MASTER,
IFLA_WIRELESS, /* Wireless Extension event - see wireless.h */
IFLA_PROTINFO, /* Protocol specific information for a link */
IFLA_TXQLEN,
IFLA_MAP,
IFLA_WEIGHT,
IFLA_OPERSTATE,
IFLA_LINKMODE,
IFLA_LINKINFO,
IFLA_NET_NS_PID,
Some of these attributes are documented in the rtnetlink man page (section 7); some aren't...
Now, just to ease the pain for those who struggle with VLAN interface setup... I managed to do it with the following attributes:
IFLA_LINK: data size of int, contains the index of the real interface this VLAN interface uses beneath.
IFLA_IFNAME: the name of the new VLAN interface; the length depends on the name you give. I used the naming convention <original_interface>.<vlan Id>
nested attribute IFLA_LINKINFO, containing:
attribute IFLA_INFO_KIND, whose data is the string 'vlan' (the length of the string, padded to the 4-byte boundary in the buffer).
another nested attribute IFLA_INFO_DATA, containing:
attribute IFLA_VLAN_ID, whose data is the VLAN id (a 16-bit value).
Uh oh... Sounds confusing, right? A nested attribute containing an attribute and another nested attribute which contains an attribute... Oh joy. Occasionally I tend to believe that programmers have been REALLY drunk when getting their ideas... (http://xkcd.com/323/)
Anyway, here's the code I used to handle this:
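#include <stdio.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* First free byte after the message built so far (nlmsg_len, aligned). */
#define NLMSG_BOTTOM(nlmsg) ((struct rtattr *)(((char *)(nlmsg)) + NLMSG_ALIGN((nlmsg)->nlmsg_len)))

/* Append one attribute to the end of nl_req. The caller must make sure the
 * buffer behind nl_req has room for RTA_SPACE(datalen) more bytes. */
static int addAttr(struct nlmsghdr *nl_req, int attrlabel, const void *data, int datalen)
{
    struct rtattr *attr;
    unsigned int attrlen=RTA_LENGTH(datalen); /* sizeof(struct rtattr) + datalen */
    if(NULL==nl_req || (datalen > 0 && NULL==data))
    {
        printf("NULL arg detected!\n");
        return -1;
    }
    attr=NLMSG_BOTTOM(nl_req); /* only dereference nl_req after the NULL check */
    attr->rta_type=attrlabel;
    attr->rta_len=attrlen;
    if(datalen > 0)
        memcpy(RTA_DATA(attr),data,datalen);
    nl_req->nlmsg_len=NLMSG_ALIGN(nl_req->nlmsg_len)+RTA_ALIGN(attrlen);
    return 0;
}

/* Start a nested attribute: an empty attribute header whose rta_len is
 * fixed up by endNestedAttr() after the inner attributes are added. */
static struct rtattr * addNestedAttr(struct nlmsghdr *nl_req, int attrlabel)
{
    struct rtattr *nested = NLMSG_BOTTOM(nl_req);
    if(!addAttr(nl_req, attrlabel, NULL, 0))
        return nested;
    return NULL;
}

/* Close a nested attribute: its length now spans everything added after it. */
static void endNestedAttr(struct nlmsghdr *nl_req, struct rtattr *nested)
{
    nested->rta_len = (char *)NLMSG_BOTTOM(nl_req) - (char *)nested;
}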
/* ...snip - Add attributes to the nlmsg msg */
struct rtattr *attr1, *attr2;
if(addAttr(msg,IFLA_LINK,&orig_ifindex,sizeof(int)))
{
printf("IFLA_LINK %d adding as rtattr to req failed!",orig_ifindex);
retval=-1;
}
else if(addAttr(msg,IFLA_IFNAME,ifname,strlen(ifname)))
{
printf("IFLA_IFNAME %s adding as rtattr to req failed!",ifname);
retval=-1;
}
else if(NULL==(attr1=addNestedAttr(msg,IFLA_LINKINFO)))
{
printf("addNestedAttr IFLA_LINKINFO FAILED!");
retval=-1;
}
else if(addAttr(msg,IFLA_INFO_KIND,"vlan", strlen("vlan")))
{
printf("IFLA_INFO_KIND \"vlan\" adding FAILED!");
retval=-1;
}
else if(NULL==(attr2=addNestedAttr(msg,IFLA_INFO_DATA)))
{
printf("addNestedAttr IFLA_INFO_DATA FAILED!");
retval=-1;
}
else if(addAttr(msg,IFLA_VLAN_ID,&vlanid,sizeof(unsigned short)))
{
printf("IFLA_VLAN_ID %hu adding as rtattr to req failed!",vlanid);
retval=-1;
}
else
{
endNestedAttr(msg,attr2);
endNestedAttr(msg,attr1);
printf("VLAN ID %hu, orig ifindex %d and new ifname %s added as attrs",vlanid,orig_ifindex,ifname);
}
This code assumes that the length of the nlmsg (in struct nlmsghdr) is kept up to date during message creation, i.e. whenever attributes are added, nlmsg_len holds the length of the message constructed so far. Attribute addition relies on this length when placing new attributes, and updates it as attributes are added (see the NLMSG_BOTTOM() macro).
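The same append-at-nlmsg_len technique also builds route requests. Here is a sketch for the route 2 example from the beginning (10.34.143.0/24 via 192.168.1.55, metric 2), assuming an IPv4 unicast route in the main table; struct route_req, add_attr() and build_route_req() are hypothetical names, and add_attr() differs from addAttr() above in that it checks the buffer size before writing. The metric travels in RTA_PRIORITY.

```c
#include <string.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical request container: headers plus room for attributes. */
struct route_req {
    struct nlmsghdr nh;
    struct rtmsg rt;
    char attrs[128];
};

/* Hypothetical bounds-checked variant of addAttr(). */
static int add_attr(struct nlmsghdr *nh, size_t maxlen, unsigned short type,
                    const void *data, size_t datalen)
{
    size_t off = NLMSG_ALIGN(nh->nlmsg_len);
    struct rtattr *rta;

    if (off + RTA_SPACE(datalen) > maxlen)
        return -1; /* would overflow the request buffer */
    rta = (struct rtattr *)((char *)nh + off);
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(datalen);
    if (datalen)
        memcpy(RTA_DATA(rta), data, datalen);
    nh->nlmsg_len = off + RTA_SPACE(datalen);
    return 0;
}

/* Build an RTM_NEWROUTE request: dst/dstlen via gw out of interface oif. */
static int build_route_req(struct route_req *req, const char *dst, unsigned char dstlen,
                           const char *gw, unsigned int oif, unsigned int metric)
{
    struct in_addr d, g;

    if (inet_pton(AF_INET, dst, &d) != 1 || inet_pton(AF_INET, gw, &g) != 1)
        return -1;
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    req->nh.nlmsg_type = RTM_NEWROUTE;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL;
    req->rt.rtm_family = AF_INET;
    req->rt.rtm_dst_len = dstlen;
    req->rt.rtm_table = RT_TABLE_MAIN;
    req->rt.rtm_protocol = RTPROT_STATIC;
    req->rt.rtm_scope = RT_SCOPE_UNIVERSE;
    req->rt.rtm_type = RTN_UNICAST;
    if (add_attr(&req->nh, sizeof(*req), RTA_DST, &d, sizeof(d)) ||
        add_attr(&req->nh, sizeof(*req), RTA_GATEWAY, &g, sizeof(g)) ||
        add_attr(&req->nh, sizeof(*req), RTA_OIF, &oif, sizeof(oif)) ||
        add_attr(&req->nh, sizeof(*req), RTA_PRIORITY, &metric, sizeof(metric)))
        return -1;
    return 0;
}
```

With the 12-byte rtmsg and four 4-byte payloads, the finished request is 16 + 12 + 4 * 8 = 60 bytes long.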
Now I guess I am approaching the end of this short introduction. I'll still show you something that gives you the idea of how messages are sent and received, and how attributes can be parsed. To sum up the full horror of this interface (dynamic => flexible and generic => terribly hard to use), I have to mention something about receiving messages...
Messages arrive from the socket and are placed in the buffer you gave. You need to be prepared to handle:
1. a reply for which you specified too short a buffer => you'll get the reply with the MSG_TRUNC bit set in msg_flags.
2. a reply with multiple nlmsgs in one received buffer. In that case the NLM_F_MULTI flag is set, and the sequence is terminated by a message of type NLMSG_DONE (possibly several recvmsg() calls later).
In other words, you may end up with a buffer holding a variable number of nlmsgs, each containing a differently sized/typed struct followed by a variable number of possibly nested attributes... Some (pseudo)code as an example...
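#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/* Receive one datagram from netlink socket 'sock' into netlinkresp.
 * Returns the number of bytes received, or -1 on error or truncation. */
static ssize_t recvNetlink(int sock, struct nlmsghdr *netlinkresp, size_t bufsize)
{
    struct sockaddr_nl kernproc;
    struct msghdr msg;
    struct iovec iov;
    ssize_t retval;

    memset(&msg,0,sizeof(msg));
    memset(&kernproc,0,sizeof(kernproc));
    memset(&iov,0,sizeof(iov));
    kernproc.nl_family = AF_NETLINK;
    msg.msg_name=(void *)&kernproc;
    msg.msg_namelen=sizeof(kernproc);
    iov.iov_base=(void *)netlinkresp; /* buffer allocated for the response */
    iov.iov_len=bufsize;              /* size of the response buffer */
    msg.msg_iov=&iov;
    msg.msg_iovlen = 1; /* only one iov struct above */
retry:
    retval=recvmsg(sock, &msg, 0);
    if(0 >= retval)
    {
        if(retval < 0 && errno==EINTR)
            goto retry;
        printf("Error when receiving from netlink sock!\n");
        return -1; /* handle error */
    }
    if(msg.msg_flags & MSG_TRUNC)
    {
        printf("Netlink reply truncated, buffer too small!\n");
        return -1;
    }
    /* ...reply received: retval bytes of nlmsgs are in netlinkresp */
    return retval;
}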
So:
1. Check the length of the received data from the return value of recvmsg(). Never exceed it.
2. Check the msghdr (not the nlmsghdr) to see that the message was not truncated:
if(msg.msg_flags&MSG_TRUNC)
... allocate more space for resp. Note that the rest of the truncated datagram is already discarded, so size the buffer generously (or peek at the size first with MSG_PEEK | MSG_TRUNC).
3. Store a pointer to the nlmsghdr message header.
4. Start a loop and use NLMSG_OK() to check that the message is OK. If it is not, there is nothing (more) to handle.
5. Check the type and the length of the nlmsg. An NLMSG_ERROR message carries a struct nlmsgerr (an ACK if its error is 0). Otherwise, if the length is at least NLMSG_LENGTH() of the type-specific header, use NLMSG_DATA() to get the actual message; cast and store a pointer to it.
6. Check the received message for the information you longed for. If the whole nlmsg length is still not handled, there probably are attributes.
7. Obtain a pointer to the first attribute by adding the aligned size of the message-specific struct (NLMSG_ALIGN(sizeof(...))) to NLMSG_DATA().
8. Start a loop and check the attribute with RTA_OK(). If RTA_OK fails, go to step 10.
9. Check the attribute type, and handle the data according to type & length. Obtain the next attribute with RTA_NEXT and go back to step 8.
10. When the last attribute is handled, check whether the nlmsg had NLM_F_MULTI set and was not of type NLMSG_DONE. If so, get the next nlmsg with NLMSG_NEXT() and loop again from step 4.
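The steps above can be sketched as one loop. This is an illustration, assuming the buffer holds the reply to an RTM_GETLINK dump; walk_reply() is a hypothetical name, and it only prints IFLA_IFNAME to keep things short. A large dump may arrive over several recvmsg() calls, so in real code this loop sits inside a receive loop that runs until NLMSG_DONE shows up.

```c
#include <stdio.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: walk every nlmsg in a received buffer of 'len' bytes.
 * Returns the number of RTM_NEWLINK messages seen, or -1 if an
 * NLMSG_ERROR with a non-zero error code (or a malformed one) was found. */
static int walk_reply(void *buf, int len)
{
    struct nlmsghdr *nh;
    int links = 0;

    for (nh = buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
        if (nh->nlmsg_type == NLMSG_DONE)
            break;                            /* end of a multipart reply */
        if (nh->nlmsg_type == NLMSG_ERROR) {
            struct nlmsgerr *e = NLMSG_DATA(nh);
            if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(*e)) || e->error)
                return -1;
            continue;                         /* plain ACK */
        }
        if (nh->nlmsg_type != RTM_NEWLINK ||
            nh->nlmsg_len < NLMSG_LENGTH(sizeof(struct ifinfomsg)))
            continue;                         /* not what we are looking for */

        struct ifinfomsg *ifi = NLMSG_DATA(nh);
        /* attributes start right after the aligned family struct */
        struct rtattr *rta = (struct rtattr *)((char *)ifi + NLMSG_ALIGN(sizeof(*ifi)));
        int alen = (int)(nh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifi)));

        for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen))
            if (rta->rta_type == IFLA_IFNAME)
                printf("link %d: %s\n", ifi->ifi_index, (char *)RTA_DATA(rta));
        links++;
    }
    return links;
}
```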
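Sending a message uses the same generic iovec mechanism, for example:
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

static unsigned int seqid; /* global sequence counter */

/* Send a filled-in netlink request to the kernel. Returns sendmsg()'s result. */
static ssize_t sendNetlink(int sock, struct nlmsghdr *netlinkreq)
{
    struct sockaddr_nl kernproc;
    struct msghdr msg;
    struct iovec iov;
    ssize_t retval;

    memset(&msg,0,sizeof(msg));
    memset(&kernproc,0,sizeof(kernproc));
    memset(&iov,0,sizeof(iov));
    kernproc.nl_family = AF_NETLINK; /* nl_pid 0 addresses the kernel */
    msg.msg_name=(void *)&kernproc;
    msg.msg_namelen=sizeof(kernproc);
    netlinkreq->nlmsg_pid=(__u32)(((unsigned long)pthread_self() << 16) | getpid());
    /* any atomic increment will do; GCC's builtin stands in for my own helper */
    netlinkreq->nlmsg_seq=__sync_add_and_fetch(&seqid, 1);
#ifdef debug
    debugprint_msg(netlinkreq);
#endif
    iov.iov_base=(void *)netlinkreq;
    iov.iov_len=netlinkreq->nlmsg_len;
    msg.msg_iov=&iov;
    msg.msg_iovlen = 1; /* only one iov struct above */
    retval=sendmsg(sock,&msg,0);
    if(retval <= 0)
    {
        printf("sendmsg() FAILED!\n");
    }
    return retval;
}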
The debugprint function I have used contains the following code:
#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Print an address attribute: 16-byte payloads are IPv6, others IPv4. */
static void printAddrAttr(const char *name, struct rtattr *at)
{
    char tmp[INET6_ADDRSTRLEN];
    int family = (RTA_PAYLOAD(at) == 16) ? AF_INET6 : AF_INET;

    printf("%s is set to %s\n", name,
           inet_ntop(family, RTA_DATA(at), tmp, sizeof(tmp)) ? tmp : "(invalid)");
}

static void debugprint_msg(struct nlmsghdr *netlinkreq)
{
    if(netlinkreq->nlmsg_flags & NLM_F_ACK)
    {
        printf("Msg contains f_ack!\n");
    }
    if(!NLMSG_OK(netlinkreq,netlinkreq->nlmsg_len))
    {
        printf("Looks like we're sending invalid nlmsg!! NLMSG_OK() == false at send!\n");
        return;
    }
    printf("sending NLMSG: len %u, type %hu, flags %hu, seq %u pid %u\n",
           netlinkreq->nlmsg_len, netlinkreq->nlmsg_type, netlinkreq->nlmsg_flags,
           netlinkreq->nlmsg_seq, netlinkreq->nlmsg_pid);
    switch(netlinkreq->nlmsg_type)
    {
    case RTM_NEWROUTE:
    case RTM_DELROUTE:
    case RTM_GETROUTE:
    {
        struct rtmsg *rt = NLMSG_DATA(netlinkreq);
        struct rtattr *at;
        int len;

        if(netlinkreq->nlmsg_len < NLMSG_LENGTH(sizeof(*rt)))
            break;
        at = RTM_RTA(rt);              /* first attribute, after the aligned rtmsg */
        len = RTM_PAYLOAD(netlinkreq); /* attribute bytes only, not the whole nlmsg */
        printf("Req is route req (new %u, del %u, get %u)\n",
               RTM_NEWROUTE, RTM_DELROUTE, RTM_GETROUTE);
        printf("family %u, dstlen %u, srclen %u, tos %u, table %u, proto %u, scope %u, type %u, flags %u\n",
               rt->rtm_family, rt->rtm_dst_len, rt->rtm_src_len, rt->rtm_tos, rt->rtm_table,
               rt->rtm_protocol, rt->rtm_scope, rt->rtm_type, rt->rtm_flags);
        for(; RTA_OK(at,len); at = RTA_NEXT(at,len))
        {
            switch(at->rta_type)
            {
            case RTA_DST:      printAddrAttr("dst", at); break;
            case RTA_SRC:      printAddrAttr("src", at); break;
            case RTA_GATEWAY:  printAddrAttr("gw", at); break;
            case RTA_OIF:      printf("OIF is set to %u\n", *(unsigned int *)RTA_DATA(at)); break;
            case RTA_PRIORITY: printf("Priority is set to %u\n", *(unsigned int *)RTA_DATA(at)); break;
            default:           printf("rta_type %u, len %u\n", at->rta_type, at->rta_len); break;
            }
        }
        break;
    }
    case RTM_NEWADDR:
    case RTM_GETADDR:
    case RTM_DELADDR:
    {
        struct ifaddrmsg *ifa = NLMSG_DATA(netlinkreq);
        struct rtattr *at;
        int len;

        if(netlinkreq->nlmsg_len < NLMSG_LENGTH(sizeof(*ifa)))
            break;
        at = IFA_RTA(ifa);             /* first attribute, after the aligned ifaddrmsg */
        len = IFA_PAYLOAD(netlinkreq); /* attribute bytes only */
        printf("Req is ADDR req (new %u, del %u, get %u)\n",
               RTM_NEWADDR, RTM_DELADDR, RTM_GETADDR);
        printf("ifa_family %u, ifa_prefixlen %u, ifa_flags %u, ifa_scope %u, ifa_index %d\n",
               ifa->ifa_family, ifa->ifa_prefixlen, ifa->ifa_flags, ifa->ifa_scope,
               (int)ifa->ifa_index);
        for(; RTA_OK(at,len); at = RTA_NEXT(at,len))
        {
            switch(at->rta_type)
            {
            case IFA_ADDRESS:   printAddrAttr("IFA_ADDRESS", at); break;
            case IFA_LOCAL:     printAddrAttr("IFA_LOCAL", at); break;
            case IFA_BROADCAST: printAddrAttr("IFA_BROADCAST", at); break;
            case IFA_ANYCAST:   printAddrAttr("IFA_ANYCAST", at); break;
            case IFA_LABEL:     printf("IFA_LABEL is set to '%s'\n", (char *)RTA_DATA(at)); break;
            default:            printf("rta_type %u, len %u\n", at->rta_type, at->rta_len); break;
            }
        }
        break;
    }
    /* You could add other types of requests here too */
    default:
        break;
    }
}
There you can see how to browse through the contents of one message. One more loop with some extra checks, using the NLMSG_NEXT and NLMSG_OK macros, would let you go through all the received nlmsgs.
Anyway, I guess I can end my brief introduction to netlink sockets here. Remember to check
man 7 rtnetlink and man 3 netlink for more information. This post just explains some pieces, like nested attributes and VLAN interface setup... Maybe you won't need to bang your head against the wall quite as much as I had to. :]
Oh, and if you find this post or the code examples useful, please do leave me a note :) And as always, the code presented here can be used/modified to suit your purposes, as long as you either drop me a note at Mazziesaccount@gmail.com or comment here, and mention the original author (me, Maz) && this site in your code, especially if you publish it somewhere.
Have fun!
Wednesday, September 14, 2011
Netlink sockets
Labels:
address,
attribute,
C,
interface,
link,
nested attribute,
netlink,
route,
RTM_DELLINK,
RTM_GETLINK,
RTM_NEWLINK,
RTM_SETLINK,
vlan
Wednesday, July 7, 2010
Download links in this blog.
Hi dee Ho again!
Times change. I iced my bot project (lack of time etc. etc...). The svn repository which is often referred to here got shut down. The latest sources should still be available at
http://xp-dev.com/svn/MazBotV4/trunk/
Hi dee Ho peeps!
I have often given you a link to an svn repository located at blackdiam.net/svn. Many of the sources for my C stuff are located there.
I did some refactoring in the repository, and most of the links got outdated. If you encounter a non-working link to the blackdiam.net svn repository in this blog, the following corrective actions can be taken:
Old link to dev repo base was:
http://blackdiam.net/svn/MazBot/
New link to dev repo base is
http://blackdiam.net/svn/MazBot/trunc/
And a more stable environment is at
http://blackdiam.net/svn/MazBot/releases/0.2.1/
Hopefully you find what you're looking for :)
-Maz
Tuesday, June 29, 2010
More tools to write better C
EDIT: 15.09.2011 Another SVN is working now. Sources can be found at
http://xp-dev.com/svn/MazBotV4/trunk/generic/src/
files:
MbotBitset.c
MbotBitset.h
MbotPackedArray.c
MbotPackedArray.h
are covered here.
Oh my oh my. It has been ages since I last annoyed you, my dear readers. Well, I figured I could do that again now and share some bits (pun intended) which may come in handy.
Bitset:
By a bitset I mean a simple way of reducing memory usage when storing information. There are plenty of situations where we need to store some state for many things. A simple (and very school-like) example is tracking students' course progress at a school. Let's imagine a school with 10000 students. Each student needs to pass a set of courses, and each course is either passed or not passed. Now, we could assign an ID number to each student, starting from 0 and ending at 9999. (Gosh, how fast these classes grow nowadays. What does this tell about us parents...)
The first idea for simply storing this status is to create an array for each course, index it by student ID, and set the array element to 1 if the student has passed the course, or 0 if he/she has not. We might end up with a solution like:
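int biology1[10000];   /* 1 = passed, 0 = not passed, indexed by student ID */

int has_student_passed_biology1(int student_id)
{
    /* valid IDs are 0..9999; index 10000 would already be out of bounds */
    if (student_id < 0 || student_id >= 10000)
        return -1;
    else
        return biology1[student_id];
}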
Let's ponder a little while. An int on an x86 machine takes up 32 bits of space. How many bits do we need? Naturally, we only need 1 bit: the student has either passed the course (1) or not passed it (0). (Of course there are cases when we do not need even this many bits; for example, the introduction to quantum mechanics was such a course for first-year physics students... No one passed, but let's ignore that now :D) So for each course we end up wasting 31 bits of memory per student. That is 310000 bits (we had 10000 students), i.e. roughly 37.8 kilobytes. Well, on modern PCs this might not be significant. But on the other hand, think of all the possible places where we can do this kind of wasting... Well, have you ever wondered about M$ products' memory usage? :D And think of embedded systems with a limited amount of memory...
To tackle this problem we could change the ints to chars. That would cut memory use to a quarter, but 7 of every 8 bits would still be wasted, especially in the already bloated world of SW.
So my (and many others') solution is a bitset. In its simplest form it is just a way to read/write the state of a single bit, and that is basically what my bitset does: at the beginning you initialize the bitset by telling it how many bits you need, and later you set/unset/get the state of the Nth bit.
My bitset implementation can be obtained from my bot project's repository:
http://xp-dev.com/svn/MazBotV4/trunk/generic/src/
The files are MbotBitset.h and MbotBitset.c
NOTE! I have not tested this bitset on 64-bit architectures, and it may fail there (it may also work, though). If you test it, please report the results, for example as a comment on this blog or by email (Mazziesaccount@gmail.com).
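To make the idea concrete, here is a minimal bitset sketch. It is not the MbotBitset API; the names bitset_t, bitset_create, bitset_set, bitset_clear, bitset_get and bitset_free are hypothetical. It stores bits in unsigned chars, which sidesteps any assumption about the size of int or long (the kind of assumption that tends to break on 64-bit machines).

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdlib.h>   /* malloc, calloc, free */

/* Hypothetical minimal bitset (not the MbotBitset API). Bits live in
 * unsigned chars, so nothing depends on the width of int or long. */
typedef struct {
    size_t nbits;          /* number of usable bits */
    unsigned char *bytes;  /* CHAR_BIT bits per byte */
} bitset_t;

/* Allocates a bitset with all bits cleared; returns NULL on failure. */
bitset_t *bitset_create(size_t nbits)
{
    bitset_t *bs;

    if (nbits == 0)
        return NULL;
    bs = malloc(sizeof *bs);
    if (bs == NULL)
        return NULL;
    bs->nbits = nbits;
    /* round up: 10 bits still need 2 bytes */
    bs->bytes = calloc((nbits + CHAR_BIT - 1) / CHAR_BIT, 1);
    if (bs->bytes == NULL) {
        free(bs);
        return NULL;
    }
    return bs;
}

/* Sets bit n to 1; returns 0, or -1 if n is out of range. */
int bitset_set(bitset_t *bs, size_t n)
{
    if (n >= bs->nbits)
        return -1;
    bs->bytes[n / CHAR_BIT] |= (unsigned char)(1u << (n % CHAR_BIT));
    return 0;
}

/* Clears bit n to 0; returns 0, or -1 if n is out of range. */
int bitset_clear(bitset_t *bs, size_t n)
{
    if (n >= bs->nbits)
        return -1;
    bs->bytes[n / CHAR_BIT] &= (unsigned char)~(1u << (n % CHAR_BIT));
    return 0;
}

/* Returns the state of bit n (0 or 1), or -1 if n is out of range. */
int bitset_get(const bitset_t *bs, size_t n)
{
    if (n >= bs->nbits)
        return -1;
    return (bs->bytes[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1;
}

/* Releases the storage and the bitset itself. */
void bitset_free(bitset_t *bs)
{
    if (bs != NULL) {
        free(bs->bytes);
        free(bs);
    }
}
```

For the 10000 students, bitset_create(10000) allocates 1250 bytes per course, compared with 40000 bytes for the int array on a platform with 32-bit int.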
Packed Array:
Well, if we continue with the student example, we can go back to the good ol' times when university course grades had only 4 possibilities: failed, 1, 2 or 3. If we think of storing these numbers, we note the following:
1. A bitset cannot easily be used, since it only allows 2 states per ID (it uses only 1 bit to represent the value, but representing 4 states requires 2 bits: 00, 01, 10, 11).
2. If we use a char array, only 2 of the 8 bits are used per student.
To tackle this one I wrote a packed array.
Basically my packed array just calculates the number of bits needed to represent the largest value (given at array initialization), and then allocates the amount of memory needed for an array with X members (also given at initialization). Then it splits the memory into slots and allows writing/reading values stored in N-bit-wide slots.
My packed array implementation can be found in the same place as the bitset, in the files MbotPackedArray.h and MbotPackedArray.c.
NOTE! The packed array has not been tested on a 64-bit machine (and I assume it will not work there). The 32-bit environment has not been very thoroughly tested either, but at least I have not noticed any remaining bugs...
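Here is a minimal sketch of the same packed-array idea. It is not the MbotPackedArray API; packed_array_t and the packed_array_* functions are hypothetical names. It computes the slot width from the largest value and reads/writes slots one bit at a time, so a slot may straddle a byte boundary (e.g. 3-bit slots). Bit-by-bit access is slower than word-level masking but keeps the logic obviously correct and independent of word size.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdlib.h>   /* malloc, calloc, free */

/* Hypothetical packed array (not the MbotPackedArray API): `count` slots,
 * each just wide enough to hold the values 0..max. */
typedef struct {
    size_t count;          /* number of slots */
    unsigned bits;         /* width of one slot in bits */
    unsigned long max;     /* largest storable value */
    unsigned char *bytes;  /* slots packed back to back */
} packed_array_t;

/* Smallest number of bits that can represent max (at least 1). */
static unsigned bits_needed(unsigned long max)
{
    unsigned bits = 1;
    while (bits < sizeof max * CHAR_BIT && (max >> bits) != 0)
        bits++;
    return bits;
}

packed_array_t *packed_array_create(size_t count, unsigned long max)
{
    packed_array_t *pa;

    if (count == 0)
        return NULL;
    pa = malloc(sizeof *pa);
    if (pa == NULL)
        return NULL;
    pa->count = count;
    pa->max = max;
    pa->bits = bits_needed(max);
    /* total bits rounded up to whole bytes (overflow not checked here) */
    pa->bytes = calloc((count * pa->bits + CHAR_BIT - 1) / CHAR_BIT, 1);
    if (pa->bytes == NULL) {
        free(pa);
        return NULL;
    }
    return pa;
}

/* Writes value into slot index; returns 0, or -1 on bad index/value. */
int packed_array_set(packed_array_t *pa, size_t index, unsigned long value)
{
    unsigned i;

    if (index >= pa->count || value > pa->max)
        return -1;
    for (i = 0; i < pa->bits; i++) {
        size_t pos = index * pa->bits + i;   /* absolute bit position */
        unsigned char mask = (unsigned char)(1u << (pos % CHAR_BIT));
        if ((value >> i) & 1)
            pa->bytes[pos / CHAR_BIT] |= mask;
        else
            pa->bytes[pos / CHAR_BIT] &= (unsigned char)~mask;
    }
    return 0;
}

/* Reads slot index into *value; returns 0, or -1 on bad index. */
int packed_array_get(const packed_array_t *pa, size_t index,
                     unsigned long *value)
{
    unsigned i;
    unsigned long result = 0;

    if (index >= pa->count)
        return -1;
    for (i = 0; i < pa->bits; i++) {
        size_t pos = index * pa->bits + i;
        if ((pa->bytes[pos / CHAR_BIT] >> (pos % CHAR_BIT)) & 1)
            result |= 1ul << i;
    }
    *value = result;
    return 0;
}

void packed_array_free(packed_array_t *pa)
{
    if (pa != NULL) {
        free(pa->bytes);
        free(pa);
    }
}
```

With 10000 students and grades 0..3, packed_array_create(10000, 3) picks 2-bit slots: 20000 bits, i.e. 2500 bytes, against 10000 bytes for a char array.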
Anyway, I'd be grateful if you reported the results of all your experiments with the packed array / bitset.
Bugs, success stories, improvements... send them to my mail (Mazziesaccount@gmail.com) or post them in this blog as a comment.
Wednesday, July 22, 2009
New C project started...
I abandoned the everlasting MazPong project (for a while?). Why, you may ask? Because it grew into a mess. Try writing a project that gets bloated because you just write things you notice are handy, and forget to make small, compact, TESTED things and document them as you go. Well, it may work if you can work continuously, so that you remember the big picture throughout the project. That's not the case with MazPong. I started writing it in 2005 (or so), and it has been completely redesigned three times along the way... And I have never worked 5 days in a row; it's more like two days of work, then a two-to-six-month break, and so on... Yet I never started from scratch...
So it became a jungle of very different pieces of code, some of which are duplicated (because I forgot I had done something last year)...
So it is on hold. I do not say it is totally forgotten - it is an everlasting project, for god's sake!
Now... I started another project, where I have a small motivator: a friend of mine might have use for it :D This time it's not a game; it will be yet another IRC bot called MazBotV4 (working as an IRC client). I have always had an obsession with IRC bots. The first one I wrote with mIRC scripts, and it terrorized the #oldgames channel related to old computer games. Then I wrote a bot using Eggdrop and TCL, with little success. TCL was not for me. V3 was written in C++, but I grew tired of it since I had no use for it. Now V4 will be pure plain C.
This bot is meant to give an easy interface for doing some tasks based on text file configurations, while also offering almost unlimited possibilities to someone familiar with C. I plan to write a nice callback interface, where people with limited skills can add their own functions to be executed when an event occurs on an IRC channel. And finally, I plan to write an interface where bot users can specify network addresses the bot connects to on certain events. The idea is that this way one can send/retrieve information to/from web servers using the bot.
- No. I am not going to write an HTML parser in the bot. Period. It will be the user's responsibility to write some scripts on the web server that provide the data to the bot in a simple format. But of course, if someone wants to add an HTML parser to the bot... ;) Other help will probably be appreciated too (at least at some point, if I manage to keep this going). Oh yes, if anyone familiar with GNU Makefiles wants to improve the make scripts I have written [no autotools - I dislike them!!] and still keep them simple, you're welcome to take a peek and send suggestions!
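The planned callback interface could look roughly like the sketch below. This is not MazBotV4 code; the event names, irc_register_handler(), irc_dispatch() and the example handler count_messages() are all hypothetical. The idea is that users only write a function with a fixed signature and register it, while the bot's parser calls irc_dispatch() whenever it recognizes an event.

```c
#include <stddef.h>

/* Hypothetical event types; a real bot would have many more. */
typedef enum { EVT_JOIN, EVT_PRIVMSG, EVT_COUNT } irc_event_t;

/* Signature every user callback must have; user_data is passed back as-is. */
typedef void (*irc_handler_t)(const char *channel, const char *nick,
                              const char *text, void *user_data);

#define MAX_HANDLERS 8   /* per event type; arbitrary limit for the sketch */

typedef struct {
    irc_handler_t fn;
    void *user_data;
} handler_slot_t;

static handler_slot_t handlers[EVT_COUNT][MAX_HANDLERS];
static size_t handler_count[EVT_COUNT];

/* Registers fn to run on every evt; returns 0, or -1 if full or invalid. */
int irc_register_handler(irc_event_t evt, irc_handler_t fn, void *user_data)
{
    if ((unsigned)evt >= (unsigned)EVT_COUNT || fn == NULL
        || handler_count[evt] >= MAX_HANDLERS)
        return -1;
    handlers[evt][handler_count[evt]].fn = fn;
    handlers[evt][handler_count[evt]].user_data = user_data;
    handler_count[evt]++;
    return 0;
}

/* Called by the bot's parser when evt occurs; returns how many handlers ran. */
size_t irc_dispatch(irc_event_t evt, const char *channel, const char *nick,
                    const char *text)
{
    size_t i;

    if ((unsigned)evt >= (unsigned)EVT_COUNT)
        return 0;
    for (i = 0; i < handler_count[evt]; i++)
        handlers[evt][i].fn(channel, nick, text, handlers[evt][i].user_data);
    return handler_count[evt];
}

/* Example of what a user might write: counts messages (user_data is int *). */
void count_messages(const char *channel, const char *nick, const char *text,
                    void *user_data)
{
    (void)channel;
    (void)nick;
    (void)text;
    ++*(int *)user_data;
}
```

The void *user_data pointer is the usual C way to give a callback its own state without globals; a user would call irc_register_handler(EVT_PRIVMSG, count_messages, &counter) once at startup.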
Oh, the title of this post should link to the source repository.
And in case you want to see the stable version, here it is:
http://xp-dev.com/svn/Mazzie-mazbot/
Oh, another bug was found in Cexplode, so I put up a "correction release" v0.1-1.
And finally, I have planned to use my Cexplode function set in the IRC parser, and I have added some mysterious functions to it... So there's an enhanced (and at the moment quite untested) version available. I just changed the file names to helpers.h and helpers.c (in the generic/src folder). A test/usage example is also present in the test/src folder. (This info is provided since, according to my tracker, ~60% of people visiting this blog are looking for an explode function for C.)
License terms are the same as before: use/modify as pleases you, but let me know by email or via a comment in this blog. If you redistribute it, mention the original author (me) and this website (or the Finnish equivalent: http://c-ohjelmoijanajatuksia.blogspot.com/ )
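Since so many visitors are after an explode function, here is a small PHP-style explode sketch. It is not the Cexplode API; explode() and explode_free() here are hypothetical names. It splits at every occurrence of a delimiter string and keeps empty fields (unlike strtok(), which merges consecutive delimiters), using two passes: one to count pieces, one to copy them.

```c
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* strstr, strlen, memcpy */

/* Frees the first n pieces and the array returned by explode(). */
void explode_free(char **pieces, int n)
{
    while (n > 0)
        free(pieces[--n]);
    free(pieces);
}

/* PHP-style explode (a sketch, not the Cexplode API): splits str at every
 * occurrence of the non-empty string delim. Empty fields are kept, so
 * "a,,b" gives "a", "", "b". On success returns the number of pieces and
 * stores a malloc'd array of malloc'd strings in *out; -1 on error. */
int explode(const char *str, const char *delim, char ***out)
{
    size_t dlen = strlen(delim);
    int n = 1, i = 0;
    const char *p, *hit;
    char **pieces;

    if (dlen == 0)
        return -1;
    /* first pass: count delimiters to size the array */
    for (p = str; (hit = strstr(p, delim)) != NULL; p = hit + dlen)
        n++;
    pieces = malloc((size_t)n * sizeof *pieces);
    if (pieces == NULL)
        return -1;
    /* second pass: copy each field into its own buffer */
    for (p = str; ; p = hit + dlen) {
        size_t len;

        hit = strstr(p, delim);
        len = hit != NULL ? (size_t)(hit - p) : strlen(p);
        pieces[i] = malloc(len + 1);
        if (pieces[i] == NULL) {
            explode_free(pieces, i);
            return -1;
        }
        memcpy(pieces[i], p, len);
        pieces[i][len] = '\0';
        i++;
        if (hit == NULL)
            break;
    }
    *out = pieces;
    return n;
}
```

For an IRC line such as "PRIVMSG #chan :hi", exploding on " " yields three pieces; real IRC parsing also has to treat the trailing ":" parameter specially, which this sketch does not attempt.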
All in all, happy summer, remember your caps - don't fry your brains :)
-Matti
Wednesday, May 6, 2009
Example of object oriented C
I have previously written an entry stating that C can be used as an object-oriented language. Now I decided to write an example demonstrating this. So here it is: a program drawing lottery numbers, written in ANSI C, using an object-oriented approach.
I know the code looks horrible. It is. Object-oriented code often is. And this example could have been written much more easily in plain procedural C. But it shows how the basic requirements of object orientation are fulfilled. The lottery machine is a (virtual) base class, and basic & Viking lottery are derived classes. If I had more enthusiasm, I would have written a Joker class too.
As an explanation: in Finland we have two different types of lottery, Viking Lotto and regular Lotto. A different amount of numbers is drawn in each, but I can't remember the exact amounts. There's also the so-called Joker, but I can't remember the amount of numbers in it either.
And as a final note, I know rand() is not the perfect function to use, but since this was not meant to be a real lottery machine, I could not care less :)
So here's the lottery source. Compile with
gcc -Wall -o lotteryExe lotto.c lottoTest.c regularlotto.c vikinglotto.c
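For readers who just want the core trick, here is a condensed sketch of the base/derived pattern. It is not the code from lotto.c or the other files above; all names are hypothetical, and the picks/highest values are placeholders, not the official Finnish lottery rules. A "virtual method table" struct of function pointers plays the role of virtual functions, and each derived struct embeds the base struct as its first member so it can be handled through a base pointer.

```c
#include <stdlib.h>   /* rand */

#define MAX_PICKS 16

struct lottery;

/* "Virtual method table": one per derived class, shared by its instances. */
typedef struct {
    const char *(*name)(const struct lottery *self);
    int (*draw)(struct lottery *self, int *out);
} lottery_vtbl;

/* Base class. Derived classes embed it as their FIRST member, so a pointer
 * to the derived object can be used as a pointer to the base. */
typedef struct lottery {
    const lottery_vtbl *vtbl;
    int picks;     /* how many numbers are drawn */
    int highest;   /* numbers come from 1..highest */
} lottery;

/* Public "methods": callers see only the base type; dispatch goes via vtbl. */
const char *lottery_name(const lottery *self) { return self->vtbl->name(self); }
int lottery_draw(lottery *self, int *out) { return self->vtbl->draw(self, out); }

/* Shared base implementation: draws `picks` distinct numbers with rand().
 * Returns the count, or -1 if the parameters make that impossible. */
static int lottery_draw_base(lottery *self, int *out)
{
    int count = 0;

    if (self->picks > self->highest || self->picks > MAX_PICKS)
        return -1;
    while (count < self->picks) {
        int candidate = rand() % self->highest + 1;
        int i, duplicate = 0;

        for (i = 0; i < count; i++)
            if (out[i] == candidate)
                duplicate = 1;
        if (!duplicate)
            out[count++] = candidate;
    }
    return count;
}

/* Derived class: regular lotto (overrides name, inherits draw). */
typedef struct { lottery base; } regular_lotto;

static const char *regular_name(const lottery *self) { (void)self; return "regular lotto"; }
static const lottery_vtbl regular_vtbl = { regular_name, lottery_draw_base };

void regular_lotto_init(regular_lotto *self)
{
    self->base.vtbl = &regular_vtbl;
    self->base.picks = 7;      /* placeholder parameters, not official rules */
    self->base.highest = 40;
}

/* Derived class: Viking lotto. */
typedef struct { lottery base; } viking_lotto;

static const char *viking_name(const lottery *self) { (void)self; return "viking lotto"; }
static const lottery_vtbl viking_vtbl = { viking_name, lottery_draw_base };

void viking_lotto_init(viking_lotto *self)
{
    self->base.vtbl = &viking_vtbl;
    self->base.picks = 6;      /* placeholder parameters, not official rules */
    self->base.highest = 48;
}
```

A derived class could add its own fields after base, or override draw in its vtable (a Joker class drawing digits would be a natural candidate); code holding a lottery * never needs to know which concrete type it has.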
Labels: code, coding, example, object oriented C
Thursday, April 2, 2009
What I learned today.
I've noticed that a SW designer's job is constant learning. If you stop learning, you stop advancing. For a newish guy in the field like me, most of the learning is related to existing systems, but even the old chaps need to keep making progress: new technologies keep coming, and falling behind means giving an advantage to others.
So I guess I'll start this "What I learned today" post, to which I'll try to add new comments whenever I have learned something new. Just short sentences, with no long background, but with some information that was new to me.
But since I started this now and not on my first day at the job, I'll write down a couple of earlier things in this post too :)
Last December.
Until the very latest glibc versions, there's a bug in glibc's timer_create() implementation which may cause a crash under certain conditions. (When a timer expires, the attempt to read the internal list where glibc keeps its timers may access an invalid memory location.)
A few months ago.
Helgrind is yet another great tool built on Valgrind.
(It can be used to detect synchronization issues in multithreaded applications, although it has its limitations.)
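A tiny program to try Helgrind on is sketched below: two threads increment a shared counter. As written, the mutex keeps it correct; deleting the two marked lock/unlock lines creates the kind of unsynchronized access Helgrind is meant to report. The names run_counter() and increment() are just for this example.

```c
#include <pthread.h>

#define ITERATIONS 100000

static long counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: adds ITERATIONS to the shared counter under the mutex. */
static void *increment(void *arg)
{
    int i;

    (void)arg;
    for (i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);   /* delete this line ...           */
        counter++;
        pthread_mutex_unlock(&counter_lock); /* ... and this one to create a race */
    }
    return NULL;
}

/* Runs two incrementing threads; returns the final counter or -1 on error. */
long run_counter(void)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, increment, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, increment, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Call run_counter() from a main(), build with gcc -g -pthread, and run the binary under valgrind --tool=helgrind. With the lock lines removed, Helgrind should flag a possible data race on counter, and the returned total may also fall short of 200000.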
A few weeks ago.
Some glibc versions for PowerPC have a bug which makes Valgrind yell a lot of false alarms!
A few days ago.
A pthread's stack size cannot be easily (or portably) changed after the thread is created.
(It can be set when the thread is created. It is not very common to really need to change the default stack size [actually this sounds like a terribly failed design], but there may be cases where one needs to create many [read: some hundreds of] threads with a limited virtual address space [*shrugs*]. On Linux at least, real memory is not used more than required, but virtual address space is reserved for the thread's whole stack, no matter how much of it the thread really needs.)
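Setting the size at creation time goes through a thread attribute object with the standard POSIX calls pthread_attr_setstacksize() and pthread_create(). The sketch below wraps this in a hypothetical helper, run_with_stack_size(); the 256 KiB used in the usage note is only an illustrative value.

```c
#include <pthread.h>
#include <stddef.h>

static void *worker(void *arg)
{
    (void)arg;   /* a real thread would do its work here */
    return NULL;
}

/* Starts and joins one thread whose stack size is fixed at creation time via
 * the attribute object. *reported receives the size the attribute holds.
 * Returns 0 on success or a pthread error code (e.g. EINVAL when stack_size
 * is below PTHREAD_STACK_MIN). */
int run_with_stack_size(size_t stack_size, size_t *reported)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_size);
    if (rc == 0)
        rc = pthread_attr_getstacksize(&attr, reported);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);
    /* the attribute object is no longer needed once the thread exists */
    pthread_attr_destroy(&attr);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}
```

For example, run_with_stack_size(256 * 1024, &size) reserves 256 KiB of address space for that thread instead of the default (often 8 MiB on Linux, taken from ulimit -s), which is what matters when creating hundreds of threads.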
Today: (02.04.2009)
Most Linux distributions include a program called "yes", which repeatedly prints the text given as its argument. A co-worker told me about it today, and I used it to generate some CPU load when I needed to do some performance analysis. I used:
yes Matti > /dev/null
renice -19
Wednesday, March 25, 2009
Linux virtual address space.
Today I spotted a question about the ability to read a process's data after the process frees the memory it has used. (It was on one of the best C programming forums I know, CBoard.)
So I started writing an answer, which accidentally became lengthy. After reading it back, I felt there was no hope it could clarify a thing to anyone. But after a few beers I started to feel pretty damn proud of that babbling, so I decided to copy it here too. Enjoy.
About reading memory after freeing it etc.: I'll try to shed some light on this topic, and I hope that if the more experienced guys spot a problem in my explanation, they will further teach both me and you.

I will be talking from a Linux point of view, but I assume that most modern systems do it in a somewhat similar manner. Of course there are some realtime systems (like OSE) in which all memory is available to all program blocks (e.g. processes), but I assume you're talking about a desktop system.
So, simplifying enough, we can say the following. Most modern OSes map the real physical memory to virtual memory addresses and give each process its own virtual address space. Actually, most modern computer systems have a hardware memory management unit (MMU) which translates virtual addresses to physical memory addresses. So in user space (which is basically all applications that aren't part of the kernel or a kernel module) it is not possible to access hardware (memory or other hardware) directly. Hardware is invisible to user space processes; it can only be accessed via drivers written in kernel space (drivers offer an interface for user space applications). [[There is a workaround, but let's not mess with it now.]]
So a process cannot directly access RAM.
When we launch a process, a certain virtual address space is given to it (and certain physical RAM corresponding to it, but the process has no means of getting the direct HW addresses). The process has no access outside of this sandbox. When we launch another process, it gets its own virtual address space and its own corresponding physical RAM. If a process is running out of RAM, the kernel has means to give it more.
If a process tries to access something outside of its sandbox, e.g. requests reading from / writing to a location that is not mapped in its address space, the MMU raises a fault that wakes up the kernel, which checks what is happening. If the request was illegal, the kernel generates the signal SIGSEGV and sends it to your program => segmentation fault.
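A quick way to see per-process virtual address spaces in action is fork(): the child starts as a copy of the parent, so a global variable sits at the same virtual address in both, yet a write in the child does not show up in the parent because the two addresses map to different physical pages (Linux implements this with copy-on-write). The helper fork_isolation_demo() below is a hypothetical name for this sketch.

```c
#include <stdint.h>     /* uintptr_t */
#include <sys/types.h>  /* pid_t, ssize_t */
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* fork, pipe, read, write, close, _exit */

static int value = 1;   /* a global: same virtual address in parent and child */

/* Forks; the child sends the address of `value` through a pipe and sets
 * value to 42. Returns 1 if the child saw the same virtual address while the
 * parent's copy stayed 1, 0 if not, -1 on error. */
int fork_isolation_demo(void)
{
    int fds[2], status;
    uintptr_t child_addr = 0;
    ssize_t got;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child */
        uintptr_t addr = (uintptr_t)&value;
        value = 42;                       /* lands in the child's own copy of the page */
        close(fds[0]);
        if (write(fds[1], &addr, sizeof addr) != (ssize_t)sizeof addr)
            _exit(1);
        _exit(0);
    }
    close(fds[1]);                        /* parent */
    got = read(fds[0], &child_addr, sizeof child_addr);
    close(fds[0]);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)
        || WEXITSTATUS(status) != 0 || got != (ssize_t)sizeof child_addr)
        return -1;
    return child_addr == (uintptr_t)&value && value == 1;
}
```

The point to notice is that a pointer value only means something inside the address space it came from; the same number names different physical memory in the two processes.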
However, physical RAM is not necessarily wiped when a process frees it or exits, and the amount assigned to each process may change. (Linux does hand out zero-filled pages when it gives fresh memory to a process, so old contents should not leak to another process through normal allocation.) There's also the possibility of swapping, where some memory pages are temporarily written to the hard disk to free up RAM for other purposes.
So it is indeed possible that leftovers of data written by your program remain in memory. But there should be no trace of where in physical RAM your data was, so knowing which bytes belonged to which program, not to mention what these bytes were used for, is quite close to impossible. I say "quite close" because I've never really looked into it, and I don't have accurate information on how possible it could be.
If I go a little further off topic, to check whether I have understood this correctly (I believe there are far more experienced fellows amongst us who will correct me if I go wrong), there are at least the following benefits of hiding physical addresses:
1. Security. No user space process can go and mess with the HW as it wants; it needs to use drivers, which should be written so that the biggest hazards are not possible...
2. Per-process memory mapping. If memory were accessible between processes... Oh joy. If you have ever attempted to build huge and complicated multithreaded software, you know what I mean. It really is a mess. Using one uninitialized pointer may crash anything anywhere, or just make generally bizarre things happen.
I guess points 1 and 2 could be implemented even if applications could access RAM directly, but I am not sure how easy or difficult it would be.
3. Shared libraries. Shared libraries are loaded into memory at runtime. Since each process has its own virtual address space, a shared library needs to be loaded into physical memory only once. The same physical memory block containing the library can then be mapped into each process's address space.
Finally, inter-process communication often uses a "creature" called shared memory. Processes can request a shared memory block from the kernel, which then maps some physical memory chunk into more than one process's virtual address space. Note, however, that the virtual addresses in different processes need not be the same, even though they point at the same physical RAM. Hence shared memory is usually accessed using offsets from the start of the memory pool.
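Here is a small sketch of that using the System V calls shmget(), shmat(), shmdt() and shmctl() (POSIX shm_open() plus mmap() is the other common route). A child process writes a string at a fixed offset inside the segment and the parent reads it back; shm_demo() is a hypothetical helper name. Note the explicit IPC_RMID at the end: System V segments are kernel objects that outlive the process unless they are removed.

```c
#include <string.h>     /* strcpy, strncpy */
#include <sys/ipc.h>    /* IPC_PRIVATE, IPC_CREAT, IPC_RMID */
#include <sys/shm.h>    /* shmget, shmat, shmdt, shmctl */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* fork, _exit */

#define SEGMENT_SIZE   4096
#define MESSAGE_OFFSET 128   /* data is located by offset, not by a stored pointer */

/* A child process writes a string into a System V shared memory segment and
 * the parent reads it back into buf. Returns 0 on success, -1 on error. */
int shm_demo(char *buf, size_t buflen)
{
    int id, status, rc = -1;
    char *base;
    pid_t pid;

    if (buflen == 0)
        return -1;
    id = shmget(IPC_PRIVATE, SEGMENT_SIZE, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    base = shmat(id, NULL, 0);           /* map the segment into our address space */
    if (base == (char *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    pid = fork();                        /* the child inherits the attachment */
    if (pid == 0) {
        strcpy(base + MESSAGE_OFFSET, "hello from child");
        _exit(0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid
        && WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        strncpy(buf, base + MESSAGE_OFFSET, buflen - 1);
        buf[buflen - 1] = '\0';
        rc = 0;
    }
    shmdt(base);
    /* System V segments outlive the process unless removed explicitly
     * (leftovers show up in `ipcs -m`). */
    shmctl(id, IPC_RMID, NULL);
    return rc;
}
```

Here parent and child happen to see the segment at the same address because the child inherited the mapping; two unrelated processes attaching the same segment with shmat() may get different addresses, which is why the data is located via MESSAGE_OFFSET.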
However, note that when you quit your program, the shared memory is not automatically freed, at least for System V segments and POSIX named shared memory objects (thanks to brewbuck for the confirmation). It will stay accessible (if it was created to be accessible). And finally, the [[]] I wrote at the beginning: it is possible to write a driver that maps some real hardware addresses to shared memory, which can then be mapped by a user space process. That way the HW can be accessed directly by a user space application.
Labels: Linux, shared memory, virtual memory