Commit 9651851

kaber authored and ummakynes committed
netfilter: add nftables
This patch adds nftables, which is the intended successor of iptables.
This packet filtering framework reuses the existing netfilter hooks,
the connection tracking system, the NAT subsystem, the transparent
proxying engine, the logging infrastructure and the userspace packet
queueing facilities.

In a nutshell, nftables provides a pseudo-state machine with 4 general
purpose registers of 128 bits and 1 specific purpose register to store
verdicts. This pseudo-machine comes with an extensible instruction set,
a.k.a. "expressions" in the nftables jargon. The expressions included
in this patch provide the basic functionality. They are:

* bitwise: to perform bitwise operations.
* byteorder: to change from host/network endianness.
* cmp: to compare data with the content of the registers.
* counter: to enable counters on rules.
* ct: to store conntrack keys into register.
* exthdr: to match IPv6 extension headers.
* immediate: to load data into registers.
* limit: to limit matching based on packet rate.
* log: to log packets.
* meta: to match metainformation that usually comes with the skbuff.
* nat: to perform Network Address Translation.
* payload: to fetch data from the packet payload and store it into
  registers.
* reject (IPv4 only): to explicitly close connection, e.g. TCP RST.

Using this instruction set, the userspace utility 'nft' can transform
the rules expressed in human-readable text representation (using a
new syntax, inspired by tcpdump) to nftables bytecode.

nftables also inherits the table, chain and rule objects from
iptables, but in a more configurable way, and it also includes the
original datatype-agnostic set infrastructure with mapping support.
This set infrastructure is enhanced in the follow up patch (netfilter:
nf_tables: add netlink set API).

This patch includes the following components:

* the netlink API: net/netfilter/nf_tables_api.c and
  include/uapi/netfilter/nf_tables.h
* the packet filter core: net/netfilter/nf_tables_core.c
* the expressions (described above): net/netfilter/nft_*.c
* the filter tables: arp, IPv4, IPv6 and bridge:
  net/ipv4/netfilter/nf_tables_ipv4.c
  net/ipv6/netfilter/nf_tables_ipv6.c
  net/ipv4/netfilter/nf_tables_arp.c
  net/bridge/netfilter/nf_tables_bridge.c
* the NAT table (IPv4 only):
  net/ipv4/netfilter/nf_table_nat_ipv4.c
* the route table (similar to mangle):
  net/ipv4/netfilter/nf_table_route_ipv4.c
  net/ipv6/netfilter/nf_table_route_ipv6.c
* internal definitions under:
  include/net/netfilter/nf_tables.h
  include/net/netfilter/nf_tables_core.h
* It also includes a skeleton expression:
  net/netfilter/nft_expr_template.c
  and the preliminary implementation of the meta target
  net/netfilter/nft_meta_target.c

It also includes a change in struct nf_hook_ops to add a new pointer
to store private data to the hook, which is used to store the rule
list per chain.

This patch is based on the patch from Patrick McHardy, plus merged
accumulated cleanups, fixes and small enhancements to the nftables
code that have been done since 2009, which are:

From Patrick McHardy:
* nf_tables: adjust netlink handler function signatures
* nf_tables: only retry table lookup after successful table module load
* nf_tables: fix event notification echo and avoid unnecessary messages
* nft_ct: add l3proto support
* nf_tables: pass expression context to nft_validate_data_load()
* nf_tables: remove redundant definition
* nft_ct: fix maxattr initialization
* nf_tables: fix invalid event type in nf_tables_getrule()
* nf_tables: simplify nft_data_init() usage
* nf_tables: build in more core modules
* nf_tables: fix double lookup expression unregistation
* nf_tables: move expression initialization to nf_tables_core.c
* nf_tables: build in payload module
* nf_tables: use NFPROTO constants
* nf_tables: rename pid variables to portid
* nf_tables: save 48 bits per rule
* nf_tables: introduce chain rename
* nf_tables: check for duplicate names on chain rename
* nf_tables: remove ability to specify handles for new rules
* nf_tables: return error for rule change request
* nf_tables: return error for NLM_F_REPLACE without rule handle
* nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
* nf_tables: fix NLM_F_MULTI usage in netlink notifications
* nf_tables: include NLM_F_APPEND in rule dumps

From Pablo Neira Ayuso:
* nf_tables: fix stack overflow in nf_tables_newrule
* nf_tables: nft_ct: fix compilation warning
* nf_tables: nft_ct: fix crash with invalid packets
* nft_log: group and qthreshold are 2^16
* nf_tables: nft_meta: fix socket uid,gid handling
* nft_counter: allow to restore counters
* nf_tables: fix module autoload
* nf_tables: allow to remove all rules placed in one chain
* nf_tables: use 64-bits rule handle instead of 16-bits
* nf_tables: fix chain after rule deletion
* nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type
* nf_tables: rise maximum number of expressions from 12 to 128
* nf_tables: don't delete table if in use
* nf_tables: fix basechain release

From Tomasz Bursztyka:
* nf_tables: Add support for changing users chain's name
* nf_tables: Change chain's name to be fixed sized
* nf_tables: Add support for replacing a rule by another one
* nf_tables: Update uapi nftables netlink header documentation

From Florian Westphal:
* nft_log: group is u16, snaplen u32

From Phil Oester:
* nf_tables: operational limit match

Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
1 parent f59cb04 commit 9651851

39 files changed: +6393 -6 lines changed
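
The register machine described in the commit message can be pictured with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: every `toy_*` and `TOY_*` name is hypothetical, the verdict values are invented for illustration, and only three of the listed expressions (payload, cmp, immediate) are modelled. It assumes that evaluation of a rule stops as soon as an expression sets the verdict register to something other than "continue".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdict codes for this sketch (not the kernel's values). */
enum toy_verdict { TOY_CONTINUE, TOY_BREAK, TOY_ACCEPT, TOY_DROP };

/* Four general purpose 128-bit registers plus one verdict register. */
struct toy_regs {
	uint32_t data[4][4];		/* 4 registers x 128 bits */
	enum toy_verdict verdict;
};

enum toy_op { TOY_OP_PAYLOAD, TOY_OP_CMP, TOY_OP_IMMEDIATE_VERDICT };

/* One "expression": a single instruction of the pseudo-machine. */
struct toy_expr {
	enum toy_op op;
	unsigned int reg;		/* register index, 0..3 (not validated here) */
	unsigned int offset;		/* packet offset, used by TOY_OP_PAYLOAD */
	unsigned int len;		/* bytes to load or compare, at most 16 */
	uint8_t value[16];		/* constant operand, used by TOY_OP_CMP */
	enum toy_verdict verdict;	/* used by TOY_OP_IMMEDIATE_VERDICT */
};

static void toy_eval(const struct toy_expr *e, struct toy_regs *r,
		     const uint8_t *pkt, size_t pkt_len)
{
	switch (e->op) {
	case TOY_OP_PAYLOAD:
		/* Fetch bytes from the packet into a register. */
		if (e->offset + e->len > pkt_len) {
			r->verdict = TOY_BREAK;
			return;
		}
		memset(r->data[e->reg], 0, sizeof(r->data[e->reg]));
		memcpy(r->data[e->reg], pkt + e->offset, e->len);
		break;
	case TOY_OP_CMP:
		/* Compare a register with a constant; a mismatch ends the rule. */
		if (memcmp(r->data[e->reg], e->value, e->len) != 0)
			r->verdict = TOY_BREAK;
		break;
	case TOY_OP_IMMEDIATE_VERDICT:
		/* Load a verdict into the verdict register. */
		r->verdict = e->verdict;
		break;
	}
}

/*
 * Evaluate one rule. Returns the verdict it produced, or TOY_CONTINUE
 * when the rule did not match or carried no verdict.
 */
static enum toy_verdict toy_eval_rule(const struct toy_expr *exprs, size_t n,
				      const uint8_t *pkt, size_t pkt_len)
{
	struct toy_regs regs;
	size_t i;

	memset(&regs, 0, sizeof(regs));
	regs.verdict = TOY_CONTINUE;
	for (i = 0; i < n && regs.verdict == TOY_CONTINUE; i++)
		toy_eval(&exprs[i], &regs, pkt, pkt_len);
	return regs.verdict == TOY_BREAK ? TOY_CONTINUE : regs.verdict;
}
```

A rule such as "load one byte at offset 9, compare it with 6, then drop" becomes three `toy_expr` entries. Keeping the match logic in data rather than in per-protocol kernel code is the property the commit message attributes to the bytecode approach.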

include/linux/netfilter.h

Lines changed: 6 additions & 5 deletions
@@ -53,12 +53,13 @@ struct nf_hook_ops {
 	struct list_head list;

 	/* User fills in from here down. */
-	nf_hookfn *hook;
-	struct module *owner;
-	u_int8_t pf;
-	unsigned int hooknum;
+	nf_hookfn *hook;
+	struct module *owner;
+	void *priv;
+	u_int8_t pf;
+	unsigned int hooknum;
 	/* Hooks are ordered in ascending priority. */
-	int priority;
+	int priority;
 };

 struct nf_sockopt_ops {
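
The new `priv` member lets one generic hook function serve many registrations, because each registration carries its own context. The sketch below illustrates that idea in userspace. The `toy_*` types and functions are hypothetical stand-ins, and how nftables itself fills in `priv` is not visible in this hunk, so treat the wiring as an assumption.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for a chain with its rule list. */
struct toy_chain {
	const char *name;
	unsigned int nrules;
};

struct toy_hook_ops;
typedef unsigned int toy_hookfn(const struct toy_hook_ops *ops,
				const void *pkt);

/* Reduced model of struct nf_hook_ops after this patch. */
struct toy_hook_ops {
	toy_hookfn *hook;
	void *priv;		/* per-hook private data */
	unsigned int hooknum;
	int priority;
};

/*
 * A single hook function shared by every chain: it recovers the chain
 * it should evaluate from ops->priv instead of from a global table.
 * Here it just reports how many rules it would walk.
 */
static unsigned int toy_do_chain(const struct toy_hook_ops *ops,
				 const void *pkt)
{
	const struct toy_chain *chain = ops->priv;

	(void)pkt;
	return chain->nrules;
}

static void toy_hook_init(struct toy_hook_ops *ops, struct toy_chain *chain,
			  unsigned int hooknum, int priority)
{
	ops->hook = toy_do_chain;
	ops->priv = chain;
	ops->hooknum = hooknum;
	ops->priority = priority;
}
```

Two registrations initialised with different chains call the same `toy_do_chain()` and still see different rule lists, which is the use the commit message gives for the pointer ("to store the rule list per chain").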

include/net/netfilter/nf_tables.h

Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
+#ifndef _NET_NF_TABLES_H
+#define _NET_NF_TABLES_H
+
+#include <linux/list.h>
+#include <linux/netfilter.h>
+#include <linux/netfilter/nf_tables.h>
+#include <net/netlink.h>
+
+struct nft_pktinfo {
+	struct sk_buff *skb;
+	const struct net_device *in;
+	const struct net_device *out;
+	u8 hooknum;
+	u8 nhoff;
+	u8 thoff;
+};
+
+struct nft_data {
+	union {
+		u32 data[4];
+		struct {
+			u32 verdict;
+			struct nft_chain *chain;
+		};
+	};
+} __attribute__((aligned(__alignof__(u64))));
+
+static inline int nft_data_cmp(const struct nft_data *d1,
+			       const struct nft_data *d2,
+			       unsigned int len)
+{
+	return memcmp(d1->data, d2->data, len);
+}
+
+static inline void nft_data_copy(struct nft_data *dst,
+				 const struct nft_data *src)
+{
+	BUILD_BUG_ON(__alignof__(*dst) != __alignof__(u64));
+	*(u64 *)&dst->data[0] = *(u64 *)&src->data[0];
+	*(u64 *)&dst->data[2] = *(u64 *)&src->data[2];
+}
+
+static inline void nft_data_debug(const struct nft_data *data)
+{
+	pr_debug("data[0]=%x data[1]=%x data[2]=%x data[3]=%x\n",
+		 data->data[0], data->data[1],
+		 data->data[2], data->data[3]);
+}
+
+/**
+ * struct nft_ctx - nf_tables rule context
+ *
+ * @afi: address family info
+ * @table: the table the chain is contained in
+ * @chain: the chain the rule is contained in
+ */
+struct nft_ctx {
+	const struct nft_af_info *afi;
+	const struct nft_table *table;
+	const struct nft_chain *chain;
+};
+
+enum nft_data_types {
+	NFT_DATA_VALUE,
+	NFT_DATA_VERDICT,
+};
+
+struct nft_data_desc {
+	enum nft_data_types type;
+	unsigned int len;
+};
+
+extern int nft_data_init(const struct nft_ctx *ctx, struct nft_data *data,
+			 struct nft_data_desc *desc, const struct nlattr *nla);
+extern void nft_data_uninit(const struct nft_data *data,
+			    enum nft_data_types type);
+extern int nft_data_dump(struct sk_buff *skb, int attr,
+			 const struct nft_data *data,
+			 enum nft_data_types type, unsigned int len);
+
+static inline enum nft_data_types nft_dreg_to_type(enum nft_registers reg)
+{
+	return reg == NFT_REG_VERDICT ? NFT_DATA_VERDICT : NFT_DATA_VALUE;
+}
+
+extern int nft_validate_input_register(enum nft_registers reg);
+extern int nft_validate_output_register(enum nft_registers reg);
+extern int nft_validate_data_load(const struct nft_ctx *ctx,
+				  enum nft_registers reg,
+				  const struct nft_data *data,
+				  enum nft_data_types type);
+
+/**
+ * struct nft_expr_ops - nf_tables expression operations
+ *
+ * @eval: Expression evaluation function
+ * @init: initialization function
+ * @destroy: destruction function
+ * @dump: function to dump parameters
+ * @list: used internally
+ * @name: Identifier
+ * @owner: module reference
+ * @policy: netlink attribute policy
+ * @maxattr: highest netlink attribute number
+ * @size: full expression size, including private data size
+ */
+struct nft_expr;
+struct nft_expr_ops {
+	void (*eval)(const struct nft_expr *expr,
+		     struct nft_data data[NFT_REG_MAX + 1],
+		     const struct nft_pktinfo *pkt);
+	int (*init)(const struct nft_ctx *ctx,
+		    const struct nft_expr *expr,
+		    const struct nlattr * const tb[]);
+	void (*destroy)(const struct nft_expr *expr);
+	int (*dump)(struct sk_buff *skb,
+		    const struct nft_expr *expr);
+
+	struct list_head list;
+	const char *name;
+	struct module *owner;
+	const struct nla_policy *policy;
+	unsigned int maxattr;
+	unsigned int size;
+};
+
+#define NFT_EXPR_SIZE(size) (sizeof(struct nft_expr) + \
+			     ALIGN(size, __alignof__(struct nft_expr)))
+
+/**
+ * struct nft_expr - nf_tables expression
+ *
+ * @ops: expression ops
+ * @data: expression private data
+ */
+struct nft_expr {
+	const struct nft_expr_ops *ops;
+	unsigned char data[];
+};
+
+static inline void *nft_expr_priv(const struct nft_expr *expr)
+{
+	return (void *)expr->data;
+}
+
+/**
+ * struct nft_rule - nf_tables rule
+ *
+ * @list: used internally
+ * @rcu_head: used internally for rcu
+ * @handle: rule handle
+ * @dlen: length of expression data
+ * @data: expression data
+ */
+struct nft_rule {
+	struct list_head list;
+	struct rcu_head rcu_head;
+	u64 handle:48,
+	    dlen:16;
+	unsigned char data[]
+		__attribute__((aligned(__alignof__(struct nft_expr))));
+};
+
+static inline struct nft_expr *nft_expr_first(const struct nft_rule *rule)
+{
+	return (struct nft_expr *)&rule->data[0];
+}
+
+static inline struct nft_expr *nft_expr_next(const struct nft_expr *expr)
+{
+	return ((void *)expr) + expr->ops->size;
+}
+
+static inline struct nft_expr *nft_expr_last(const struct nft_rule *rule)
+{
+	return (struct nft_expr *)&rule->data[rule->dlen];
+}
+
+/*
+ * The last pointer isn't really necessary, but the compiler isn't able to
+ * determine that the result of nft_expr_last() is always the same since it
+ * can't assume that the dlen value wasn't changed within calls in the loop.
+ */
+#define nft_rule_for_each_expr(expr, last, rule) \
+	for ((expr) = nft_expr_first(rule), (last) = nft_expr_last(rule); \
+	     (expr) != (last); \
+	     (expr) = nft_expr_next(expr))
+
+enum nft_chain_flags {
+	NFT_BASE_CHAIN = 0x1,
+	NFT_CHAIN_BUILTIN = 0x2,
+};
+
+/**
+ * struct nft_chain - nf_tables chain
+ *
+ * @rules: list of rules in the chain
+ * @list: used internally
+ * @rcu_head: used internally
+ * @handle: chain handle
+ * @flags: bitmask of enum nft_chain_flags
+ * @use: number of jump references to this chain
+ * @level: length of longest path to this chain
+ * @name: name of the chain
+ */
+struct nft_chain {
+	struct list_head rules;
+	struct list_head list;
+	struct rcu_head rcu_head;
+	u64 handle;
+	u8 flags;
+	u16 use;
+	u16 level;
+	char name[NFT_CHAIN_MAXNAMELEN];
+};
+
+/**
+ * struct nft_base_chain - nf_tables base chain
+ *
+ * @ops: netfilter hook ops
+ * @chain: the chain
+ */
+struct nft_base_chain {
+	struct nf_hook_ops ops;
+	struct nft_chain chain;
+};
+
+static inline struct nft_base_chain *nft_base_chain(const struct nft_chain *chain)
+{
+	return container_of(chain, struct nft_base_chain, chain);
+}
+
+extern unsigned int nft_do_chain(const struct nf_hook_ops *ops,
+				 struct sk_buff *skb,
+				 const struct net_device *in,
+				 const struct net_device *out,
+				 int (*okfn)(struct sk_buff *));
+
+enum nft_table_flags {
+	NFT_TABLE_BUILTIN = 0x1,
+};
+
+/**
+ * struct nft_table - nf_tables table
+ *
+ * @list: used internally
+ * @chains: chains in the table
+ * @sets: sets in the table
+ * @hgenerator: handle generator state
+ * @use: number of chain references to this table
+ * @flags: table flag (see enum nft_table_flags)
+ * @name: name of the table
+ */
+struct nft_table {
+	struct list_head list;
+	struct list_head chains;
+	struct list_head sets;
+	u64 hgenerator;
+	u32 use;
+	u16 flags;
+	char name[];
+};
+
+/**
+ * struct nft_af_info - nf_tables address family info
+ *
+ * @list: used internally
+ * @family: address family
+ * @nhooks: number of hooks in this family
+ * @owner: module owner
+ * @tables: used internally
+ * @hooks: hookfn overrides for packet validation
+ */
+struct nft_af_info {
+	struct list_head list;
+	int family;
+	unsigned int nhooks;
+	struct module *owner;
+	struct list_head tables;
+	nf_hookfn *hooks[NF_MAX_HOOKS];
+};
+
+extern int nft_register_afinfo(struct nft_af_info *);
+extern void nft_unregister_afinfo(struct nft_af_info *);
+
+extern int nft_register_table(struct nft_table *, int family);
+extern void nft_unregister_table(struct nft_table *, int family);
+
+extern int nft_register_expr(struct nft_expr_ops *);
+extern void nft_unregister_expr(struct nft_expr_ops *);
+
+#define MODULE_ALIAS_NFT_FAMILY(family) \
+	MODULE_ALIAS("nft-afinfo-" __stringify(family))
+
+#define MODULE_ALIAS_NFT_TABLE(family, name) \
+	MODULE_ALIAS("nft-table-" __stringify(family) "-" name)
+
+#define MODULE_ALIAS_NFT_EXPR(name) \
+	MODULE_ALIAS("nft-expr-" name)
+
+#endif /* _NET_NF_TABLES_H */
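
The header packs a rule's expressions back to back in `nft_rule.data`, each one sized by `ops->size`, and walks them with `nft_rule_for_each_expr`. The userspace sketch below models only that layout and walk. The `toy_*` names and `toy_rule_add()` are hypothetical, the real structures carry more fields (list heads, RCU head, the 48-bit handle), and the code relies on the GCC/Clang `__attribute__` and `__alignof__` extensions the header itself uses.

```c
#include <stddef.h>
#include <string.h>

struct toy_expr_ops {
	const char *name;
	unsigned int size;	/* full expression size, header included */
};

struct toy_expr {
	const struct toy_expr_ops *ops;
	unsigned char data[];	/* private data follows the header */
};

/* Same shape as NFT_EXPR_SIZE: header plus aligned private data size. */
#define TOY_ALIGN(x, a)		(((x) + (a) - 1) & ~((size_t)(a) - 1))
#define TOY_EXPR_SIZE(size)	(sizeof(struct toy_expr) + \
				 TOY_ALIGN(size, __alignof__(struct toy_expr)))

struct toy_rule {
	unsigned int dlen;	/* bytes of packed expression data */
	unsigned char data[]
		__attribute__((aligned(__alignof__(struct toy_expr))));
};

static struct toy_expr *toy_expr_first(struct toy_rule *rule)
{
	return (struct toy_expr *)&rule->data[0];
}

static struct toy_expr *toy_expr_next(struct toy_expr *expr)
{
	return (struct toy_expr *)((unsigned char *)expr + expr->ops->size);
}

/* One past the final expression; used only as the loop end marker. */
static struct toy_expr *toy_expr_last(struct toy_rule *rule)
{
	return (struct toy_expr *)&rule->data[rule->dlen];
}

/*
 * Hypothetical helper: append an expression at the end of the packed
 * area. The caller must have allocated enough room after the rule.
 */
static struct toy_expr *toy_rule_add(struct toy_rule *rule,
				     const struct toy_expr_ops *ops)
{
	struct toy_expr *expr = toy_expr_last(rule);

	expr->ops = ops;
	memset(expr->data, 0, ops->size - sizeof(*expr));
	rule->dlen += ops->size;
	return expr;
}

/* Walk the packed expressions the way nft_rule_for_each_expr does. */
static unsigned int toy_rule_count(struct toy_rule *rule)
{
	struct toy_expr *expr, *last;
	unsigned int n = 0;

	for (expr = toy_expr_first(rule), last = toy_expr_last(rule);
	     expr != last; expr = toy_expr_next(expr))
		n++;
	return n;
}
```

Because every size produced by `TOY_EXPR_SIZE()` is a multiple of the header alignment, each expression in the packed area starts at a properly aligned address without any per-expression pointer or list node.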
include/net/netfilter/nf_tables_core.h

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+#ifndef _NET_NF_TABLES_CORE_H
+#define _NET_NF_TABLES_CORE_H
+
+extern int nf_tables_core_module_init(void);
+extern void nf_tables_core_module_exit(void);
+
+extern int nft_immediate_module_init(void);
+extern void nft_immediate_module_exit(void);
+
+extern int nft_cmp_module_init(void);
+extern void nft_cmp_module_exit(void);
+
+extern int nft_lookup_module_init(void);
+extern void nft_lookup_module_exit(void);
+
+extern int nft_bitwise_module_init(void);
+extern void nft_bitwise_module_exit(void);
+
+extern int nft_byteorder_module_init(void);
+extern void nft_byteorder_module_exit(void);
+
+extern int nft_payload_module_init(void);
+extern void nft_payload_module_exit(void);
+
+#endif /* _NET_NF_TABLES_CORE_H */
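
This header only declares init/exit pairs for the built-in expressions; the code that calls them lives in files not shown here. A common kernel idiom for chaining such pairs is to initialise in order and unwind in reverse on failure with `goto` labels. The sketch below shows that idiom with hypothetical `toy_*` stand-ins and three components. It is an assumption about the general pattern, not a copy of the nftables code.

```c
/* Bitmask of components that are currently initialised. */
static unsigned int toy_initialized;

/* Hypothetical component init: fails on request, otherwise sets its bit. */
static int toy_component_init(unsigned int bit, int fail)
{
	if (fail)
		return -1;
	toy_initialized |= 1u << bit;
	return 0;
}

static void toy_component_exit(unsigned int bit)
{
	toy_initialized &= ~(1u << bit);
}

/*
 * Initialise three components in order. If one fails, undo the ones
 * already set up, in reverse order, and return the error.
 * fail_at selects which component fails; a negative value means none.
 */
static int toy_core_init(int fail_at)
{
	int err;

	err = toy_component_init(0, fail_at == 0);
	if (err < 0)
		goto err1;
	err = toy_component_init(1, fail_at == 1);
	if (err < 0)
		goto err2;
	err = toy_component_init(2, fail_at == 2);
	if (err < 0)
		goto err3;
	return 0;

err3:
	toy_component_exit(1);
err2:
	toy_component_exit(0);
err1:
	return err;
}
```

With this shape a failed initialisation leaves no component half registered, which is why each `*_module_init()` declaration has a matching `*_module_exit()`.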

include/uapi/linux/netfilter/Kbuild

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ header-y += nf_conntrack_ftp.h
 header-y += nf_conntrack_sctp.h
 header-y += nf_conntrack_tcp.h
 header-y += nf_conntrack_tuple_common.h
+header-y += nf_tables.h
 header-y += nf_nat.h
 header-y += nfnetlink.h
 header-y += nfnetlink_acct.h

include/uapi/linux/netfilter/nf_conntrack_common.h

Lines changed: 4 additions & 0 deletions
@@ -25,6 +25,10 @@ enum ip_conntrack_info {
 	IP_CT_NUMBER = IP_CT_IS_REPLY * 2 - 1
 };

+#define NF_CT_STATE_INVALID_BIT (1 << 0)
+#define NF_CT_STATE_BIT(ctinfo) (1 << ((ctinfo) % IP_CT_IS_REPLY + 1))
+#define NF_CT_STATE_UNTRACKED_BIT (1 << (IP_CT_NUMBER + 1))
+
 /* Bitset representing status of connection. */
 enum ip_conntrack_status {
 	/* It's an expected connection: bit 0 set. This bit never changed */
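
A worked example may help with the three macros added here. The sketch reproduces an abridged `enum ip_conntrack_info` as an assumption, since only its `IP_CT_NUMBER` line is visible in the hunk, and `toy_state_matches()` is a hypothetical helper. The modulo folds the reply direction onto the original one, so each state gets one bit regardless of direction.

```c
/* Abridged copy of the enum, reproduced here as an assumption. */
enum ip_conntrack_info {
	IP_CT_ESTABLISHED,
	IP_CT_RELATED,
	IP_CT_NEW,
	IP_CT_IS_REPLY,
	IP_CT_ESTABLISHED_REPLY = IP_CT_ESTABLISHED + IP_CT_IS_REPLY,
	IP_CT_RELATED_REPLY = IP_CT_RELATED + IP_CT_IS_REPLY,
	IP_CT_NUMBER = IP_CT_IS_REPLY * 2 - 1
};

/* The three macros added by the patch. */
#define NF_CT_STATE_INVALID_BIT		(1 << 0)
#define NF_CT_STATE_BIT(ctinfo)		(1 << ((ctinfo) % IP_CT_IS_REPLY + 1))
#define NF_CT_STATE_UNTRACKED_BIT	(1 << (IP_CT_NUMBER + 1))

/*
 * Hypothetical helper: does a packet in state ctinfo match a mask of
 * accepted state bits?
 */
static int toy_state_matches(enum ip_conntrack_info ctinfo, unsigned int mask)
{
	return (NF_CT_STATE_BIT(ctinfo) & mask) != 0;
}
```

With these definitions ESTABLISHED maps to bit 1, RELATED to bit 2 and NEW to bit 3, INVALID takes bit 0, and UNTRACKED takes bit 6, so a set of accepted states fits in one small bitmask that a single AND can test.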
